There are two different types of SQL that can be coded in a native SQL stored procedure in DB2: static SQL and dynamic SQL. Dynamic SQL can also be written in two different ways, thus leading to three different ways to code SQL in a stored procedure. This blog post will describe the differences between these three different approaches, highlight when one approach should be used over the other two, and give some performance metrics based on each approach.
Static SQL is almost always preferable over dynamic SQL because optimization of the SQL statements will occur only one time, at the time the stored procedure is created. An optimized access path to execute each static SQL statement is stored in the DB2 package that is associated with the stored procedure, and no additional overhead is incurred at run time.
There are two different "flavors" of dynamic SQL, and those would be (1) dynamic SQL that uses parameter markers as "place holders" for search criteria in the WHERE clauses of the SQL statements, and (2) dynamic SQL that uses hard coded literals for search criteria in the WHERE clauses of the SQL statements. When using parameter markers, the "variable" is represented by a question mark ("?") in the constructed SQL statement that is passed to the PREPARE statement, and the actual values to be used during execution are supplied via a USING clause in either the EXECUTE statement or the OPEN statement (if using a cursor).
At a high level, when doing a PREPARE of a dynamic SQL statement, the following activities take place:
1) The dynamic SQL cache is searched to determine if the SQL statement being prepared has previously been prepared and is resident in cache. DB2 will look for an EXACT MATCH on the SQL statement in the cache to determine residency. If a match is found, then DB2 will reuse the previously created optimized execution plan for that statement for execution. The dynamic cache look-up is a relatively inexpensive component of the overall execution of a dynamic SQL statement.
2) If no match is found in the dynamic cache for the SQL statement being prepared, then DB2 will have to parse the statement for syntactical accuracy, perform authorization checking, and then optimize the statement to generate an optimized execution plan for that statement. The statement and the corresponding execution plan is then stored in the dynamic cache. If the cache is full, then DB2 uses a "least recently used" algorithm to choose a victim to discard from the cache to make room for the new statement. The generation of the optimized execution plan is a relatively expensive component of the overall execution of a dynamic SQL statement.
A PREPARE that only has to go through step 1 as described above is called a short prepare. A PREPARE that has go to through both steps 1 and 2 is called a full prepare.
The performance benefits of dynamic SQL that uses parameter markers versus dynamic SQL that uses hard coded literals is achieved via the PREPARE statement. If a particular statement in a stored procedure is coded to use parameter markers, every PREPARE of that statement will be using an identical string. The first prepare of that statement will need to do a full prepare, but every subsequent prepare will only need to do a short prepare because the previously prepared statement will most likely be resident in the cache and can be reused. However, when using dynamic SQL statements that use literals, then every statement will be seen as unique, and virtually every PREPARE for that statement will result in the more expensive full prepare.
So when would somebody choose to use dynamic SQL over static SQL when coding a stored procedure? There are basically three situations in which dynamic may be preferable to static.
1) The business requirements are dictating a very flexible set of search predicates for a particular invocation of a stored procedure. For example, you may want to search a customer table on any combination of attributes, such as first name, last name, customer ID, city, state, region, etc. In such a design, it would be cumbersome to try to code a single SQL statement that can handle multiple search parameters, some of which may be supplied and some of which may not be at execution time. Even coding this as individual and separate SQL statements for each combination of parameters would be cumbersome as the number of combinations can grow exponentially as you add more parameters to the mix. This would be an ideal candidate for dynamic SQL with parameter markers as you are trading off the hit in performance for having to do a full prepare of a "unique" combination of search attributes for a flexible design.
2) Non uniformity of data distribution. The actual values of data stored in a particular table may have a non-uniform distribution of data, such that some values appear very frequently, while others appear infrequently. The DB2 optimizer may chose to perform a table scan when optimizing a query that will return a large number of rows (based on the search criteria), while an index may be chosen by the optimizer for other search values that it determines will return a small number of rows for the same query. When using static SQL, the DB2 optimizer will make an informed decision as to what would be the best access path for most queries, but for some particular values a "one size fits all" access path may actually result in poor performance for those search values that are part of the data skew. In this case, dynamic SQL with literals may be a better choice than static SQL as the DB2 optimizer can make an informed decision at prepare time for the optimal access path based on the actual values being searched on.
3) Range predicates. Very similar to non uniformity of data distribution, an SQL query that searches on a range of data may benefit from being written as dynamic SQL with literals than as a static SQL statement. Range predicates would include greater than, less than, BETWEEN, and LIKE predicates. Based on the actual values you are searching on, you may be asking DB2 to return a small amount of data (in which case an index access would be preferable) or a large amount of data (in which case a table scan may be preferable). Writing these statements as dynamic with literals may provide the most appropriate access path for a given query based on the actual values being searched on.
To gather some actual performance metrics on each of these three approaches to writing SQL in a native SQL stored procedure in DB2, I wrote three nearly identical stored procedures and ran them against a test database in our environment to get some performance benchmarks. That particular database had 2,961 unique values of a particular search column in a "master" table, and my programs accessed all 2,961 of them using an SQL statement from one of our business applications. The only difference in the three SP's were the manner in which the SQL statement in question was written: static, versus dynamic with parameter markers versus dynamic with literals. Here are the results:
Static SQL
Total elapsed time: 3.224109 seconds
User CPU time: 0.109829 seconds
System CPU time: 0.055669 seconds
Dynamic SQL with Parameter Markers
Total elapsed time: 3.554240 seconds
User CPU time: 0.130988 seconds
System CPU time: 0.058271 seconds
Dynamic SQL with Literals
Total elapsed time: 72.297577 seconds
User CPU time: 12.478055 seconds
System CPU time: 1.642622 seconds
As we can see from the above numbers, dynamic SQL with parameter markers can achieve performance that approaches (but will not meet or exceed) static SQL, while dynamic SQL coded with literals will perform significantly worse than either of the other two options.
It is important to note that dynamic with parameter markers still performs slightly worse than static SQL. This is because of the overhead incurred by the necessary PREPARE statements for dynamic SQL. The slight overhead of doing short prepares with dynamic SQL using parameter markers is still an expense that static SQL does not incur, and has a cumulative impact on performance over many executions. Plus, in a "real life" environment in which you have multiple stored procedures each executing multiple SQL statements, if the number of "unique" dynamic statements (with parameter markers) exceeds the capacity of the dynamic SQL cache, statements can and will get discarded from the cache and the reuse of the statements in the cache can degrade over time as the workload grows.
The drastic difference in performance for the dynamic SQL with literals test over the other two tests can be attributed to the cost of doing full prepares for every single statement during the execution of the SP.