Tuesday, June 11, 2013

Enhanced OLAP Functionality

In addition to OLAP functions such as RANK, DENSE_RANK and ROW_NUMBER, which were already supported in previous releases of DB2 LUW, v9.5 introduces five additional OLAP functions: LAG, LEAD, FIRST_VALUE, LAST_VALUE and RATIO_TO_REPORT.

The LAG function returns the expression value for the row that is “x” rows before the current row in an OLAP window of data.

The LEAD function returns the expression value for the row that is “x” rows after the current row in an OLAP window of data.

The FIRST_VALUE function returns the expression value for the first row in an OLAP window of data.

The LAST_VALUE function returns the expression value for the last row in an OLAP window of data.

The RATIO_TO_REPORT function returns the ratio of an argument to the sum of the arguments in an OLAP window of data.

These new functions can best be understood through a simple example that uses three of these new functions.

Given the following table:

CREATE TABLE SALARY_TABLE
    (NAME VARCHAR(20)
    ,SALARY DECIMAL(8,2)
    );

. . . that is populated with 20 rows of data, the following query is executed:

SELECT NAME, SALARY,
      (LEAD (SALARY, 1) OVER (ORDER BY SALARY)) – SALARY AS LEAD,
       FIRST_VALUE (SALARY) OVER (ORDER BY SALARY DESC) – SALARY AS FRST_VAL,
       DECIMAL(RATIO_TO_REPORT (SALARY) OVER (), 5, 4) AS RATIO_TO_REPORT
    FROM SALARY_TABLE
    ORDER BY SALARY DESC;

The result of the query is shown below:

NAME        SALARY        LEAD     FRST_VAL    RATIO_TO_REPORT

ELLEN    200000.00           -         0.00             0.1130
PETE     175000.00    25000.00     25000.00             0.0989
FRANK    150000.00    25000.00     50000.00             0.0847
CATHY    125000.00    25000.00     75000.00             0.0706
TOM      120000.00     5000.00     80000.00             0.0678
JOE      100000.00    20000.00    100000.00             0.0565
SUE      100000.00        0.00    100000.00             0.0565
DEB       90000.00    10000.00    110000.00             0.0508
JOAN      85000.00     5000.00    115000.00             0.0480
DAVE      80000.00     5000.00    120000.00             0.0452
SALLY     80000.00        0.00    120000.00             0.0452
BILL      75000.00     5000.00    125000.00             0.0424
BOB       70000.00     5000.00    130000.00             0.0395
ANN       60000.00    10000.00    140000.00             0.0339
PAUL      50000.00    10000.00    150000.00             0.0282
KAREN     50000.00        0.00    150000.00             0.0282
MARY      50000.00        0.00    150000.00             0.0282
RAY       45000.00     5000.00    155000.00             0.0254
TRACY     40000.00     5000.00    160000.00             0.0226
JIM       25000.00    15000.00    175000.00             0.0141

First, let’s look at the LEAD function. We want to get the difference between each person’s salary, and the salary of the person directly before them in result set. Since the result set is returned in descending order of SALARY, Ellen is listed first with the highest salary and has a null value for LEAD, since nobody is listed before her. Pete is next, and his salary is 25,000.00 less than Ellen’s. Frank’s salary is 25,000.00 less than Pete’s, and so on. In the event of a two or more people with equal salaries, the first one listed has a delta salary compared to the person before him/her, and the second (and subsequent) “ties” have a LEAD value of 0.00.

The FIRST_VALUE function in this example is showing the difference between the person’s own salary and the salary of the highest paid person in the department (Ellen, with 200,000.00).

The RATIO_TO_REPORT function in this example shows the ratio (or percentage) of each person’s salary compared to the total salary of everybody listed. The total salary of the entire table is 1,770,000.00. With a salary of 200,000.00, Ellen is making 0.1130 of the total salary in the table. Jim, at the bottom with a salary of only 25,000.00, is only making 0.0141 of the total salary in the table.

OLAP functions can also be “partitioned” by another value. If our table contained a column for the DEPARTMENT that each employee worked in, the OLAP functions could be further broken down such that the FIRST_VALUE and RATIO_TO_REPORT functions would be relative to each employee’s standing within their department rather than the entire table as a whole.