
Tuesday, August 3, 2021

Tableau LOD Calculations Explained with Ranking / Window functions

I started my learning journey with Tableau a few months ago.

So far, it has been pretty good. However, I found LOD (level of detail) expressions surprisingly complex at first encounter. I have been reading through a few examples and watching a few videos to get a better understanding of these concepts.

Since I come from a database background with SQL knowledge, I am hoping I can explain my understanding of LOD through ranking / analytical functions.

Please see my earlier blog posts on ranking / analytical functions here:

https://mfzahirdba.blogspot.com/2009/10/analytics-20.html

https://mfzahirdba.blogspot.com/2009/11/analytics-20-part-ii.html

https://mfzahirdba.blogspot.com/2018/10/windowing-clause-why-it-matters.html

https://mfzahirdba.blogspot.com/2013/08/oracles-analytical-functions-vs-sql.html

https://mfzahirdba.blogspot.com/2012/09/difference-between-percentilecont-and.html


There are three LOD keywords in Tableau: FIXED, INCLUDE, and EXCLUDE.

FIXED LOD calculations do not depend on the visualization (aka viz), whereas the visualization does affect how INCLUDE and EXCLUDE are calculated.
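
For reference, the general shape of a Tableau LOD expression is { FIXED|INCLUDE|EXCLUDE &lt;dimensions&gt; : &lt;aggregate expression&gt; }; the concrete expressions used in this post are sketched below, where the calculated fields are created.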

In my example, I have created a table in SQL Server and populated it with a few records.

To follow along with the example, feel free to use the script below.


Create table CourseEnrollment 

( StudentName varchar(50) , 

  Department varchar(50) , 

  Subject varchar(50) , 

  EnrollmentDate Date 

 ) ;

 GO 


Insert into CourseEnrollment values ( 'Zahir Mohideen' ,  'Maths' , 'Pre Calculus' , '1988-03-01') ;

Insert into CourseEnrollment values ( 'Zahir Mohideen' ,  'Maths' , 'Multivariate Calculus' , '2010-05-16') ;

Insert into CourseEnrollment values ( 'Zahir Mohideen' ,  'Statistics' , 'Statistics 101' , '2020-07-25') ;

Insert into CourseEnrollment values ( 'Abdul Hameed' ,  'Maths' , 'Pre Calculus' , '1991-03-01') ;

Insert into CourseEnrollment values ( 'Abdul Hameed' ,  'Physics' , 'Phy101' , '2020-07-25') ;


I have connected Tableau to this underlying table.



Here, I want to know the very first enrollment date of each student, no matter what department / course they enrolled into. Also, this value does not change when the viz is changed.

In the "Sheet" tab, I create a calculated field for the FIXED LOD by clicking on "Create Calculated Field", as below.
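
The screenshot of the calculated field is not reproduced here. As a minimal sketch, assuming Tableau exposes the columns as [StudentName] and [EnrollmentDate], the FIXED calculation (First Enrollment Date - Fixed) would look roughly like:

{ FIXED [StudentName] : MIN([EnrollmentDate]) }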



Similarly, create LOD calculations for INCLUDE and EXCLUDE.
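
Again as a sketch (the dimensions used in the original screenshots are an assumption on my part; the EXCLUDE expression follows from the description later in this post):

First Enrollment Date - INCLUDE :  { INCLUDE [Department] : MIN([EnrollmentDate]) }

First Enrollment Date - EXCLUDE :  { EXCLUDE [StudentName] : MIN([EnrollmentDate]) }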



Let us bring these LOD calculations into the sheet.


The value for First Enrollment Date - Fixed is the minimum enrollment date per student, no matter what dimensions are available in the viz. It is FIXED by student name.

The value for First Enrollment Date - INCLUDE is the minimum enrollment date per student and all other dimensions available in the viz; in our case, Department. So, in essence, it is the first enrollment date per student / department.


The value for First Enrollment Date - EXCLUDE is the minimum enrollment date per the dimensions available in the viz, excluding the student name specified in the LOD; in our case, that leaves Department. So, in essence, it is the first enrollment date per department.

These results can be derived in SQL using analytical / window functions.
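
The SQL from the original screenshot is not shown here; as a minimal sketch against the CourseEnrollment table above, the equivalents of the three LODs (FIXED per student, INCLUDE per student / department, EXCLUDE per department) can be written as:

SELECT StudentName ,
       Department ,
       Subject ,
       EnrollmentDate ,
       MIN(EnrollmentDate) OVER (PARTITION BY StudentName)              AS FirstEnrollment_Fixed ,
       MIN(EnrollmentDate) OVER (PARTITION BY StudentName , Department) AS FirstEnrollment_Include ,
       MIN(EnrollmentDate) OVER (PARTITION BY Department)               AS FirstEnrollment_Exclude
FROM   CourseEnrollment ;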


When we remove Department / Student from the viz, the FIXED LOD value remains the same; the INCLUDE and EXCLUDE values change.







I hope this simple example presents LOD through the eyes of a SQL developer.

I offer support for database maintenance, performance tuning, and database development. Please contact me for any database-related work.






Monday, June 16, 2014

SQL Treasures.

The reason I call this topic a treasure is that this feature was introduced in Oracle 8.1.6 (a whiiiiiile back), yet not many people are using it to its fullest extent.

You would have guessed it by now. Yeah, it is analytics.

Analytics is one of the greatest additions to the SQL family. Though it is documented only in the Data Warehousing Guide, it is equally important in OLTP applications.

It has ranking, reporting, and statistical functions, to name a few.

As an example, let us go to the SCOTT schema.

In this post, I will cover a few basic analytical SQLs. Some of these analytical constructs were also introduced in SQL Server 2005.

The rest of the analytical functions will come in subsequent posts.

Example 1: 

Usage of the ROW_NUMBER function.

This returns a running sequence number.

SQL> SELECT empno,
  2         ename,
  3         job,
  4         sal,
  5         deptno,
  6         row_number() over(ORDER BY sal DESC) rn
  7  FROM emp;

EMPNO ENAME      JOB             SAL DEPTNO         RN
----- ---------- --------- --------- ------ ----------
 7839 KING       PRESIDENT   5000.00     10          1
 7902 FORD       ANALYST     3000.00     20          2
 7788 SCOTT      ANALYST     3000.00     20          3
 7566 JONES      MANAGER     2975.00     20          4
 7698 BLAKE      MANAGER     2850.00     30          5
 7782 CLARK      MANAGER     2450.00     10          6
 7499 ALLEN      SALESMAN    1600.00     30          7
 7844 TURNER     SALESMAN    1500.00     30          8
 7934 MILLER     CLERK       1300.00     10          9
 7521 WARD       SALESMAN    1250.00     30         10
 7654 MARTIN     SALESMAN    1250.00     30         11
 7876 ADAMS      CLERK       1100.00     20         12
 7900 JAMES      CLERK        950.00     30         13
 7369 SMITH      CLERK        800.00     20         14

Example 2:

The following example uses several functions.

1. LEAD and LAG let us look at the previous and next record's value.

2. RANK and DENSE_RANK let us rank a record based on the column we specify. The difference between RANK and DENSE_RANK shows up when there is a tie in the column value: DENSE_RANK does not skip ranks, whereas RANK does.

SQL> SELECT empno,
  2         ename,
  3         job,
  4         sal,
  5         deptno,
  6         row_number() over(PARTITION BY deptno ORDER BY sal ASC) rn,
  7         rank() over(PARTITION BY deptno ORDER BY sal ASC) rank,
  8         dense_rank() over(PARTITION BY deptno ORDER BY sal ASC) dense_rank,
  9         lag(sal) over(PARTITION BY deptno ORDER BY sal ASC) previous_sal,
 10         lead(sal) over(PARTITION BY deptno ORDER BY sal ASC) next_sal
 11  FROM emp
 12  ORDER BY deptno, sal
 13  /

EMPNO ENAME      JOB             SAL DEPTNO         RN       RANK DENSE_RANK PREVIOUS_SAL   NEXT_SAL
----- ---------- --------- --------- ------ ---------- ---------- ---------- ------------ ----------
 7934 MILLER     CLERK       1300.00     10          1          1          1                    2450
 7782 CLARK      MANAGER     2450.00     10          2          2          2         1300       5000
 7839 KING       PRESIDENT   5000.00     10          3          3          3         2450
 7369 SMITH      CLERK        800.00     20          1          1          1                    1100
 7876 ADAMS      CLERK       1100.00     20          2          2          2          800       2975
 7566 JONES      MANAGER     2975.00     20          3          3          3         1100       3000
 7788 SCOTT      ANALYST     3000.00     20          4          4          4         2975       3000
 7902 FORD       ANALYST     3000.00     20          5          4          4         3000
 7900 JAMES      CLERK        950.00     30          1          1          1                    1250
 7654 MARTIN     SALESMAN    1250.00     30          2          2          2          950       1250
 7521 WARD       SALESMAN    1250.00     30          3          2          2         1250       1500
 7844 TURNER     SALESMAN    1500.00     30          4          4          3         1250       1600
 7499 ALLEN      SALESMAN    1600.00     30          5          5          4         1500       2850
 7698 BLAKE      MANAGER     2850.00     30          6          6          5         1600



Example 3:

NTILE lets us divide the result set into buckets of equal height.
In the following example, the result set is divided into three roughly equal parts.

I find NTILE very useful in do-it-yourself (DIY) parallelizing of jobs. In a DIY setup, you could divvy up the result set and send each slice to a parallel job using DBMS_JOB / DBMS_SCHEDULER (see the sketch after the example below).

SQL> SELECT empno, ename, job, sal, deptno, ntile(3) over(ORDER BY sal) ntile
  2  FROM emp
  3  ORDER BY sal
  4  /

EMPNO ENAME      JOB             SAL DEPTNO      NTILE
----- ---------- --------- --------- ------ ----------
 7369 SMITH      CLERK        800.00     20          1
 7900 JAMES      CLERK        950.00     30          1
 7876 ADAMS      CLERK       1100.00     20          1
 7521 WARD       SALESMAN    1250.00     30          1
 7654 MARTIN     SALESMAN    1250.00     30          1
 7934 MILLER     CLERK       1300.00     10          2
 7844 TURNER     SALESMAN    1500.00     30          2
 7499 ALLEN      SALESMAN    1600.00     30          2
 7782 CLARK      MANAGER     2450.00     10          2
 7698 BLAKE      MANAGER     2850.00     30          2
 7566 JONES      MANAGER     2975.00     20          3
 7788 SCOTT      ANALYST     3000.00     20          3
 7902 FORD       ANALYST     3000.00     20          3
 7839 KING       PRESIDENT   5000.00     10          3
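
As a rough sketch of the DIY parallelizing idea mentioned above (the bucket count, the job names, and the process_emp_bucket procedure are hypothetical, not from the original post), each NTILE bucket could be handed to its own DBMS_SCHEDULER job:

DECLARE
   l_buckets CONSTANT PLS_INTEGER := 3;
BEGIN
   FOR b IN 1 .. l_buckets LOOP
      dbms_scheduler.create_job(
         job_name   => 'EMP_BUCKET_JOB_' || b,
         job_type   => 'PLSQL_BLOCK',
         job_action => 'BEGIN process_emp_bucket(' || b || ', ' || l_buckets || '); END;',
         enabled    => TRUE);
   END LOOP;
END;
/

Inside process_emp_bucket, the rows for a given bucket would be picked up with something like
SELECT empno FROM (SELECT empno, ntile(p_total) OVER (ORDER BY empno) bkt FROM emp) WHERE bkt = p_bucket.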



Example 4: 

The first look at the following example may be a little intimidating. If you look at the second example, it may become clearer.

In the latter example, we get the ratio of the salaries within DEPT 10. The total salary is 8750, out of which CLARK's salary is 2450 (28% of the department's total salary).

It helped us in one scenario where the requirement was to save a report in Excel spreadsheet format.
The report was developed using PowerBuilder. In PowerBuilder, we can save the contents of a DataWindow to an Excel spreadsheet. The developer had done the ratio calculations in the front end (a computed column, in PB lingo). When the DataWindow was saved as Excel, the calculations did not make it into the spreadsheet (as only the result set of the SQL was saved). In this case, the following function came in handy.

SQL> SELECT empno,
  2         ename,
  3         job,
  4         deptno,
  5         sal,
  6         ratio_to_report(sal) over() rr_whole,
  7         ratio_to_report(sal) over(PARTITION BY deptno) rr_deptno
  8  FROM emp
  9  /

EMPNO ENAME      JOB       DEPTNO       SAL   RR_WHOLE  RR_DEPTNO
----- ---------- --------- ------ --------- ---------- ----------
 7782 CLARK      MANAGER       10   2450.00 0.08440999       0.28
 7839 KING       PRESIDENT     10   5000.00 0.17226528 0.57142857
 7934 MILLER     CLERK         10   1300.00 0.04478897 0.14857142
 7566 JONES      MANAGER       20   2975.00 0.10249784 0.27356321
 7902 FORD       ANALYST       20   3000.00 0.10335917 0.27586206
 7876 ADAMS      CLERK         20   1100.00 0.03789836 0.10114942
 7369 SMITH      CLERK         20    800.00 0.02756244 0.07356321
 7788 SCOTT      ANALYST       20   3000.00 0.10335917 0.27586206
 7521 WARD       SALESMAN      30   1250.00 0.04306632 0.13297872
 7844 TURNER     SALESMAN      30   1500.00 0.05167958 0.15957446
 7499 ALLEN      SALESMAN      30   1600.00 0.05512489 0.17021276
 7900 JAMES      CLERK         30    950.00 0.03273040 0.10106382
 7698 BLAKE      MANAGER       30   2850.00 0.09819121 0.30319148
 7654 MARTIN     SALESMAN      30   1250.00 0.04306632 0.13297872



SQL> SELECT empno,
  2         ename,
  3         job,
  4         deptno,
  5         sal,
  6         ratio_to_report(sal) over(PARTITION BY deptno) rr_deptno
  7  FROM emp
  8  Where deptno = 10
  9  /

EMPNO ENAME      JOB       DEPTNO       SAL  RR_DEPTNO
----- ---------- --------- ------ --------- ----------
 7782 CLARK      MANAGER       10   2450.00       0.28
 7839 KING       PRESIDENT     10   5000.00 0.57142857
 7934 MILLER     CLERK         10   1300.00 0.14857142

Oracle's Analytical functions Vs SQL Server's Windowing Functions

With the release of SQL Server 2012, SQL Server has caught up with Oracle's implementation of analytical functions, but there are a few subtle differences.

In SQL Server we cannot use an integer constant in the ORDER BY of the windowing (OVER) clause. The workaround is to wrap the constant in a subquery, i.e. ORDER BY (SELECT 1).

Here is an example.

ORACLE
======

SQL> drop table emp;
Table dropped.

SQL> Create table emp
  2  ( eid int primary key,
  3    elname  char(10 ) not null,
  4    efname  char(10 ) not null,
  5    deptname char(10)
  6  ) ;
Table created.

SQL> insert into emp values ( 100 , 'Abdul' , 'Hameed' , 'Sales');
1 row created.

SQL> insert into emp values ( 200 , 'Syed' , 'Ishak' , 'Sales');
1 row created.

SQL> insert into emp values ( 300 , 'Zahir' , 'Mohideen' , 'IT');
1 row created.

SQL> Select eid , elname , efname , ROW_NUMBER() over ( partition by deptname  order by 1 ) rn from emp;

       EID ELNAME     EFNAME             RN
---------- ---------- ---------- ----------
       300 Zahir      Mohideen            1
       100 Abdul      Hameed              1
       200 Syed       Ishak               2


SQL Server
==========

C:\>sqlcmd -W
1> use test2012nf
2> go
Changed database context to 'test2012nf'.
1> drop table emp;
2> GO
1> Create table emp
2> ( eid int primary key,
3>   elname  char(10 ) not null,
4>   efname  char(10 ) not null,
5>   deptname char(10)
6> ) ;
7> GO
1> insert into emp values ( 100 , 'Abdul' , 'Hameed' , 'Sales');
2> insert into emp values ( 200 , 'Syed' , 'Ishak' , 'Sales');
3> insert into emp values ( 300 , 'Zahir' , 'Mohideen' , 'IT');
4> GO

(1 rows affected)

(1 rows affected)

(1 rows affected)
1> Select eid , elname , efname , ROW_NUMBER() over ( partition by deptname  order by 1 ) rn from emp;
2> go
Msg 5308, Level 16, State 1, Server ITSQL-012, Line 1
Windowed functions and NEXT VALUE FOR functions do not support integer indices as ORDER BY clause expressions.


Now replace ORDER BY 1 with ORDER BY (SELECT 1):

1> Select eid , elname , efname , ROW_NUMBER() over ( partition by deptname  order by (select 1)  ) rn from emp;
2> go
eid elname efname rn
--- ------ ------ --
300 Zahir      Mohideen   1
100 Abdul      Hameed     1
200 Syed       Ishak      2

(3 rows affected)


Comments Welcome!

Friday, August 16, 2013

FETCH X ROWS in Oracle 12c (New Feature!)


Oracle 12c has introduced the FETCH FIRST n ROWS clause in the SELECT statement. Though it is a "new" feature, under the covers Oracle seems to be using the same good old technique developers have been using for the last decade or so with analytical functions.

Let us look at the example below.

The FETCH FIRST clause in the following SELECT statement is the 12c feature. As can be seen from the predicate information in the execution plan, Oracle is using the ROW_NUMBER function under the covers. If you compare the two statements' execution plans, predicate information, and results, they are pretty much the same.

I have to admit, the new feature shortens the SELECT statement and avoids the subquery.


SQL> select * from scott.emp order by empno fetch first 4 rows only;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20


Execution Plan
----------------------------------------------------------
Plan hash value: 2801941731

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |    14 |  1582 |     2   (0)| 00:00:01 |
|*  1 |  VIEW                         |        |    14 |  1582 |     2   (0)| 00:00:01 |
|*  2 |   WINDOW NOSORT STOPKEY       |        |    14 |   532 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP    |    14 |   532 |     2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | PK_EMP |    14 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=4)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."EMPNO")<=4)
 


Here is the equivalent SQL from pre-12c releases.

SQL> Select * from ( Select  e.* , row_number() over ( partition by 1 order by empno) rn from scott.emp e ) where rn <= 4 ;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO         RN
---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20          1
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30          2
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30          3
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20          4


Execution Plan
----------------------------------------------------------
Plan hash value: 2801941731

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |    14 |  1400 |     2   (0)| 00:00:01 |
|*  1 |  VIEW                         |        |    14 |  1400 |     2   (0)| 00:00:01 |
|*  2 |   WINDOW NOSORT STOPKEY       |        |    14 |   532 |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP    |    14 |   532 |     2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | PK_EMP |    14 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN"<=4)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMPNO")<=4)



Comments Welcome.

Wednesday, September 19, 2012

Difference between PERCENTILE_CONT and PERCENTILE_DISC

Following on from the previous post on MEDIAN, I would like to point out the difference between PERCENTILE_CONT and PERCENTILE_DISC (MEDIAN being a special case of PERCENTILE_CONT).

There are subtle differences between PERCENTILE_CONT and PERCENTILE_DISC. When the count of the record set is odd, there is no difference between them. The calculation of MEDIAN is identical to PERCENTILE_CONT computed with a percentile value of 0.5.

Let us look at the first example.


SQL> DROP
  2    TABLE t purge;

Table dropped.

SQL> CREATE
  2    TABLE t AS
  3  SELECT
  4    'TEST' item ,
  5    'E'   AS region ,
  6    level AS wk ,
  7    ROUND(dbms_random.value(1,500)) forecastqty
  8  FROM
  9    dual
 10    CONNECT BY level <= 3 ;

Table created.


SQL>     SELECT
  2        t.* ,
  3        PERCENTILE_CONT(0.5)
  4        WITHIN GROUP ( ORDER BY forecastqty)
  5        OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
  6        MEDIAN(forecastqty)
  7        OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
  8        PERCENTILE_DISC(0.5)
  9        WITHIN GROUP ( ORDER BY forecastqty)
 10        OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
 11      FROM
 12    t ;

ITEM REGION             WK FORECASTQTY PERCENTILE_CONT     MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E                   2          10              41         41              41
TEST E                   1          41              41         41              41
TEST E                   3         326              41         41              41




As you can see, there is no difference between the values of PERCENTILE_CONT, PERCENTILE_DISC, and MEDIAN. The computation involves ordering the result set (10, 41, 326) and picking the middle value (41).


Let us look at the second example.



SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production




SQL> DROP
  2    TABLE t purge;

Table dropped.

SQL> CREATE
  2    TABLE t AS
  3  SELECT
  4    'TEST' item ,
  5    'E'   AS region ,
  6    level AS wk ,
  7    ROUND(dbms_random.value(1,500)) forecastqty
  8  FROM
  9    dual
 10    CONNECT BY level <= 4 ;

Table created.

SQL>   column region format a10
SQL>   set linesize 800
SQL> select * from t order by forecastqty;

ITEM REGION             WK FORECASTQTY
---- ---------- ---------- -----------
TEST E                   3         137
TEST E                   2         190
TEST E                   1         232
TEST E                   4         400



SQL>     SELECT
  2        t.* ,
  3        PERCENTILE_CONT(0.5)
  4        WITHIN GROUP ( ORDER BY forecastqty)
  5        OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
  6        MEDIAN(forecastqty)
  7        OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
  8        PERCENTILE_DISC(0.5)
  9        WITHIN GROUP ( ORDER BY forecastqty)
 10        OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
 11      FROM
 12    t ;



ITEM REGION             WK FORECASTQTY PERCENTILE_CONT     MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E                   3         137             211        211             190
TEST E                   2         190             211        211             190
TEST E                   1         232             211        211             190
TEST E                   4         400             211        211             190



There are differences between the values of PERCENTILE_CONT and PERCENTILE_DISC.

The computation of PERCENTILE_CONT and MEDIAN involves ordering the result set (137, 190, 232, 400), taking the two middle values (190, 232), and then averaging them [(190 + 232) / 2 = 211].

The computation of PERCENTILE_DISC involves ordering the result set (137, 190, 232, 400) and picking the first of the middle values (190, in this case). In other words, it picks the first value whose cumulative distribution (CUME_DIST) is greater than or equal to the specified percentile.
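
To see that rule with the values from the run above (a minimal sketch; table t holds random values, so a fresh run would produce different numbers):

SELECT forecastqty ,
       CUME_DIST() OVER (ORDER BY forecastqty) AS cd
FROM   t
ORDER  BY forecastqty ;

For the values 137, 190, 232 and 400, the cume_dist values are 0.25, 0.5, 0.75 and 1; PERCENTILE_DISC(0.5) picks 190, the first value whose cume_dist reaches 0.5.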

PERCENTILE_CONT and MEDIAN are not guaranteed to return one of the values in the distribution. This may not be acceptable for certain business processes; in that case, we would use PERCENTILE_DISC.


Note: SQL Server 2012 introduced these functions as well.

Comments welcome.






Tuesday, March 16, 2010

Spreadsheet like calcing (Using MODEL clause)

Oracle 10g introduced the MODEL clause to simulate spreadsheet-like calculations in SQL. It is complex, yet powerful. There is a chapter on this clause in the Data Warehousing Guide.

I really wish this (along with analytics) would be moved to the Application Developer's Guide.

Here is an example. In the following query, we create a new column called new_sal and give all the employees a 50% raise (in this economy, that would be a jackpot...).

SQL>  SELECT empno,ename , deptno, sal , new_sal
  2   FROM scott.emp
  3   MODEL
  4   PARTITION BY (empno , ename , sal)
  5   DIMENSION BY (deptno)
  6   MEASURES (sal new_sal) IGNORE NAV
  7   RULES ( new_sal[ANY] = CV(sal) * 1.5)
  8   ORDER BY deptno;



     EMPNO ENAME          DEPTNO        SAL    NEW_SAL
---------- ---------- ---------- ---------- ----------
      7782 CLARK              10       2450       3675
      7934 MILLER             10       1300       1950
      7839 KING               10       5000       7500
      7902 FORD               20       3000       4500
      7566 JONES              20       2975     4462.5
      7788 SCOTT              20       3000       4500
      7876 ADAMS              20       1100       1650
      7369 SMITH              20        800       1200
      7844 TURNER             30       1500       2250
      7900 JAMES              30        950       1425
      7698 BLAKE              30       2850       4275
      7521 WARD               30       1250       1875
      7499 ALLEN              30       1600       2400
      7654 MARTIN             30       1250       1875


14 rows selected.


Now, I can create additional records (for display only; nothing is saved to the database) with the following SQL. Here, I am assigning all the employees to dept 10 (in addition to their own departments).

For example, you can see employees such as FORD appearing in both depts (20 and 10).




  1   SELECT empno,ename , deptno, sal , new_sal
  2   FROM scott.emp
  3   MODEL
  4   PARTITION BY (empno , ename , sal)
  5   DIMENSION BY (deptno)
  6   MEASURES (sal new_sal) IGNORE NAV
  7   RULES ( new_sal[10] = CV(sal) * 1.5)
  8*  ORDER BY deptno
SQL> /

     EMPNO ENAME          DEPTNO        SAL    NEW_SAL
---------- ---------- ---------- ---------- ----------
      7934 MILLER             10       1300       1950
      7902 FORD               10       3000       4500
      7844 TURNER             10       1500       2250
      7788 SCOTT              10       3000       4500
      7900 JAMES              10        950       1425
      7698 BLAKE              10       2850       4275
      7369 SMITH              10        800       1200
      7654 MARTIN             10       1250       1875
      7876 ADAMS              10       1100       1650
      7782 CLARK              10       2450       3675
      7521 WARD               10       1250       1875
      7839 KING               10       5000       7500
      7499 ALLEN              10       1600       2400
      7566 JONES              10       2975     4462.5
      7902 FORD               20       3000       3000
      7369 SMITH              20        800        800

      7788 SCOTT              20       3000       3000
      7566 JONES              20       2975       2975
      7876 ADAMS              20       1100       1100
      7521 WARD               30       1250       1250
      7499 ALLEN              30       1600       1600
      7900 JAMES              30        950        950
      7844 TURNER             30       1500       1500
      7654 MARTIN             30       1250       1250
      7698 BLAKE              30       2850       2850

25 rows selected.





If I am interested only in the records that were impacted by this rule, I can run the following SQL using the RETURN UPDATED ROWS option.

  1   SELECT empno,ename , deptno, sal , new_sal
  2   FROM scott.emp
  3   MODEL return updated rows
  4   PARTITION BY (empno , ename , sal)
  5   DIMENSION BY (deptno)
  6   MEASURES (sal new_sal) IGNORE NAV
  7   RULES ( new_sal[10] = CV(sal) * 1.5)
  8*  ORDER BY deptno
SQL> /

     EMPNO ENAME          DEPTNO        SAL    NEW_SAL
---------- ---------- ---------- ---------- ----------
      7934 MILLER             10       1300       1950
      7499 ALLEN              10       1600       2400
      7566 JONES              10       2975     4462.5
      7521 WARD               10       1250       1875
      7782 CLARK              10       2450       3675
      7876 ADAMS              10       1100       1650
      7839 KING               10       5000       7500
      7369 SMITH              10        800       1200
      7698 BLAKE              10       2850       4275
      7900 JAMES              10        950       1425
      7788 SCOTT              10       3000       4500
      7844 TURNER             10       1500       2250
      7902 FORD               10       3000       4500
      7654 MARTIN             10       1250       1875


14 rows selected.

You can also iterate using the MODEL clause.

If I have an initial value of 500 and reduce it by 50% in each iteration, then at the 9th iteration the value goes below 1.



SQL> SELECT value_below_breakeven , num_iterations, breakeven_value  FROM dual
  2   MODEL
  3   DIMENSION BY (1 breakeven_value)
  4   MEASURES (500 value_below_breakeven, 0 num_iterations)
  5   RULES ITERATE (20) UNTIL (value_below_breakeven[1]<=1)
  6  (value_below_breakeven[1] = value_below_breakeven[1]/2,
  7   num_iterations[1] = num_iterations[1] + 1);

VALUE_BELOW_BREAKEVEN NUM_ITERATIONS BREAKEVEN_VALUE
--------------------- -------------- ---------------
             .9765625              9               1

 

Here are the iterations and their values.


Iteration    Value
0            500
1            250
2            125
3            62.5
4            31.25
5            15.625
6            7.8125
7            3.90625
8            1.953125
9            0.976563






Good luck with the MODEL clause.

Wednesday, November 4, 2009

Analytics 2.0 - Part II

Yet another neat addition to the family of analytics in 11g Release 2: NTH_VALUE.
With this new function, I can look up the value of the nth row in the window.

The following gives the 4th salary from dept 30 (look at the column nv).

1 SELECT empno,
2 ename,
3 deptno,
4 sal,
5 first_value(sal) over(PARTITION BY deptno ORDER BY empno) fv,
6 nth_value((sal), 4) over(PARTITION BY deptno ORDER BY empno) nv,
7 last_value(sal) over(PARTITION BY deptno ORDER BY empno
8 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
9 FROM emp
10* WHERE deptno = 30


SQL> /


EMPNO      ENAME      DEPTNO     SAL        FV        NV          LV
---------- ---------- ---------- ---------- ---------- ---------- ----------
7499       ALLEN       30        1600        1600                  950
7521       WARD        30        1250        1600                  950
7654       MARTIN      30        1250        1600                  950
7698       BLAKE       30        2850        1600        2850      950
7844       TURNER      30        1500        1600        2850      950
7900       JAMES       30         950        1600        2850      950


6 rows selected.

Good luck ...