The reason that I call this topic as treasure is that this feature was introduced in Oracle 8.1.6 ( a whiiiiiile back ) . Not many people are using ( to its fullest extent ) .
You would have guessed it by now . Yeah . It is analytics.
Analytics is one of the greatest additon to the SQL family . Though ,it is documented only in Datawarehousing guide . It is equally important in OLTP application .
It has ranking , reporting , statisticial functions to name a few.
As an example , let us go to scott schema .
In this post , I will cover few basic analytical SQLs . Some of the analytical constructs has been introduced in SQL Server 2005.
Rest of the analytical functions in the subsequent posts.
This is to return a running sequence number.
SQL> SELECT empno,
2 ename,
3 job,
4 sal,
5 deptno,
6 row_number() over(PARTITION BY deptno ORDER BY sal ASC) rn,
7 rank() over(PARTITION BY deptno ORDER BY sal ASC) rank,
8 dense_rank() over(PARTITION BY deptno ORDER BY sal ASC) dense_rank,
9 lag(sal) over(PARTITION BY deptno ORDER BY sal ASC) previous_sal,
10 lead(sal) over(PARTITION BY deptno ORDER BY sal ASC) next_sal
11 FROM emp
12 ORDER BY deptno, sal
13 /
EMPNO ENAME JOB SAL DEPTNO RN RANK DENSE_RANK PREVIOUS_SAL NEXT_SAL
----- ---------- --------- --------- ------ ---------- ---------- ---------- ------------ ----------
7934 MILLER CLERK 1300.00 10 1 1 1 2450
7782 CLARK MANAGER 2450.00 10 2 2 2 1300 5000
7839 KING PRESIDENT 5000.00 10 3 3 3 2450
7369 SMITH CLERK 800.00 20 1 1 1 1100
7876 ADAMS CLERK 1100.00 20 2 2 2 800 2975
7566 JONES MANAGER 2975.00 20 3 3 3 1100 3000
7788 SCOTT ANALYST 3000.00 20 4 4 4 2975 3000
7902 FORD ANALYST 3000.00 20 5 4 4 3000
7900 JAMES CLERK 950.00 30 1 1 1 1250
7654 MARTIN SALESMAN 1250.00 30 2 2 2 950 1250
7521 WARD SALESMAN 1250.00 30 3 2 2 1250 1500
7844 TURNER SALESMAN 1500.00 30 4 4 3 1250 1600
7499 ALLEN SALESMAN 1600.00 30 5 5 4 1500 2850
7698 BLAKE MANAGER 2850.00 30 6 6 5 1600
SQL> SELECT empno, ename, job, sal, deptno, ntile(3) over(ORDER BY sal) ntile
2 FROM emp
3 ORDER BY sal
4 /
EMPNO ENAME JOB SAL DEPTNO NTILE
----- ---------- --------- --------- ------ ----------
7369 SMITH CLERK 800.00 20 1
7900 JAMES CLERK 950.00 30 1
7876 ADAMS CLERK 1100.00 20 1
7521 WARD SALESMAN 1250.00 30 1
7654 MARTIN SALESMAN 1250.00 30 1
7934 MILLER CLERK 1300.00 10 2
7844 TURNER SALESMAN 1500.00 30 2
7499 ALLEN SALESMAN 1600.00 30 2
7782 CLARK MANAGER 2450.00 10 2
7698 BLAKE MANAGER 2850.00 30 2
7566 JONES MANAGER 2975.00 20 3
7788 SCOTT ANALYST 3000.00 20 3
7902 FORD ANALYST 3000.00 20 3
7839 KING PRESIDENT 5000.00 10 3
The first look at the following example may be little bit intimidating . If you look at the second example , it may be clear .
In the latter example , we get the ratio of the salaries in DEPT 10 . The total of sal is 8750 , out of which CLARK's salary is 2450 ( 28% of the department's total salary ).
It helped us in one scenario , where the requirement was to save the report into an excel spreadsheet format .
This report was developed using PowerBuilder . In PowerBuilder , we can save the contents of the datawindow into an excel spreadsheet . The developer has done in the ratio calculations in the front end ( computed column in PB lingo ) . When the data window was saved as an excel , the calculation did not make into an excel ( as only the result set of the SQL was saved ) . In this case , the following function came in handy.
SQL> SELECT empno,
2 ename,
3 job,
4 deptno,
5 sal,
6 ratio_to_report(sal) over() rr_whole,
7 ratio_to_report(sal) over(PARTITION BY deptno) rr_deptno
8 FROM emp
9 /
EMPNO ENAME JOB DEPTNO SAL RR_WHOLE RR_DEPTNO
----- ---------- --------- ------ --------- ---------- ----------
7782 CLARK MANAGER 10 2450.00 0.08440999 0.28
7839 KING PRESIDENT 10 5000.00 0.17226528 0.57142857
7934 MILLER CLERK 10 1300.00 0.04478897 0.14857142
7566 JONES MANAGER 20 2975.00 0.10249784 0.27356321
7902 FORD ANALYST 20 3000.00 0.10335917 0.27586206
7876 ADAMS CLERK 20 1100.00 0.03789836 0.10114942
7369 SMITH CLERK 20 800.00 0.02756244 0.07356321
7788 SCOTT ANALYST 20 3000.00 0.10335917 0.27586206
7521 WARD SALESMAN 30 1250.00 0.04306632 0.13297872
7844 TURNER SALESMAN 30 1500.00 0.05167958 0.15957446
7499 ALLEN SALESMAN 30 1600.00 0.05512489 0.17021276
7900 JAMES CLERK 30 950.00 0.03273040 0.10106382
7698 BLAKE MANAGER 30 2850.00 0.09819121 0.30319148
7654 MARTIN SALESMAN 30 1250.00 0.04306632 0.13297872
SQL> SELECT empno,
2 ename,
3 job,
4 deptno,
5 sal,
6 ratio_to_report(sal) over(PARTITION BY deptno) rr_deptno
7 FROM emp
8 Where deptno = 10
9 /
EMPNO ENAME JOB DEPTNO SAL RR_DEPTNO
----- ---------- --------- ------ --------- ----------
7782 CLARK MANAGER 10 2450.00 0.28
7839 KING PRESIDENT 10 5000.00 0.57142857
7934 MILLER CLERK 10 1300.00 0.14857142
You would have guessed it by now . Yeah . It is analytics.
Analytics is one of the greatest additon to the SQL family . Though ,it is documented only in Datawarehousing guide . It is equally important in OLTP application .
It has ranking , reporting , statisticial functions to name a few.
As an example , let us go to scott schema .
In this post , I will cover few basic analytical SQLs . Some of the analytical constructs has been introduced in SQL Server 2005.
Rest of the analytical functions in the subsequent posts.
Example 1:
Usuage of ROW_NUMBER function .
This is to return a running sequence number.
SQL> SELECT empno,
2 ename,
3 job,
4 sal,
5 deptno,
6 row_number() over(ORDER BY sal DESC) rn
7 FROM emp;
EMPNO ENAME JOB SAL DEPTNO RN
----- ---------- --------- --------- ------ ----------
7839 KING PRESIDENT 5000.00 10 1
7902 FORD ANALYST 3000.00 20 2
7788 SCOTT ANALYST 3000.00 20 3
7566 JONES MANAGER 2975.00 20 4
7698 BLAKE MANAGER 2850.00 30 5
7782 CLARK MANAGER 2450.00 10 6
7499 ALLEN SALESMAN 1600.00 30 7
7844 TURNER SALESMAN 1500.00 30 8
7934 MILLER CLERK 1300.00 10 9
7521 WARD SALESMAN 1250.00 30 10
7654 MARTIN SALESMAN 1250.00 30 11
7876 ADAMS CLERK 1100.00 20 12
7900 JAMES CLERK 950.00 30 13
7369 SMITH CLERK 800.00 20 14
2 ename,
3 job,
4 sal,
5 deptno,
6 row_number() over(ORDER BY sal DESC) rn
7 FROM emp;
EMPNO ENAME JOB SAL DEPTNO RN
----- ---------- --------- --------- ------ ----------
7839 KING PRESIDENT 5000.00 10 1
7902 FORD ANALYST 3000.00 20 2
7788 SCOTT ANALYST 3000.00 20 3
7566 JONES MANAGER 2975.00 20 4
7698 BLAKE MANAGER 2850.00 30 5
7782 CLARK MANAGER 2450.00 10 6
7499 ALLEN SALESMAN 1600.00 30 7
7844 TURNER SALESMAN 1500.00 30 8
7934 MILLER CLERK 1300.00 10 9
7521 WARD SALESMAN 1250.00 30 10
7654 MARTIN SALESMAN 1250.00 30 11
7876 ADAMS CLERK 1100.00 20 12
7900 JAMES CLERK 950.00 30 13
7369 SMITH CLERK 800.00 20 14
Example 2:
The following example has lot of functions .
1. LEAD and LAG will let us to look at the previous and next records's value .
2. RANK and DENSE_RANK let us to rank the particular record based on the column ( we specify ) . The difference between RANK and DENSE_RANK comes when there is a tie in the column value . DENSE_RANK does not skip the rank , where as the RANK does.
2 ename,
3 job,
4 sal,
5 deptno,
6 row_number() over(PARTITION BY deptno ORDER BY sal ASC) rn,
7 rank() over(PARTITION BY deptno ORDER BY sal ASC) rank,
8 dense_rank() over(PARTITION BY deptno ORDER BY sal ASC) dense_rank,
9 lag(sal) over(PARTITION BY deptno ORDER BY sal ASC) previous_sal,
10 lead(sal) over(PARTITION BY deptno ORDER BY sal ASC) next_sal
11 FROM emp
12 ORDER BY deptno, sal
13 /
EMPNO ENAME JOB SAL DEPTNO RN RANK DENSE_RANK PREVIOUS_SAL NEXT_SAL
----- ---------- --------- --------- ------ ---------- ---------- ---------- ------------ ----------
7934 MILLER CLERK 1300.00 10 1 1 1 2450
7782 CLARK MANAGER 2450.00 10 2 2 2 1300 5000
7839 KING PRESIDENT 5000.00 10 3 3 3 2450
7369 SMITH CLERK 800.00 20 1 1 1 1100
7876 ADAMS CLERK 1100.00 20 2 2 2 800 2975
7566 JONES MANAGER 2975.00 20 3 3 3 1100 3000
7788 SCOTT ANALYST 3000.00 20 4 4 4 2975 3000
7902 FORD ANALYST 3000.00 20 5 4 4 3000
7900 JAMES CLERK 950.00 30 1 1 1 1250
7654 MARTIN SALESMAN 1250.00 30 2 2 2 950 1250
7521 WARD SALESMAN 1250.00 30 3 2 2 1250 1500
7844 TURNER SALESMAN 1500.00 30 4 4 3 1250 1600
7499 ALLEN SALESMAN 1600.00 30 5 5 4 1500 2850
7698 BLAKE MANAGER 2850.00 30 6 6 5 1600
Example 3:
NTILE let us to divide the results in equal height .
In the following example , the result set is divided into three equal parts .
I find NTILE very useful in do it yourself (DIY) parallel-zing jobs . In DIY jobs , you could divvy up the result set and send it to parallel jobs either using DBMS_JOB / DBMS_SCHDULER.
2 FROM emp
3 ORDER BY sal
4 /
EMPNO ENAME JOB SAL DEPTNO NTILE
----- ---------- --------- --------- ------ ----------
7369 SMITH CLERK 800.00 20 1
7900 JAMES CLERK 950.00 30 1
7876 ADAMS CLERK 1100.00 20 1
7521 WARD SALESMAN 1250.00 30 1
7654 MARTIN SALESMAN 1250.00 30 1
7934 MILLER CLERK 1300.00 10 2
7844 TURNER SALESMAN 1500.00 30 2
7499 ALLEN SALESMAN 1600.00 30 2
7782 CLARK MANAGER 2450.00 10 2
7698 BLAKE MANAGER 2850.00 30 2
7566 JONES MANAGER 2975.00 20 3
7788 SCOTT ANALYST 3000.00 20 3
7902 FORD ANALYST 3000.00 20 3
7839 KING PRESIDENT 5000.00 10 3
Example 4:
The first look at the following example may be little bit intimidating . If you look at the second example , it may be clear .
In the latter example , we get the ratio of the salaries in DEPT 10 . The total of sal is 8750 , out of which CLARK's salary is 2450 ( 28% of the department's total salary ).
It helped us in one scenario , where the requirement was to save the report into an excel spreadsheet format .
This report was developed using PowerBuilder . In PowerBuilder , we can save the contents of the datawindow into an excel spreadsheet . The developer has done in the ratio calculations in the front end ( computed column in PB lingo ) . When the data window was saved as an excel , the calculation did not make into an excel ( as only the result set of the SQL was saved ) . In this case , the following function came in handy.
SQL> SELECT empno,
2 ename,
3 job,
4 deptno,
5 sal,
6 ratio_to_report(sal) over() rr_whole,
7 ratio_to_report(sal) over(PARTITION BY deptno) rr_deptno
8 FROM emp
9 /
EMPNO ENAME JOB DEPTNO SAL RR_WHOLE RR_DEPTNO
----- ---------- --------- ------ --------- ---------- ----------
7782 CLARK MANAGER 10 2450.00 0.08440999 0.28
7839 KING PRESIDENT 10 5000.00 0.17226528 0.57142857
7934 MILLER CLERK 10 1300.00 0.04478897 0.14857142
7566 JONES MANAGER 20 2975.00 0.10249784 0.27356321
7902 FORD ANALYST 20 3000.00 0.10335917 0.27586206
7876 ADAMS CLERK 20 1100.00 0.03789836 0.10114942
7369 SMITH CLERK 20 800.00 0.02756244 0.07356321
7788 SCOTT ANALYST 20 3000.00 0.10335917 0.27586206
7521 WARD SALESMAN 30 1250.00 0.04306632 0.13297872
7844 TURNER SALESMAN 30 1500.00 0.05167958 0.15957446
7499 ALLEN SALESMAN 30 1600.00 0.05512489 0.17021276
7900 JAMES CLERK 30 950.00 0.03273040 0.10106382
7698 BLAKE MANAGER 30 2850.00 0.09819121 0.30319148
7654 MARTIN SALESMAN 30 1250.00 0.04306632 0.13297872
SQL> SELECT empno,
2 ename,
3 job,
4 deptno,
5 sal,
6 ratio_to_report(sal) over(PARTITION BY deptno) rr_deptno
7 FROM emp
8 Where deptno = 10
9 /
EMPNO ENAME JOB DEPTNO SAL RR_DEPTNO
----- ---------- --------- ------ --------- ----------
7782 CLARK MANAGER 10 2450.00 0.28
7839 KING PRESIDENT 10 5000.00 0.57142857
7934 MILLER CLERK 10 1300.00 0.14857142
Very helpful, it will be great if you could add the any new enhancements in 10g or 11g
ReplyDelete