Treading on the previous post on MEDIAN , I would like to point out the Difference between PERCENTILE_CONT and PERCENTILE_DISC( MEDIAN being special case of PERCENTILE_CONT) .
There are subtle differences between PERCENTILE_CONT and PERCENTILE_DISC. When the count of record set is odd , there are no differences between them . The calculation of MEDIAN is identical to PERCENTILE_CONT when computed with percentile value of 0.5
Let us look at the first example.
SQL> DROP
2 TABLE t purge;
Table dropped.
SQL> CREATE
2 TABLE t AS
3 SELECT
4 'TEST' item ,
5 'E' AS region ,
6 level AS wk ,
7 ROUND(dbms_random.value(1,500)) forecastqty
8 FROM
9 dual
10 CONNECT BY level <= 3 ;
Table created.
SQL> SELECT
2 t.* ,
3 PERCENTILE_CONT(0.5)
4 WITHIN GROUP ( ORDER BY forecastqty)
5 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
6 MEDIAN(forecastqty)
7 OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
8 PERCENTILE_DISC(0.5)
9 WITHIN GROUP ( ORDER BY forecastqty)
10 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
11 FROM
12 t ;
ITEM REGION WK FORECASTQTY PERCENTILE_CONT MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E 2 10 41 41 41
TEST E 1 41 41 41 41
TEST E 3 326 41 41 41
As you can see there are no difference between the values of PERCENTILE_CONT , PERCENTILE_DISC and MEDIAN. The computation involves the ordering of the result set ( 10 , 41 , 326 ) and picking up the middle value ( 41 ) .
Let us at look at the second example.
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> DROP
2 TABLE t purge;
Table dropped.
SQL> CREATE
2 TABLE t AS
3 SELECT
4 'TEST' item ,
5 'E' AS region ,
6 level AS wk ,
7 ROUND(dbms_random.value(1,500)) forecastqty
8 FROM
9 dual
10 CONNECT BY level <= 4 ;
Table created.
SQL> column region format a10
SQL> set linesize 800
SQL> select * from t order by forecastqty;
ITEM REGION WK FORECASTQTY
---- ---------- ---------- -----------
TEST E 3 137
TEST E 2 190
TEST E 1 232
TEST E 4 400
SQL> SELECT
2 t.* ,
3 PERCENTILE_CONT(0.5)
4 WITHIN GROUP ( ORDER BY forecastqty)
5 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
6 MEDIAN(forecastqty)
7 OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
8 PERCENTILE_DISC(0.5)
9 WITHIN GROUP ( ORDER BY forecastqty)
10 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
11 FROM
12 t ;
ITEM REGION WK FORECASTQTY PERCENTILE_CONT MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E 3 137 211 211 190
TEST E 2 190 211 211 190
TEST E 1 232 211 211 190
TEST E 4 400 211 211 190
PERCENTILE_CONT and MEDIAN is not guaranteed to return one of the value in the distribution . This may not be acceptable for certain business process . In that case , we would use PERCENTILE_DISC.
Note :
SQL Server 2012 introduced these functions as well.
Comments welcome.
There are subtle differences between PERCENTILE_CONT and PERCENTILE_DISC. When the count of record set is odd , there are no differences between them . The calculation of MEDIAN is identical to PERCENTILE_CONT when computed with percentile value of 0.5
Let us look at the first example.
SQL> DROP
2 TABLE t purge;
Table dropped.
SQL> CREATE
2 TABLE t AS
3 SELECT
4 'TEST' item ,
5 'E' AS region ,
6 level AS wk ,
7 ROUND(dbms_random.value(1,500)) forecastqty
8 FROM
9 dual
10 CONNECT BY level <= 3 ;
Table created.
SQL> SELECT
2 t.* ,
3 PERCENTILE_CONT(0.5)
4 WITHIN GROUP ( ORDER BY forecastqty)
5 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
6 MEDIAN(forecastqty)
7 OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
8 PERCENTILE_DISC(0.5)
9 WITHIN GROUP ( ORDER BY forecastqty)
10 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
11 FROM
12 t ;
ITEM REGION WK FORECASTQTY PERCENTILE_CONT MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E 2 10 41 41 41
TEST E 1 41 41 41 41
TEST E 3 326 41 41 41
As you can see there are no difference between the values of PERCENTILE_CONT , PERCENTILE_DISC and MEDIAN. The computation involves the ordering of the result set ( 10 , 41 , 326 ) and picking up the middle value ( 41 ) .
Let us at look at the second example.
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> DROP
2 TABLE t purge;
Table dropped.
SQL> CREATE
2 TABLE t AS
3 SELECT
4 'TEST' item ,
5 'E' AS region ,
6 level AS wk ,
7 ROUND(dbms_random.value(1,500)) forecastqty
8 FROM
9 dual
10 CONNECT BY level <= 4 ;
Table created.
SQL> column region format a10
SQL> set linesize 800
SQL> select * from t order by forecastqty;
ITEM REGION WK FORECASTQTY
---- ---------- ---------- -----------
TEST E 3 137
TEST E 2 190
TEST E 1 232
TEST E 4 400
SQL> SELECT
2 t.* ,
3 PERCENTILE_CONT(0.5)
4 WITHIN GROUP ( ORDER BY forecastqty)
5 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
6 MEDIAN(forecastqty)
7 OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
8 PERCENTILE_DISC(0.5)
9 WITHIN GROUP ( ORDER BY forecastqty)
10 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
11 FROM
12 t ;
ITEM REGION WK FORECASTQTY PERCENTILE_CONT MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E 3 137 211 211 190
TEST E 2 190 211 211 190
TEST E 1 232 211 211 190
TEST E 4 400 211 211 190
There are differences between the values of PERCENTILE_CONT , PERCENTILE_DISC .
The computation of the
PERCENTILE_CONT and MEDIAN involves the ordering of the result set ( 137 , 190 , 232 , 400 ) , aggregates the middle values ( 190 , 232 ) and then makes the mean / average of the aggregate [ ( 190 + 232 ) / 2 ] .
The computation of the
PERCENTILE_DISC involves the ordering of the result set ( 137 , 190 , 232 , 400 ) , pickups the first of the middle values ( 190 , in this case ) . In other words , it picks up the value where cume_dist is lowest of the middle values.
Note :
SQL Server 2012 introduced these functions as well.
Comments welcome.