ZAHIR MOHIDEEN: September 2012

Wednesday, September 19, 2012

Difference between PERCENTILE_CONT and PERCENTILE_DISC

Treading on the previous post on MEDIAN , I would like to point out the Difference between PERCENTILE_CONT and PERCENTILE_DISC( MEDIAN being special case of PERCENTILE_CONT) .

There are subtle differences between PERCENTILE_CONT and PERCENTILE_DISC. When the count of record set is odd , there are no differences between them . The calculation of MEDIAN is identical to PERCENTILE_CONT when computed with percentile value of 0.5

Let us look at the first example.

SQL> DROP
2 TABLE t purge;

Table dropped.

SQL> CREATE
2 TABLE t AS
3 SELECT
4 'TEST' item ,
5 'E' AS region ,
6 level AS wk ,
7 ROUND(dbms_random.value(1,500)) forecastqty
8 FROM
9 dual
10 CONNECT BY level <= 3 ;

Table created.

SQL> SELECT
2 t.* ,
3 PERCENTILE_CONT(0.5)
4 WITHIN GROUP ( ORDER BY forecastqty)
5 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
6 MEDIAN(forecastqty)
7 OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
8 PERCENTILE_DISC(0.5)
9 WITHIN GROUP ( ORDER BY forecastqty)
10 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
11 FROM
12 t ;

ITEM REGION WK FORECASTQTY PERCENTILE_CONT MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E 2 10 41 41 41
TEST E 1 41 41 41 41
TEST E 3 326 41 41 41

As you can see there are no difference between the values of PERCENTILE_CONT , PERCENTILE_DISC and MEDIAN. The computation involves the ordering of the result set ( 10 , 41 , 326 ) and picking up the middle value ( 41 ) .

Let us at look at the second example.

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> DROP
2 TABLE t purge;

Table dropped.

SQL> CREATE
2 TABLE t AS
3 SELECT
4 'TEST' item ,
5 'E' AS region ,
6 level AS wk ,
7 ROUND(dbms_random.value(1,500)) forecastqty
8 FROM
9 dual
10 CONNECT BY level <= 4 ;

Table created.

SQL> column region format a10
SQL> set linesize 800
SQL> select * from t order by forecastqty;

ITEM REGION WK FORECASTQTY
---- ---------- ---------- -----------
TEST E 3 137
TEST E 2 190
TEST E 1 232
TEST E 4 400

SQL> SELECT
2 t.* ,
3 PERCENTILE_CONT(0.5)
4 WITHIN GROUP ( ORDER BY forecastqty)
5 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
6 MEDIAN(forecastqty)
7 OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
8 PERCENTILE_DISC(0.5)
9 WITHIN GROUP ( ORDER BY forecastqty)
10 OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
11 FROM
12 t ;

ITEM REGION WK FORECASTQTY PERCENTILE_CONT MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E 3 137 211 211 190
TEST E 2 190 211 211 190
TEST E 1 232 211 211 190
TEST E 4 400 211 211 190

There are differences between the values of PERCENTILE_CONT , PERCENTILE_DISC .

The computation of the PERCENTILE_CONT and MEDIAN involves the ordering of the result set ( 137 , 190 , 232 , 400 ) , aggregates the middle values ( 190 , 232 ) and then makes the mean / average of the aggregate [ ( 190 + 232 ) / 2 ] .

The computation of the PERCENTILE_DISC involves the ordering of the result set ( 137 , 190 , 232 , 400 ) , pickups the first of the middle values ( 190 , in this case ) . In other words , it picks up the value where cume_dist is lowest of the middle values.

PERCENTILE_CONT and MEDIAN is not guaranteed to return one of the value in the distribution . This may not be acceptable for certain business process . In that case , we would use PERCENTILE_DISC.

Note :
SQL Server 2012 introduced these functions as well.

Comments welcome.

Thursday, September 13, 2012

Using MODEL clause to extrapolate values.

I was given interesting assignment from one of my good friend .

Here are the details .

a) There is a forecast table that has forecast for 8 weeks per item and region .
b) Need to extrapolate additional 5 more weeks for these records.
   Here is the algorithm for computing 5 more weeks of forecast .

   F(9) = median ( F[6] , F[7] , F[8])
   F(10) = median ( F[7] , F[8] , F[9])
   F(11) = median ( F[8] , F[9] , F[10])
   F(12) = median ( F[9] , F[10] , F[11])
   F(13) = median ( F[10] , F[11] , F[12])

Note : F(X) is the forecast for Xth week.

Only the input values for F(9) are stored in the database ; rest of them has to be computed.
Being this is one of the best use case for MODEL clause . I dive into it .

Here is an example.

MODEL clause works only from Oracle 10g onwards.

Comments Welcome.

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> Drop table t purge
2 /

Table dropped.

SQL> Create table t as
2 SElect 'TEST' item , 'E' as region , level as wk , round(dbms_random.value(1,25)) forecastqty
3 from dual connect by level <= 8
4 union all
5 SElect 'TEST' item , 'W' as region , level as wk , round(dbms_random.value(1,25)) forecastqty
6 from dual connect by level <= 8
7 union all
8 SElect 'TEST' item , 'N' as region , level as wk , round(dbms_random.value(1,25)) forecastqty
9 from dual connect by level <= 8
10 union all
11 SElect 'TEST' item , 'S' as region , level as wk , round(dbms_random.value(1,25)) forecastqty
12 from dual connect by level <= 8
13 /

Table created.

SQL> desc t
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
ITEM                                               CHAR(4)
REGION                                             CHAR(1)
WK                                                 NUMBER
FORECASTQTY                                        NUMBER

SQL> select * from t
2 /

ITEM R         WK FORECASTQTY
---- - ---------- -----------
TEST E          1          16
TEST E          2          12
TEST E          3          24
TEST E          4          13
TEST E          5          21
TEST E          6          24
TEST E          7           7
TEST E          8          22
TEST W          1           7
TEST W          2          10
TEST W          3           3
TEST W          4           8
TEST W          5          23
TEST W          6           2
TEST W          7           5
TEST W          8           3
TEST N          1           6
TEST N          2          14
TEST N          3          15
TEST N          4          13
TEST N          5           2
TEST N          6          21
TEST N          7          15
TEST N          8          12
TEST S          1          21
TEST S          2          12
TEST S          3           9
TEST S          4          24
TEST S          5          18
TEST S          6          20
TEST S          7          14
TEST S          8          17

SQL> SELECT
2    item ,
3    region ,
4    wk ,
5    forecastqty
6 FROM
7    t
8 model
9 IGNORE NAV
10 partition BY( item ,region ) dimension BY( wk )
11 measures( forecastqty )
12 rules upsert
13 (
14    forecastqty[9]                        = median( forecastqty ) [ wk BETWEEN 6 AND 8] ,
15    forecastqty[10]                       = median( forecastqty ) [ wk BETWEEN 7 AND 9] ,
16    forecastqty[11]                       = median( forecastqty ) [ wk BETWEEN 8 AND 10] ,
17    forecastqty[12]                       = median( forecastqty ) [ wk BETWEEN 9 AND 11] ,
18    forecastqty[13]                       = median( forecastqty ) [ wk BETWEEN 10 AND 12]
19 )
20 /

ITEM R         WK FORECASTQTY
---- - ---------- -----------
TEST W          1           7
TEST W          2          10
TEST W          3           3
TEST W          4           8
TEST W          5          23
TEST W          6           2
TEST W          7           5
TEST W          8           3
TEST S          1          21
TEST S          2          12
TEST S          3           9
TEST S          4          24
TEST S          5          18
TEST S          6          20
TEST S          7          14
TEST S          8          17
TEST S          9          17
TEST S         10          17
TEST S         11          17
TEST S         12          17
TEST S         13          17
TEST W          9           3
TEST W         10           3
TEST W         11           3
TEST W         12           3
TEST W         13           3
TEST E          1          16
TEST E          2          12
TEST E          3          24
TEST E          4          13
TEST E          5          21
TEST E          6          24
TEST E          7           7
TEST E          8          22
TEST N          1           6
TEST N          2          14
TEST N          3          15
TEST N          4          13
TEST N          5           2
TEST N          6          21
TEST N          7          15
TEST N          8          12
TEST E          9          22
TEST E         10          22
TEST E         11          22
TEST E         12          22
TEST E         13          22
TEST N          9          15
TEST N         10          15
TEST N         11          15
TEST N         12          15
TEST N         13          15

52 rows selected.

SQL> select * from t where item ='TEST' and region = 'N'
2 /

ITEM R         WK FORECASTQTY
---- - ---------- -----------
TEST N          1           6
TEST N          2          14
TEST N          3          15
TEST N          4          13
TEST N          5           2
TEST N          6          21
TEST N          7          15
TEST N          8          12

8 rows selected.

SQL> SELECT
2    item ,
3    region ,
4    wk ,
5    forecastqty
6 FROM
7    t
8 where item ='TEST' and region = 'N'
9 model
10 IGNORE NAV
11 partition BY( item ,region ) dimension BY( wk )
12 measures( forecastqty )
13 rules upsert
14 (
15    forecastqty[9]                        = median( forecastqty ) [ wk BETWEEN 6 AND 8] ,
16    forecastqty[10]                       = median( forecastqty ) [ wk BETWEEN 7 AND 9] ,
17    forecastqty[11]                       = median( forecastqty ) [ wk BETWEEN 8 AND 10] ,
18    forecastqty[12]                       = median( forecastqty ) [ wk BETWEEN 9 AND 11] ,
19    forecastqty[13]                       = median( forecastqty ) [ wk BETWEEN 10 AND 12]
20 )
21 /

ITEM R         WK FORECASTQTY
---- - ---------- -----------
TEST N          1           6
TEST N          2          14
TEST N          3          15
TEST N          4          13
TEST N          5           2
TEST N          6          21
TEST N          7          15
TEST N          8          12
TEST N          9          15
TEST N         10          15
TEST N         11          15
TEST N         12          15
TEST N         13          15

13 rows selected.

SQL>