Wednesday, September 19, 2012

Difference between PERCENTILE_CONT and PERCENTILE_DISC

Treading on the previous post on MEDIAN , I would like to point out the Difference between PERCENTILE_CONT and  PERCENTILE_DISC( MEDIAN being special case of PERCENTILE_CONT) . 

There are subtle differences between PERCENTILE_CONT and  PERCENTILE_DISC.  When the count of record set is odd , there are no differences between them .  The calculation of MEDIAN is identical to PERCENTILE_CONT when computed with percentile value of  0.5

Let us look at the first example.


SQL> DROP
  2    TABLE t purge;

Table dropped.

SQL> CREATE
  2    TABLE t AS
  3  SELECT
  4    'TEST' item ,
  5    'E'   AS region ,
  6    level AS wk ,
  7    ROUND(dbms_random.value(1,500)) forecastqty
  8  FROM
  9    dual
 10    CONNECT BY level <= 3 ;

Table created.


SQL>     SELECT
  2        t.* ,
  3        PERCENTILE_CONT(0.5)
  4        WITHIN GROUP ( ORDER BY forecastqty)
  5        OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
  6        MEDIAN(forecastqty)
  7        OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
  8        PERCENTILE_DISC(0.5)
  9        WITHIN GROUP ( ORDER BY forecastqty)
 10        OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
 11      FROM
 12    t ;

ITEM REGION             WK FORECASTQTY PERCENTILE_CONT     MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E                   2          10              41         41              41
TEST E                   1          41              41         41              41
TEST E                   3         326              41         41              41




As you can see there are no difference between the values of PERCENTILE_CONT , PERCENTILE_DISC and MEDIAN. The computation involves the ordering of the result set ( 10 , 41 , 326 ) and picking up the middle value ( 41 ) . 


Let us at look at the second example.



SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production




SQL> DROP
  2    TABLE t purge;

Table dropped.

SQL> CREATE
  2    TABLE t AS
  3  SELECT
  4    'TEST' item ,
  5    'E'   AS region ,
  6    level AS wk ,
  7    ROUND(dbms_random.value(1,500)) forecastqty
  8  FROM
  9    dual
 10    CONNECT BY level <= 4 ;

Table created.

SQL>   column region format a10
SQL>   set linesize 800
SQL> select * from t order by forecastqty;

ITEM REGION             WK FORECASTQTY
---- ---------- ---------- -----------
TEST E                   3         137
TEST E                   2         190
TEST E                   1         232
TEST E                   4         400



SQL>     SELECT

  2        t.* ,
  3        PERCENTILE_CONT(0.5)
  4        WITHIN GROUP ( ORDER BY forecastqty)
  5        OVER (PARTITION BY ITEM , region ) AS PERCENTILE_CONT ,
  6        MEDIAN(forecastqty)
  7        OVER (PARTITION BY ITEM , region ) AS MEDIAN ,
  8        PERCENTILE_DISC(0.5)
  9        WITHIN GROUP ( ORDER BY forecastqty)
 10        OVER (PARTITION BY ITEM , region ) AS PERCENTILE_DISC
 11      FROM
 12    t ;



ITEM REGION             WK FORECASTQTY PERCENTILE_CONT     MEDIAN PERCENTILE_DISC
---- ---------- ---------- ----------- --------------- ---------- ---------------
TEST E                   3         137             211        211             190
TEST E                   2         190             211        211             190
TEST E                   1         232             211        211             190
TEST E                   4         400             211        211             190



There are differences between the values of PERCENTILE_CONT , PERCENTILE_DISC . 
The computation of the  PERCENTILE_CONT   and MEDIAN involves the ordering of the result set ( 137 , 190 , 232 , 400  ) , aggregates the middle values ( 190 , 232 ) and then makes the mean / average of the aggregate [ ( 190 + 232 ) / 2  ] .  


The computation of the   PERCENTILE_DISC   involves the ordering of the result set ( 137 , 190 , 232 , 400  ) ,  pickups the first of the middle values ( 190 , in this case ) .  In other words , it picks up the value where cume_dist is lowest of the middle values. 

PERCENTILE_CONT and MEDIAN is not guaranteed to return one of the value in the distribution  . This may not be acceptable for certain business process . In that case , we would use PERCENTILE_DISC.


Note : 
SQL Server 2012 introduced these functions as well.  

Comments welcome.






Thursday, September 13, 2012

Using MODEL clause to extrapolate values.


I was given interesting assignment from one of my good friend .

Here are the details .

a) There is a forecast table that has forecast for 8 weeks per item and region .
b) Need to extrapolate additional 5 more weeks for these records.
   Here is the algorithm for computing 5 more weeks of forecast .
  
   F(9) = median ( F[6] , F[7] , F[8])
   F(10) = median ( F[7] , F[8] , F[9])
   F(11) = median ( F[8] , F[9] , F[10])
   F(12) = median ( F[9] , F[10] , F[11])
   F(13) = median ( F[10] , F[11] , F[12])

  

Note : F(X) is the forecast for Xth week.   
  
Only the  input values for  F(9)  are stored in the database ; rest of them has to be computed.
Being this is one of the best use case for MODEL clause . I dive into it .
  
Here is an example.

MODEL clause works only from Oracle 10g onwards.

Comments Welcome.

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for 64-bit Windows: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production


SQL> Drop table t purge
  2  /

Table dropped.




SQL> Create table t as
  2  SElect 'TEST'  item , 'E' as region ,  level  as wk ,  round(dbms_random.value(1,25))  forecastqty
  3  from dual connect by level <= 8
  4  union all
  5  SElect 'TEST'  item , 'W' as region ,  level  as wk ,  round(dbms_random.value(1,25))  forecastqty
  6  from dual connect by level <= 8
  7  union all
  8  SElect 'TEST'  item , 'N' as region ,  level  as wk ,  round(dbms_random.value(1,25))  forecastqty
  9  from dual connect by level <= 8
 10  union all
 11  SElect 'TEST'  item , 'S' as region ,  level  as wk ,  round(dbms_random.value(1,25))  forecastqty
 12  from dual connect by level <= 8
 13  /

Table created.


SQL> desc t
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ITEM                                               CHAR(4)
 REGION                                             CHAR(1)
 WK                                                 NUMBER
 FORECASTQTY                                        NUMBER


SQL> select * from t
  2  /

ITEM R         WK FORECASTQTY
---- - ---------- -----------
TEST E          1          16
TEST E          2          12
TEST E          3          24
TEST E          4          13
TEST E          5          21
TEST E          6          24
TEST E          7           7
TEST E          8          22
TEST W          1           7
TEST W          2          10
TEST W          3           3
TEST W          4           8
TEST W          5          23
TEST W          6           2
TEST W          7           5
TEST W          8           3
TEST N          1           6
TEST N          2          14
TEST N          3          15
TEST N          4          13
TEST N          5           2
TEST N          6          21
TEST N          7          15
TEST N          8          12
TEST S          1          21
TEST S          2          12
TEST S          3           9
TEST S          4          24
TEST S          5          18
TEST S          6          20
TEST S          7          14
TEST S          8          17

SQL> SELECT
  2    item ,
  3    region ,
  4    wk ,
  5    forecastqty
  6  FROM
  7    t
  8  model
  9  IGNORE NAV
 10  partition BY( item ,region ) dimension BY( wk )
 11  measures( forecastqty  )
 12  rules upsert
 13  (
 14    forecastqty[9]                        = median( forecastqty ) [ wk BETWEEN 6 AND 8] ,
 15    forecastqty[10]                       = median( forecastqty ) [ wk BETWEEN 7 AND 9] ,
 16    forecastqty[11]                       = median( forecastqty ) [ wk BETWEEN 8 AND 10] ,
 17    forecastqty[12]                       = median( forecastqty ) [ wk BETWEEN 9 AND 11] ,
 18    forecastqty[13]                       = median( forecastqty ) [ wk BETWEEN 10 AND 12]
 19  )
 20  /

ITEM R         WK FORECASTQTY
---- - ---------- -----------
TEST W          1           7
TEST W          2          10
TEST W          3           3
TEST W          4           8
TEST W          5          23
TEST W          6           2
TEST W          7           5
TEST W          8           3
TEST S          1          21
TEST S          2          12
TEST S          3           9
TEST S          4          24
TEST S          5          18
TEST S          6          20
TEST S          7          14
TEST S          8          17
TEST S          9          17
TEST S         10          17
TEST S         11          17
TEST S         12          17
TEST S         13          17
TEST W          9           3
TEST W         10           3
TEST W         11           3
TEST W         12           3
TEST W         13           3
TEST E          1          16
TEST E          2          12
TEST E          3          24
TEST E          4          13
TEST E          5          21
TEST E          6          24
TEST E          7           7
TEST E          8          22
TEST N          1           6
TEST N          2          14
TEST N          3          15
TEST N          4          13
TEST N          5           2
TEST N          6          21
TEST N          7          15
TEST N          8          12
TEST E          9          22
TEST E         10          22
TEST E         11          22
TEST E         12          22
TEST E         13          22
TEST N          9          15
TEST N         10          15
TEST N         11          15
TEST N         12          15
TEST N         13          15

52 rows selected.



SQL> select * from t where item ='TEST' and region = 'N'
  2  /

ITEM R         WK FORECASTQTY
---- - ---------- -----------
TEST N          1           6
TEST N          2          14
TEST N          3          15
TEST N          4          13
TEST N          5           2
TEST N          6          21
TEST N          7          15
TEST N          8          12

8 rows selected.

SQL> SELECT
  2    item ,
  3    region ,
  4    wk ,
  5    forecastqty
  6  FROM
  7    t
  8  where item ='TEST' and region = 'N'
  9  model
 10  IGNORE NAV
 11  partition BY( item ,region ) dimension BY( wk )
 12  measures( forecastqty  )
 13  rules upsert
 14  (
 15    forecastqty[9]                        = median( forecastqty ) [ wk BETWEEN 6 AND 8] ,
 16    forecastqty[10]                       = median( forecastqty ) [ wk BETWEEN 7 AND 9] ,
 17    forecastqty[11]                       = median( forecastqty ) [ wk BETWEEN 8 AND 10] ,
 18    forecastqty[12]                       = median( forecastqty ) [ wk BETWEEN 9 AND 11] ,
 19    forecastqty[13]                       = median( forecastqty ) [ wk BETWEEN 10 AND 12]

 20  )
 21  /

ITEM R         WK FORECASTQTY
---- - ---------- -----------
TEST N          1           6
TEST N          2          14
TEST N          3          15
TEST N          4          13
TEST N          5           2
TEST N          6          21
TEST N          7          15
TEST N          8          12
TEST N          9          15
TEST N         10          15
TEST N         11          15
TEST N         12          15
TEST N         13          15

13 rows selected.

SQL>