Tuesday, October 23, 2018

Windowing Clause ... Why it matters.




Windowing clause in Analytical ( Oracle ) / Windows functions ( SQL Server ) plays an important role of how the results are rendered.  This can be better explained with the following use case. 


Here is an use case. 

A employee may have multiple email addresses in a table. The requirement was to display the first non null email address . 

Create table tab_t 
( id int , 
  lname varchar2(20) ,
  fname varchar2(20) ,
  start_dt date , 
  end_dt date , 
  sal number(10,2) , 
  email varchar2(20) 
  ) ;

 insert into tab_t values ( 1 , 'LA1' , 'FI1' , to_date('10-01-1992' , 'MM-DD-YYYY') , null , 1000.00 , 'fn1@email.com');
 insert into tab_t values ( 1 , 'LA1' , 'FI1' , to_date('12-24-2000' , 'MM-DD-YYYY') , to_date('02-15-2002' , 'MM-DD-YYYY')  , 7.25 , null ); 
 insert into tab_t values ( 1 , 'LA1' , 'FI1' , to_date('05-16-1999' , 'MM-DD-YYYY') ,  to_date('11-15-2000' , 'MM-DD-YYYY') , 5.25 , null );
insert into tab_t values ( 2 , 'LA2' , 'FI2' , to_date('01-01-1993' , 'MM-DD-YYYY') , to_date('01-01-1995' , 'MM-DD-YYYY'), 75000.00 , 'fn2@email.com');
insert into tab_t values ( 2 , 'LA2' , 'FI2' , to_date('12-24-2000' , 'MM-DD-YYYY') , to_date('02-15-2002' , 'MM-DD-YYYY')  , 7.25 , 'fn2@newemail.com' ); 
insert into tab_t values ( 3 , 'LA3' , 'FI3' , to_date('07-02-1970' , 'MM-DD-YYYY') , to_date('02-04-1980' , 'MM-DD-YYYY')  , 3.25  , null );
insert into tab_t values ( 3 , 'LA3' , 'FI3' , to_date('02-05-1980' , 'MM-DD-YYYY') , to_date('02-04-1980' , 'MM-DD-YYYY')  , 10250.00  , 'myfirst@email.com' );
insert into tab_t values ( 4 , 'LA4' , 'FI4' , to_date('11-01-2009' , 'MM-DD-YYYY') , null , 90000  , 'mywork@email.com' );


SQL> Select * from tab_t;


Here is the SQL to list the first non null email address and we will filter the results where rn =1 in the second image below .

Listing 2: 

SELECT
     t.*,
     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as  fv_email,
     ROW_NUMBER() OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as rn 
 FROM
     tab_t t




Listing 3: 
Select * from 
(
SELECT
     t.*,
     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as  fv_email,
     ROW_NUMBER() OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as rn 
 FROM
     tab_t t
Where rn = 1



Looks like , the SQL works fine. 

Let us add few more records , where the first active record has null email address. 

  insert into tab_t values ( 5 , 'LA5' , 'FI5' , to_date('10-01-1992' , 'MM-DD-YYYY') , null , 1000.00 , null);
  insert into tab_t values ( 5 , 'LA5' , 'FI5' , to_date('12-24-2000' , 'MM-DD-YYYY') , to_date('02-15-2002' , 'MM-DD-YYYY')  , 7.25 , 'email.com' ); 
  insert into tab_t values ( 5 , 'LA5' , 'FI5' , to_date('05-16-1999' , 'MM-DD-YYYY') ,  to_date('11-15-2000' , 'MM-DD-YYYY') , 5.25 , null );


Let us execute the SQL in Listing 2 and see the results.


Here is the result after executing the SQL in Listing 3. 



Obviously , our SQL did not address all the scenarios . We need the email.com as the first non null email address for Employee 5.  This is because , how the "window" is defined . By default , the window is from "unbounded rows preceding" to the "CURRENT ROW" .  Let us change the "Window" to "UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" as seen below. 

Listing 3:

SELECT
     t.*,
     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
     ) as  fv_email,
     ROW_NUMBER() OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as rn 
 FROM
     tab_t t





Listing 4: 
SQL with filter only the first record.

Select * from 
(
SELECT
     t.*,
     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
     ) as  fv_email,
     ROW_NUMBER() OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as rn 
 FROM
     tab_t t
Where rn = 1



Now , with changing the "window " , we are able to get the correct results. 

Comments welcome.