Tuesday, December 18, 2018

GROUP BY behavior in MySQL / MariaDB




It is better to unlearn and relearn SQL when we need to work on a different DBMS. I was surprised by the GROUP BY behavior in MySQL / MariaDB.

Typically, when we use GROUP BY, every column in the SELECT clause that is not inside an aggregate function has to be listed in the GROUP BY clause. If any column is missed, the database raises an error. However, MySQL / MariaDB (in their default configuration) still produce a result (sum / avg). This is a side effect, and it may not be the result we are looking for.

Let us create a simple table in Oracle, MS SQL Server, PostgreSQL, MySQL and MariaDB. When I issue the aggregate query without listing the other columns in a GROUP BY clause, all the DBMSes except MySQL and MariaDB produce an error.

Here is an example 

ORACLE 

SQL> Create table emp_hours
  2  ( empid int ,
  3    workweek int ,
  4    hrstype varchar(10) ,
  5    hrs number(5,2)
  6    ) ;

Table created.

SQL> insert into    emp_hours values  ( 1 , 20 , 'REG' , 20 ) ;
1 row created.

SQL> insert into    emp_hours values( 1 , 20 , 'OT' , 4.5 ) ;
1 row created.

SQL> insert into    emp_hours values ( 2 , 20 , 'REG' , 40 ) ;
1 row created.

SQL> insert into    emp_hours values( 2 , 20 , 'REG' , 35 ) ;
1 row created.

SQL> insert into    emp_hours values ( 3 , 20 , 'OT' , 4.5 ) ;
1 row created.

SQL> commit;
Commit complete.

SQL> select empid , hrstype , sum(hrs) from emp_hours;
select empid , hrstype , sum(hrs) from emp_hours
       *
ERROR at line 1:
ORA-00937: not a single-group group function

SQL> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;

     EMPID HRSTYPE      SUM(HRS)
---------- ---------- ----------
         3 OT                4.5
         1 REG                20
         2 REG                75
         1 OT                4.5

MS SQL Server


1> Create table emp_hours
2> ( empid int ,
3>   workweek int ,
4>   hrstype varchar(10) ,
5>   hrs numeric(5,2)
6>   ) ;
7> go
1> insert into    emp_hours values  ( 1 , 20 , 'REG' , 20 ) ;
2> insert into    emp_hours values( 1 , 20 , 'OT' , 4.5 ) ;
3> insert into    emp_hours values ( 2 , 20 , 'REG' , 40 ) ;
4> insert into    emp_hours values( 2 , 20 , 'REG' , 35 ) ;
5> insert into    emp_hours values ( 3 , 20 , 'OT' , 4.5 ) ;
6> go

(1 rows affected)

(1 rows affected)

(1 rows affected)

(1 rows affected)

(1 rows affected)  

1> select empid , hrstype , sum(hrs) from emp_hours;
2> go
Msg 8120, Level 16, State 1, Server AUSTIN02, Line 1
Column 'emp_hours.empid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

1> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;
2> go
empid hrstype
----- ------- -
1 OT 4.50
3 OT 4.50
1 REG 20.00
2 REG 75.00

(4 rows affected)       

PostgreSQL 

Similarly, we get the error shown below in Postgres for the same test case.

ERROR: column "emp_hours.empid" must appear in the GROUP BY clause or be used in an aggregate function
SQL state: 42803

Character: 8

MySQL 

mysql> select version();
+-------------------------+
| version()               |
+-------------------------+
| 5.5.49-0ubuntu0.14.04.1 |
+-------------------------+
1 row in set (0.00 sec)



mysql> select empid , hrstype , sum(hrs) from emp_hours;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | REG     |   104.00 |
+-------+---------+----------+
1 row in set (0.00 sec)

Instead of an error, the above SQL summed up the hours and produced a single record.

Personally, I would prefer the above SQL to error out instead of returning this answer.

mysql> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | OT      |     4.50 |
|     1 | REG     |    20.00 |
|     2 | REG     |    75.00 |
|     3 | OT      |     4.50 |
+-------+---------+----------+
4 rows in set (0.05 sec)


MariaDB

mysql> select version();
+--------------------------------------------+
| version()                                  |
+--------------------------------------------+
| 10.3.11-MariaDB-1:10.3.11+maria~bionic-log |
+--------------------------------------------+
1 row in set (0.00 sec)


mysql> select empid , hrstype , sum(hrs) from emp_hours;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | REG     |   104.00 |
+-------+---------+----------+
1 row in set (0.00 sec)
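If we want MySQL / MariaDB to behave like the other DBMSes here, we can enable the ONLY_FULL_GROUP_BY SQL mode (it became part of the default sql_mode in MySQL 5.7). A minimal sketch, not re-run on the exact versions shown above:

set session sql_mode = concat(@@sql_mode, ',ONLY_FULL_GROUP_BY');

-- with ONLY_FULL_GROUP_BY enabled, the query below should now fail
-- (error 1055) instead of silently aggregating the whole table
select empid , hrstype , sum(hrs) from emp_hours;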


It is better not to assume the behavior of the underlying DBMS.


Tuesday, October 23, 2018

Windowing Clause ... Why it matters.




The windowing clause in analytic functions (Oracle) / window functions (SQL Server) plays an important role in how the results are rendered. This is best explained with the following use case.


Here is the use case.

An employee may have multiple email addresses in a table. The requirement is to display the first non-null email address, working from the most recent record backwards.

Create table tab_t 
( id int , 
  lname varchar2(20) ,
  fname varchar2(20) ,
  start_dt date , 
  end_dt date , 
  sal number(10,2) , 
  email varchar2(20) 
  ) ;

 insert into tab_t values ( 1 , 'LA1' , 'FI1' , to_date('10-01-1992' , 'MM-DD-YYYY') , null , 1000.00 , 'fn1@email.com');
 insert into tab_t values ( 1 , 'LA1' , 'FI1' , to_date('12-24-2000' , 'MM-DD-YYYY') , to_date('02-15-2002' , 'MM-DD-YYYY')  , 7.25 , null ); 
 insert into tab_t values ( 1 , 'LA1' , 'FI1' , to_date('05-16-1999' , 'MM-DD-YYYY') ,  to_date('11-15-2000' , 'MM-DD-YYYY') , 5.25 , null );
insert into tab_t values ( 2 , 'LA2' , 'FI2' , to_date('01-01-1993' , 'MM-DD-YYYY') , to_date('01-01-1995' , 'MM-DD-YYYY'), 75000.00 , 'fn2@email.com');
insert into tab_t values ( 2 , 'LA2' , 'FI2' , to_date('12-24-2000' , 'MM-DD-YYYY') , to_date('02-15-2002' , 'MM-DD-YYYY')  , 7.25 , 'fn2@newemail.com' ); 
insert into tab_t values ( 3 , 'LA3' , 'FI3' , to_date('07-02-1970' , 'MM-DD-YYYY') , to_date('02-04-1980' , 'MM-DD-YYYY')  , 3.25  , null );
insert into tab_t values ( 3 , 'LA3' , 'FI3' , to_date('02-05-1980' , 'MM-DD-YYYY') , to_date('02-04-1980' , 'MM-DD-YYYY')  , 10250.00  , 'myfirst@email.com' );
insert into tab_t values ( 4 , 'LA4' , 'FI4' , to_date('11-01-2009' , 'MM-DD-YYYY') , null , 90000  , 'mywork@email.com' );


SQL> Select * from tab_t;


Here is the SQL to list the first non-null email address; we will then filter the results where rn = 1 (Listing 3).

Listing 2: 

SELECT
     t.*,
     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as  fv_email,
     ROW_NUMBER() OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as rn 
 FROM
     tab_t t




Listing 3: 
Select * from 
(
SELECT
     t.*,
     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as  fv_email,
     ROW_NUMBER() OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as rn 
 FROM
     tab_t t
)
Where rn = 1



It looks like the SQL works fine.

Let us add a few more records, where the first active record has a null email address.

  insert into tab_t values ( 5 , 'LA5' , 'FI5' , to_date('10-01-1992' , 'MM-DD-YYYY') , null , 1000.00 , null);
  insert into tab_t values ( 5 , 'LA5' , 'FI5' , to_date('12-24-2000' , 'MM-DD-YYYY') , to_date('02-15-2002' , 'MM-DD-YYYY')  , 7.25 , 'email.com' ); 
  insert into tab_t values ( 5 , 'LA5' , 'FI5' , to_date('05-16-1999' , 'MM-DD-YYYY') ,  to_date('11-15-2000' , 'MM-DD-YYYY') , 5.25 , null );


Let us execute the SQL in Listing 2 and see the results.


Here is the result after executing the SQL in Listing 3. 



Obviously, our SQL did not address all the scenarios. We need 'email.com' as the first non-null email address for employee 5, but the query returns NULL. This is because of how the "window" is defined: by default, when an ORDER BY is present, the window runs from UNBOUNDED PRECEDING to the CURRENT ROW, so for the rn = 1 row FIRST_VALUE only sees that one row. Let us change the window to RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING as seen below.
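For clarity, here is the window that Listing 2 uses implicitly, written out in full; this fragment is equivalent to the default behavior and is shown only to make the boundary visible:

     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     ) as  fv_email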

Listing 4:

SELECT
     t.*,
     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
     ) as  fv_email,
     ROW_NUMBER() OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as rn 
 FROM
     tab_t t





Listing 5:
SQL with a filter to keep only the first record per employee.

Select * from 
(
SELECT
     t.*,
     FIRST_VALUE(email) IGNORE NULLS OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
     ) as  fv_email,
     ROW_NUMBER() OVER(
         PARTITION BY t.id
         ORDER BY
             coalesce(end_dt,TO_DATE('01-JAN-2050') ) DESC,start_dt DESC
     ) as rn 
 FROM
     tab_t t
)
Where rn = 1



Now, after changing the window, we are able to get the correct results.

Comments welcome. 




Monday, September 17, 2018

Schema only user accounts in Oracle





Prior to Oracle 18c, there was no real difference between a user and a schema; the terms have been used interchangeably.

SQL Server introduced the concept of schema-only accounts quite a while ago.
Now, Oracle 18c (effectively the next release in the 12c Release 2 family) has this feature.

We can create a schema-only account with the NO AUTHENTICATION clause. Once this schema is created, its objects can be accessed by proxy users or by other users that have been granted privileges on them.

Prior to this, I would typically create a user without the CREATE SESSION privilege and store only the persistent objects in that user. Another user would then do the data processing, with the logic exposed as an API.

With 18c and the NO AUTHENTICATION clause, this is easier from a management/administration perspective.

An example is shown below.


SQL> create user sales  no authentication ;

User created.

SQL> alter user sales quota unlimited on users;

User altered.


SQL> alter user sales default tablespace users;

User altered.

SQL> create table sales.employees ( empname char(50)) ;

Table created.

SQL> create user webappuser identified by Pa55w0rd ;

User created.

SQL> grant select , insert , update , delete on  sales.employees to webappuser;

Grant succeeded.

SQL> grant create session to webappuser ;

Grant succeeded.

SQL> alter user webappuser default tablespace users;

User altered.


SQL> alter user webappuser quota unlimited on users;

User altered.


Now, log in as webappuser.

SQL> insert into sales.employees values ('Zahir');

1 row created.

SQL> commit;

Commit complete.

SQL> show user
USER is "WEBAPPUSER"
SQL>
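As a side note, we can confirm from the data dictionary that SALES is a schema-only account, and we can optionally let WEBAPPUSER proxy into it. This is a sketch, not re-run on the instance above:

select username , authentication_type from dba_users where username = 'SALES' ;
-- a schema-only account is expected to show AUTHENTICATION_TYPE = 'NONE'

alter user sales grant connect through webappuser ;
-- after this , we could connect as webappuser[sales]/Pa55w0rd and work within the SALES schema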

Thursday, April 26, 2018

Case Sensitiveness - Part 2



Oracle is case sensitive by default when comparing character data.

Prior to Oracle 12c Release 2, we could achieve case-insensitive behavior with NLS settings at the session level.
Please see the earlier blog post here (http://mfzahirdba.blogspot.com/2010/04/case-sensitiveness.html).


With Oracle 12c Release 2, we can set the collation at the column level, the table level and, of course, at the session level. Column-level settings take precedence over the other settings.
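For reference, the session-level approach mentioned above looks roughly like this (a sketch; see the earlier post for the full discussion):

alter session set nls_comp = LINGUISTIC ;
alter session set nls_sort = BINARY_CI ;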

Here is an example.

Let us first create a table the traditional way, for comparison.

SQL> CREATE TABLE empl_reg (
  2          firstname   VARCHAR2(30) ,
  3          lastname    VARCHAR2(30) ,
  4      sal         INT
  5  );

Table created.

SQL> insert into empl_reg values ( 'Zahir','mohideen', 1000 ) ;

1 row created.

SQL> insert into empl_reg values ( 'zahir','mohideen', 2000 ) ;

1 row created.

SQL> insert into empl_reg values ( 'zAhir','mohIdEen', 5000 ) ;

1 row created.

SQL> insert into empl_reg  values ( 'Abdul','MohIdEen', 50000 ) ;

1 row created.

SQL> commit;

Commit complete.

SQL>  Select * from empl_reg where firstname ='zahir';

FIRSTNA LASTNAME            SAL
------- ------------ ----------
zahir   mohideen           2000

In this case, to get all the records with the first name 'Zahir' regardless of case, we typically use the LOWER/UPPER functions as a workaround. Sometimes we also create a function-based index on these expressions to boost performance, as sketched after the query below.

SQL> Select * from empl_reg where lower(firstname) ='zahir';

FIRSTNA LASTNAME            SAL
------- ------------ ----------
Zahir   mohideen           1000
zahir   mohideen           2000
zAhir   mohIdEen           5000
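A function-based index matching this predicate might look like the following sketch (the index name is hypothetical):

create index empl_reg_lower_fname_idx on empl_reg ( lower(firstname) ) ;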


Now, let us create the table with case-insensitive (BINARY_CI) collation on the columns.

SQL> Create table emp
  2  ( firstname varchar2(30) collate binary_ci ,
  3    lastname varchar2(30) collate binary_ci ,
  4    sal int
  5  ) ;

Create table emp
*
ERROR at line 1:
ORA-43929: Collation cannot be specified if parameter MAX_STRING_SIZE=STANDARD is set.


You would need to set the MAX_STRING_SIZE parameter to EXTENDED at the CDB, the PDB$SEED and the PDBs, and bounce the instance, as this is a static parameter.

This is a bit of a tedious workflow. Hopefully this process is made simpler in a future release.
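For completeness, the change is roughly along these lines for a non-CDB (in a CDB the same steps have to be repeated for the seed and each PDB); this is a sketch from memory, so please verify against the documentation before running it:

shutdown immediate
startup upgrade
alter system set max_string_size = extended ;
@?/rdbms/admin/utl32k.sql
shutdown immediate
startup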


SQL> CREATE TABLE empl (
  2          firstname   VARCHAR2(30) COLLATE binary_ci,
  3          lastname    VARCHAR2(30) COLLATE binary_ci,
  4      sal         INT
  5  );

Table created.

SQL> insert into empl values ( 'Zahir','mohideen', 1000 ) ;

1 row created.

SQL> insert into empl values ( 'zahir','mohideen', 2000 ) ;

1 row created.

SQL> insert into empl values ( 'zAhir','mohIdEen', 5000 ) ;

1 row created.

SQL> insert into empl values ( 'Abdul','MohIdEen', 50000 ) ;

1 row created.

SQL> commit;

Commit complete.

SQL> Select * from empl where firstname ='zahir';

FIRSTNA LASTNAME            SAL
------- ------------ ----------
Zahir   mohideen           1000
zahir   mohideen           2000
zAhir   mohIdEen           5000 

Now we can get all three records, no matter what the case is.
As we can see, the explain plan applies the filter through NLSSORT.

SQL> explain plan for Select * from empl where firstname ='zahir';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3277272027

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    47 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPL |     1 |    47 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter(NLSSORT("FIRSTNAME",'nls_sort=''BINARY_CI''')=HEXTORAW('7A
              6168697200'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


18 rows selected.


SQL> explain plan for Select * from empl_reg where firstname ='zahir';

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1040152178

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |     1 |    47 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPL_REG |     1 |    47 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter("FIRSTNAME"='zahir')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


17 rows selected.



Comments Welcome!



Wednesday, April 25, 2018

APPROX_PERCENTILE in Oracle 12cR2



In one of my previous blog posts (http://mfzahirdba.blogspot.com/2012/09/difference-between-percentilecont-and.html), we discussed PERCENTILE_CONT, PERCENTILE_DISC and MEDIAN.

Beginning with 12c Release 2, we can get an approximate value for PERCENTILE_CONT and related functions using APPROX_PERCENTILE.

The advantage is that this is MUCH faster than the regular PERCENTILE_CONT function. It is very useful when we do not need exact results, for example when analyzing website visits, surveys, etc.

As can be seen from the example below, the execution times of PERCENTILE_CONT and APPROX_PERCENTILE differ. The difference could be significant if the underlying data set is huge.



SQL> CREATE
2    TABLE t AS
3  SELECT
4    'TEST' item ,
5    'E'   AS region ,
6    level AS wk ,
7    ROUND(dbms_random.value(1,500)) forecastqty
8  FROM
9    dual
10    CONNECT BY level <= 30000 ;

 Table created.

 Execution #1:

 SQL> SELECT
   2      item,
   3      region,
   4      wk,
   5      PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY forecastqty) AS fqty
   6  FROM
   7      t
   8  GROUP BY
   9      item,
  10      region,
 11      wk


 .....
 .....
 ......


 ITEM R         WK       FQTY
 ---- - ---------- ----------
 TEST E      29986        228
 TEST E      29987        329
 TEST E      29988          8
 TEST E      29989        465
 TEST E      29990        411
 TEST E      29991        484
 TEST E      29992        232
 TEST E      29993        416
 TEST E      29994        443
 TEST E      29995         95
 TEST E      29996         48

 ITEM R         WK       FQTY
 ---- - ---------- ----------
 TEST E      29997        385
 TEST E      29998        367
 TEST E      29999        433

 30000 rows selected.

Elapsed: 00:00:11.70


 Execution #2:


SQL> SELECT
  2      item,
  3      region,
  4      wk,
  5      APPROX_PERCENTILE(0.5) WITHIN GROUP(
  6      ORDER BY
  7          forecastqty
  8      )   as approx_fqty
  9  FROM
 10      t
 11  GROUP BY
 12      item,
 13      region,
 14      wk
 15/

 ......
 ....
 ......



ITEM R         WK       FQTY
---- - ---------- ----------
TEST E      29986        228
TEST E      29987        329
TEST E      29988          8
TEST E      29989        465
TEST E      29990        411
TEST E      29991        484
TEST E      29992        232
TEST E      29993        416
TEST E      29994        443
TEST E      29995         95
TEST E      29996         48

ITEM R         WK       FQTY
---- - ---------- ----------
TEST E      29997        385
TEST E      29998        367
TEST E      29999        433

30000 rows selected.

Elapsed: 00:00:09.23



In Oracle 12c Release 2, there are a few other new functions for approximation.
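One of them is APPROX_MEDIAN, which relates directly to the example above; a quick sketch (not timed here):

SELECT item, region, wk,
       APPROX_MEDIAN(forecastqty) AS approx_med
FROM   t
GROUP BY item, region, wk ;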


Also, on a similar note ...


In Oracle 12c Release 1, APPROX_COUNT_DISTINCT was introduced to cater to these situations.

Please see this link for the details:

http://mfzahirdba.blogspot.com/2015/01/approxcountdistinct-new-oracle-12c.html

In Oracle 12c Release 2, we have another parameter (APPROX_FOR_COUNT_DISTINCT) that can be set at the session level / system level to override the approximation behavior. Once we set it, the COUNT(DISTINCT) calls in our legacy code will silently use the approximation, so there is no need to revisit all the code to change it to APPROX_COUNT_DISTINCT. Again, this should be done only if the business use case allows it.
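A minimal sketch of using the parameter at the session level (parameter name as documented for 12.2; not re-run here):

alter session set approx_for_count_distinct = TRUE ;

-- COUNT(DISTINCT ...) in this session should now be executed as APPROX_COUNT_DISTINCT
select count(distinct wk) from t ;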


BTW, this is my 100th blog post. Comments welcome!


Wednesday, January 17, 2018

CSV output from Oracle



Prior to 12c R2, producing CSV output from a table required a bit of work.
With 12c R2, the process is quick and easy. Good old SQL*Plus has finally got the long-awaited option.

With the SET MARKUP CSV ON option, we can spool the output to a CSV file much more quickly.

Here is an example. 

SQL> select * from emp;

ENAME                        ID
-------------------- ----------
Zahir                        10
Hameed                       20
Farook                       30
Basheer                      30


SQL> set markup csv on

SQL> select * from emp;

"ENAME","ID"
"Zahir",10
"Hameed",20
"Farook",30
"Basheer",30


By default, character values are enclosed in quotes.
We can turn this off with QUOTE OFF.

SQL> set markup csv on quote off
SQL> select * from emp;

ENAME,ID
Zahir,10
Hameed,20
Farook,30
Basheer,30



Sometimes it is beneficial to use a different delimiter.
In the following example, we use '|' as the delimiter.

SQL> set markup csv on delim '|' quote on
SQL> select * from emp;

"ENAME"|"ID"
"Zahir"|10
"Hameed"|20
"Farook"|30
"Basheer"|30


In this way, producing CSV output from Oracle tables/views is a whole lot easier.

Comments welcome. 


Update:
To make the example complete, we need to use SPOOL to produce a file in our directory. In the example below, I have spooled the results to a file named emp.csv. There are a few other options to suppress the header and the echoed query; to keep it simple, I have just spooled the file.

Thanks very much to my good friend Soma for pointing this out.
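For reference, the suppression options mentioned above are along these lines (a sketch; their interaction with CSV markup is not re-verified here):

rem suppress the column header row
set heading off
rem suppress the "n rows selected" style message
set feedback off
rem suppress echoing of statements when running from a script
set echo off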


SQL> spool  emp.csv
SQL> select * from emp;

"ENAME"|"ID"
"Zahir"|10
"Hameed"|20
"Farook"|30
"Basheer"|30


SQL> spool off
SQL> host dir
 Volume in drive D has no label.
 Volume Serial Number is AXCA-3871

 Directory of D:\test

01/17/2018  03:26 PM              .
01/17/2018  03:26 PM              ..
01/17/2018  03:27 PM               123 emp.csv
               1 File(s)            123 bytes
               2 Dir(s)  710,110,654,464 bytes free

SQL> ho type emp.csv
SQL> select * from emp;

"ENAME"|"ID"
"Zahir"|10
"Hameed"|20
"Farook"|30
"Basheer"|30