Tuesday, December 18, 2018

GROUP BY behavior in MySQL / MariaDB




It is better to unlearn and relearn the SQL , if we need to work on different DBMS. I was surprised by the "GROUP BY" behavior in MySQL / MariaDB. 

Typically , when we are using GROUP BY , all the columns except the aggregated  columns mentioned in the SELECT clause has to be defined in the GROUP BY clause. If any columns are missed , then the prompt produces an error .  However , in MySQL / MariaDB , this produces the result ( sum / avg ) . This may be a side effect ,which may not be the desirable result , we are looking for. 

Let us create a simple table in Oracle , MS SQL Server , PostgreSQL , MySQL and MariaDB.When I issue the GROUP BY without mentioning other columns , all the DBMSes except MariaDB and MySQL produced an error. 

Here is an example 

ORACLE 

SQL> Create table emp_hours
  2  ( empid int ,
  3    workweek int ,
  4    hrstype varchar(10) ,
  5    hrs number(5,2)
  6    ) ;

Table created.

SQL> insert into    emp_hours values  ( 1 , 20 , 'REG' , 20 ) ;
1 row created.

SQL> insert into    emp_hours values( 1 , 20 , 'OT' , 4.5 ) ;
1 row created.

SQL> insert into    emp_hours values ( 2 , 20 , 'REG' , 40 ) ;
1 row created.

SQL> insert into    emp_hours values( 2 , 20 , 'REG' , 35 ) ;
1 row created.

SQL> insert into    emp_hours values ( 3 , 20 , 'OT' , 4.5 ) ;
1 row created.

SQL> commit;
Commit complete.

SQL> select empid , hrstype , sum(hrs) from emp_hours;
select empid , hrstype , sum(hrs) from emp_hours
       *
ERROR at line 1:
ORA-00937: not a single-group group function

SQL> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;

     EMPID HRSTYPE      SUM(HRS)
---------- ---------- ----------
         3 OT                4.5
         1 REG                20
         2 REG                75
         1 OT                4.5

MS SQL Server


1> Create table emp_hours
2> ( empid int ,
3>   workweek int ,
4>   hrstype varchar(10) ,
5>   hrs numeric(5,2)
6>   ) ;
7> go
1> insert into    emp_hours values  ( 1 , 20 , 'REG' , 20 ) ;
2> insert into    emp_hours values( 1 , 20 , 'OT' , 4.5 ) ;
3> insert into    emp_hours values ( 2 , 20 , 'REG' , 40 ) ;
4> insert into    emp_hours values( 2 , 20 , 'REG' , 35 ) ;
5> insert into    emp_hours values ( 3 , 20 , 'OT' , 4.5 ) ;
6> go

(1 rows affected)

(1 rows affected)

(1 rows affected)

(1 rows affected)

(1 rows affected)  

1> select empid , hrstype , sum(hrs) from emp_hours;
2> go
Msg 8120, Level 16, State 1, Server AUSTIN02, Line 1
Column 'emp_hours.empid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

1> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;
2> go
empid hrstype
----- ------- -
1 OT 4.50
3 OT 4.50
1 REG 20.00
2 REG 75.00

(4 rows affected)       

PostgreSQL 

Similarly , we get the error as shown below in Postgres for the above test case.

ERROR: column "emp_hours.empid" must appear in the GROUP BY clause or be used in an aggregate function
SQL state: 42803

Character: 8

MySQL 

mysql> select version();
+-------------------------+
| version()               |
+-------------------------+
| 5.5.49-0ubuntu0.14.04.1 |
+-------------------------+
1 row in set (0.00 sec)



mysql> select empid , hrstype , sum(hrs) from emp_hours;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | REG     |   104.00 |
+-------+---------+----------+
1 row in set (0.00 sec)

Instead of a error , the above SQL summed up the hours and produced one record.  

Personally , I would prefer the above SQL to error out , instead of giving this answer. 

mysql> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | OT      |     4.50 |
|     1 | REG     |    20.00 |
|     2 | REG     |    75.00 |
|     3 | OT      |     4.50 |
+-------+---------+----------+
4 rows in set (0.05 sec)


MariaDB

mysql> select version();
+--------------------------------------------+
| version()                                  |
+--------------------------------------------+
| 10.3.11-MariaDB-1:10.3.11+maria~bionic-log |
+--------------------------------------------+
1 row in set (0.00 sec)


mysql> select empid , hrstype , sum(hrs) from emp_hours;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | REG     |   104.00 |
+-------+---------+----------+
1 row in set (0.00 sec)


It is better , not to assume , the behavior of the underlying DBMS.