Showing posts with label PostgreSQL. Show all posts
Showing posts with label PostgreSQL. Show all posts

Tuesday, December 18, 2018

GROUP BY behavior in MySQL / MariaDB




It is better to unlearn and relearn the SQL , if we need to work on different DBMS. I was surprised by the "GROUP BY" behavior in MySQL / MariaDB. 

Typically , when we are using GROUP BY , all the columns except the aggregated  columns mentioned in the SELECT clause has to be defined in the GROUP BY clause. If any columns are missed , then the prompt produces an error .  However , in MySQL / MariaDB , this produces the result ( sum / avg ) . This may be a side effect ,which may not be the desirable result , we are looking for. 

Let us create a simple table in Oracle , MS SQL Server , PostgreSQL , MySQL and MariaDB.When I issue the GROUP BY without mentioning other columns , all the DBMSes except MariaDB and MySQL produced an error. 

Here is an example 

ORACLE 

SQL> Create table emp_hours
  2  ( empid int ,
  3    workweek int ,
  4    hrstype varchar(10) ,
  5    hrs number(5,2)
  6    ) ;

Table created.

SQL> insert into    emp_hours values  ( 1 , 20 , 'REG' , 20 ) ;
1 row created.

SQL> insert into    emp_hours values( 1 , 20 , 'OT' , 4.5 ) ;
1 row created.

SQL> insert into    emp_hours values ( 2 , 20 , 'REG' , 40 ) ;
1 row created.

SQL> insert into    emp_hours values( 2 , 20 , 'REG' , 35 ) ;
1 row created.

SQL> insert into    emp_hours values ( 3 , 20 , 'OT' , 4.5 ) ;
1 row created.

SQL> commit;
Commit complete.

SQL> select empid , hrstype , sum(hrs) from emp_hours;
select empid , hrstype , sum(hrs) from emp_hours
       *
ERROR at line 1:
ORA-00937: not a single-group group function

SQL> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;

     EMPID HRSTYPE      SUM(HRS)
---------- ---------- ----------
         3 OT                4.5
         1 REG                20
         2 REG                75
         1 OT                4.5

MS SQL Server


1> Create table emp_hours
2> ( empid int ,
3>   workweek int ,
4>   hrstype varchar(10) ,
5>   hrs numeric(5,2)
6>   ) ;
7> go
1> insert into    emp_hours values  ( 1 , 20 , 'REG' , 20 ) ;
2> insert into    emp_hours values( 1 , 20 , 'OT' , 4.5 ) ;
3> insert into    emp_hours values ( 2 , 20 , 'REG' , 40 ) ;
4> insert into    emp_hours values( 2 , 20 , 'REG' , 35 ) ;
5> insert into    emp_hours values ( 3 , 20 , 'OT' , 4.5 ) ;
6> go

(1 rows affected)

(1 rows affected)

(1 rows affected)

(1 rows affected)

(1 rows affected)  

1> select empid , hrstype , sum(hrs) from emp_hours;
2> go
Msg 8120, Level 16, State 1, Server AUSTIN02, Line 1
Column 'emp_hours.empid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

1> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;
2> go
empid hrstype
----- ------- -
1 OT 4.50
3 OT 4.50
1 REG 20.00
2 REG 75.00

(4 rows affected)       

PostgreSQL 

Similarly , we get the error as shown below in Postgres for the above test case.

ERROR: column "emp_hours.empid" must appear in the GROUP BY clause or be used in an aggregate function
SQL state: 42803

Character: 8

MySQL 

mysql> select version();
+-------------------------+
| version()               |
+-------------------------+
| 5.5.49-0ubuntu0.14.04.1 |
+-------------------------+
1 row in set (0.00 sec)



mysql> select empid , hrstype , sum(hrs) from emp_hours;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | REG     |   104.00 |
+-------+---------+----------+
1 row in set (0.00 sec)

Instead of a error , the above SQL summed up the hours and produced one record.  

Personally , I would prefer the above SQL to error out , instead of giving this answer. 

mysql> select empid , hrstype , sum(hrs) from emp_hours group by empid , hrstype ;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | OT      |     4.50 |
|     1 | REG     |    20.00 |
|     2 | REG     |    75.00 |
|     3 | OT      |     4.50 |
+-------+---------+----------+
4 rows in set (0.05 sec)


MariaDB

mysql> select version();
+--------------------------------------------+
| version()                                  |
+--------------------------------------------+
| 10.3.11-MariaDB-1:10.3.11+maria~bionic-log |
+--------------------------------------------+
1 row in set (0.00 sec)


mysql> select empid , hrstype , sum(hrs) from emp_hours;
+-------+---------+----------+
| empid | hrstype | sum(hrs) |
+-------+---------+----------+
|     1 | REG     |   104.00 |
+-------+---------+----------+
1 row in set (0.00 sec)


It is better , not to assume , the behavior of the underlying DBMS. 


Tuesday, January 24, 2017

Implementation difference between PostgreSQL and MS SQL Server in varchar with no length.




Today , I learned something new  in PostgreSQL. Surprised  to see the implementation difference between  PostgreSQL and MS SQL Server . 

In PostgreSQL and SQL Server , I can define a 'varchar' or 'char' datatype  with no width . In PostgreSQL , 'char' datatype with no length means char(1) , whereas 'varchar' datatype with no length means that it can accommodate any size  

Here is the example from PostgreSQL

 sales=# SELECT version();
                                                   version
 -----------------------------------------------------------------------------------------------------------
  PostgreSQL 9.3.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4, 64-bit
(1 row)


sales=# create table t( c char) ;
CREATE TABLE
 sales=# insert into t values ('AA');
ERROR:  value too long for type character(1)
sales=# insert into t values ('A');
INSERT 0 1
sales=# select * from t;
 c
---
 A
(1 row)


sales=# drop table t;
DROP TABLE
sales=#
sales=# create table t( c varchar) ;
CREATE TABLE
sales=# insert into t values ('AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA')
sales-# ;
INSERT 0 1
sales=# Select * from t;
                                           c
---------------------------------------------------------------------------------------
 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA


Here is the example from SQL Server.


C:\>sqlcmd -W
1> use test
2> go
Changed database context to 'test'.
1> Select @@version ;
2> go

-
Microsoft SQL Server 2014 (RTM-CU14) (KB3158271) - 12.0.2569.0 (X64)
        May 27 2016 15:06:08
        Copyright (c) Microsoft Corporation
        Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.3 (Build 14393: )


(1 rows affected)
1> drop table t;
2> go
1> create table t( c char) ;
2> go
1> insert into t values ('AA');
2> go
Msg 8152, Level 16, State 14, Server DOCS-0961, Line 1
String or binary data would be truncated.
The statement has been terminated.
1> insert into t values ('A');
2> go

(1 rows affected)
1> Select * from t;
2> go
c
-
A

(1 rows affected)
1> drop table t ;
2> create table t( c varchar) ;
3> go
1> insert into t values ('AA');
2> go
Msg 8152, Level 16, State 14, Server DOCS-0961, Line 1
String or binary data would be truncated.
The statement has been terminated.
1> insert into t values ('A');
2> go

(1 rows affected)
1> Select * from t;
2> go
c
-
A

(1 rows affected)

1>


As can be seen from the script above , PostgresQL and SQL Server behave the same way when a column is defined 'char' with no length. And it is so drastically different when it comes to 'varchar' with no length. 

In Oracle , this is not allowed . Any 'char' or 'varchar' has to have a length defined . 

Learning something new every day. 

Thursday, April 14, 2016

Accessing Postgresql / MySQL / MS SQL Server from Oracle.




Gone are the days , where a business has only one DBMS to maintain its operations.


Most of the times , you are dealing with multiple DBMS in your organization. 

The vendors provide some means of connecting to distributed databases . For instance ,
Oracle provides "database link"  ( via transparent gateway), MS SQL Server provides "Linked Server" mechanism to connect to heterogeneous databases and other data sources.

Here , we will look at connecting to PostgreSQL from Oracle by using transparent ODBC gateway .  This process is applicable any ODBC complaint data sources . 

Oracle does provides transparent gateways for MS SQL Server , DB2 . But those needs to be licensed.

As of today , ODBC transparent gateway is not separately licensed . Please check with your rep for the licensing details.

Here are the steps.

Step 1 :
Configure ODBC connection and a create a system DSN for PostgreSQL
Let us call it as "pgsales”

Note : This needs to be configured as "System DSN"  , not as User DSN.





Step2 :

Create a file under %ORACLE_HOME%\hs\admin in the form initpgsalesodbc.ora . The naming convention is “init” followed by  desired name . In my case , I have it as  initpgsalesodbc.ora. 



Set the HS_FDS_CONNECT_INFO to the DSN name that was configured in ODBC setup . 
I would recommend to keep the trace level to 'ON' for the initial setup . Once the setup is complete ; connectivity is established ; we can turn off the trace .  

Oracle provides sample file . We can use clone this file to suit our needs.  

Here are the contents of initpgsalesodbc.ora.

# This is a sample agent init file that contains the HS parameters that are
# needed for the Database Gateway for ODBC

#
# HS init parameters
#
HS_FDS_CONNECT_INFO = pgsales
HS_FDS_TRACE_LEVEL =  ON


#
# Environment variables required for the non-Oracle system
#
#set =


Step 3 :


Add a entry in listener.ora to register this data source.   The SID name must match with the init file created in the above step.



      (SID_DESC =
              (SID_NAME = pgsalesodbc)
              (ORACLE_HOME = c:\app\oracle\product\12.1.0\dbhome_1)
              (PROGRAM = dg4odbc)     
    )

Step 4: 

Add entry in tnsnames.ora . The SID  in connect_data should match SID defined in the listerner.ora and the file in hs\admin. 

pgsqllink  =
  (DESCRIPTION=
 (enable=broken)
    (ADDRESS=(PROTOCOL=tcp)(HOST=ZAHIRSER)(PORT=1521))
    (CONNECT_DATA=(SID= pgsalesodbc))
    (HS=OK)
  )


Step 5:

Restart the listener.You should see to make sure that the listener is servicing the services we created.


Service "pgsalesodbc" has 1 instance(s).
  Instance "pgsalesodbc", status UNKNOWN, has 1 handler(s) for this service...


Step 6: 

Create a database link .  
In the following example , LINK2PGSQL is our db link , 'zahir' is the user id in the postgresql database and "passwd1234" is the password . 
 

SQL> CREATE DATABASE LINK LINK2PGSQL   CONNECT TO "zahir" IDENTIFIED BY "passwd1234" USING 'pgsqllink'  ;

 Database link created.

 
To test the connectivity , issue a select statement as below . Please note , for Postgresql databases  , you will need to enclose the table/view names in quotes.

 
SQL> select * from "t"@LINK2PGSQL;

c
--
A
B
C


With this feature , we can manipulate the data stored in heterogeneous databases . 

Comments Welcome .