Wednesday, December 27, 2017

R Data Frame in the Eyes of a SQL Developer


R has gained a lot of momentum in data science over the last few years. For a SQL professional it may look a bit daunting at first; however, there are a lot of similarities between RDBMS concepts and R concepts that make the learning curve a tad easier. One of those similarities is the data frame.


The data frame is one of the most important structures in R for holding data brought in from external sources (for example, imported from a CSV file or loaded from an RDBMS).

Conceptually, it is the same as a table in an RDBMS.

In the following, I have created a data frame with 3 columns and 6 rows.
In an RDBMS, this is the same as creating a table 'emp' and inserting 6 records.

emp <- data.frame(
  name=c("Zahir","Farook","Hameed","Basheer","Aslam","Suhaib"),
  deptno=c(10,20,30,30,20,20),
  city=c("Monroe","Trichy","Kilakarai","Kilakarai","Chennai","Chennai"))
















When the data frame is referenced at the prompt, it returns the entire data set. This is similar to "SELECT * FROM EMP".
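As a minimal sketch of that behaviour, the snippet below rebuilds the same 'emp' data frame so it runs on its own, and then references it by name. The stringsAsFactors = FALSE argument is my addition (not part of the original example); it keeps the text columns as plain character strings regardless of the R version.

```r
# Rebuild the sample data frame (same values as the emp example above).
# stringsAsFactors = FALSE is an assumption added here so that name and
# city stay character columns on any R version.
emp <- data.frame(
  name   = c("Zahir", "Farook", "Hameed", "Basheer", "Aslam", "Suhaib"),
  deptno = c(10, 20, 30, 30, 20, 20),
  city   = c("Monroe", "Trichy", "Kilakarai", "Kilakarai", "Chennai", "Chennai"),
  stringsAsFactors = FALSE
)

# Typing the name at the prompt prints every row and column,
# much like SELECT * FROM EMP. Inside a script, print() does the same.
emp
print(emp)
```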




The function "rbind" is used to append a record to the existing data set.
This is similar to "INSERT INTO EMP VALUES ('Karady', 100, 'Colombo')".

emp <- rbind(emp, data.frame(name=c("Karady"), deptno=c(100), city=c("Colombo")))

 

The function "nrow" is used to get the record count of the data set.
This is similar to "SELECT COUNT(*) FROM EMP".
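A small sketch of the row count, assuming 'emp' holds the six original rows plus the 'Karady' row appended with rbind. The data frame is rebuilt here so the snippet is self-contained; stringsAsFactors = FALSE is my addition.

```r
# Six original rows (stringsAsFactors = FALSE is an assumption added here).
emp <- data.frame(
  name   = c("Zahir", "Farook", "Hameed", "Basheer", "Aslam", "Suhaib"),
  deptno = c(10, 20, 30, 30, 20, 20),
  city   = c("Monroe", "Trichy", "Kilakarai", "Kilakarai", "Chennai", "Chennai"),
  stringsAsFactors = FALSE
)

# Append one more record, as in the rbind example.
emp <- rbind(emp, data.frame(name = "Karady", deptno = 100, city = "Colombo",
                             stringsAsFactors = FALSE))

# Row count, comparable to SELECT COUNT(*) FROM EMP.
nrow(emp)  # 7
```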


The function "ncol" is used to get the number of columns.
This is similar to "SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'EMP'".
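Here is a short sketch of the column count, again rebuilding 'emp' so it runs on its own. The call to names() is an extra I have added for illustration; it lists the column names, roughly what you would get from selecting column_name out of information_schema.columns.

```r
# Sample data frame (stringsAsFactors = FALSE is an assumption added here).
emp <- data.frame(
  name   = c("Zahir", "Farook", "Hameed", "Basheer", "Aslam", "Suhaib"),
  deptno = c(10, 20, 30, 30, 20, 20),
  city   = c("Monroe", "Trichy", "Kilakarai", "Kilakarai", "Chennai", "Chennai"),
  stringsAsFactors = FALSE
)

# Number of columns in the data frame.
ncol(emp)   # 3

# Column names, added here for illustration.
names(emp)
```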


With the following example, we filter the records that have deptno = 30. This is similar to
"SELECT * FROM EMP WHERE DEPTNO = 30".
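One common way to write this filter is logical indexing with square brackets; this is a sketch of that approach, not necessarily the exact call used originally. The variable name 'dept30' is hypothetical, and 'emp' is rebuilt with the six original rows so the snippet runs on its own.

```r
# Sample data frame (stringsAsFactors = FALSE is an assumption added here).
emp <- data.frame(
  name   = c("Zahir", "Farook", "Hameed", "Basheer", "Aslam", "Suhaib"),
  deptno = c(10, 20, 30, 30, 20, 20),
  city   = c("Monroe", "Trichy", "Kilakarai", "Kilakarai", "Chennai", "Chennai"),
  stringsAsFactors = FALSE
)

# The condition before the comma picks the rows (the WHERE clause);
# leaving the part after the comma empty keeps all columns (the SELECT *).
dept30 <- emp[emp$deptno == 30, ]
dept30
```

Note that R uses == for comparison, where SQL uses a single =.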



We can add another filter with the pipe operator (|), which R uses for an 'OR' condition.
This is similar to "SELECT * FROM EMP WHERE DEPTNO = 30 OR CITY = 'Chennai'".
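A sketch of the OR filter using the same square-bracket indexing, with 'emp' rebuilt from the six original rows so the snippet is self-contained. The variable name 'matches' is hypothetical.

```r
# Sample data frame (stringsAsFactors = FALSE is an assumption added here).
emp <- data.frame(
  name   = c("Zahir", "Farook", "Hameed", "Basheer", "Aslam", "Suhaib"),
  deptno = c(10, 20, 30, 30, 20, 20),
  city   = c("Monroe", "Trichy", "Kilakarai", "Kilakarai", "Chennai", "Chennai"),
  stringsAsFactors = FALSE
)

# A single | compares element by element, so it evaluates the OR
# condition once per row and keeps the rows where either side is TRUE.
matches <- emp[emp$deptno == 30 | emp$city == "Chennai", ]
matches
```

The same pattern works for 'AND' by replacing | with &.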





As we can see, there are a lot of similarities between the concept of a relational table and the data frame.
This could be a starting point for a SQL professional to get familiar with R.

I understand that I have just scratched the surface of the data frame and its functions.

As of now, both Oracle and MS SQL Server have incorporated R into their offerings.

Comments Welcome.