Wednesday, January 28, 2015

Playing with the Optimizer Statistics.



Testing query performance in the development environment can be challenging at times. In dev, you usually do not have enough records in the table to simulate the production scenario.

Ideally, performance tuning should start at the inception of development, not as an afterthought.

An easy way around this is to fake the statistics: tell the optimizer that the table has more records than it really does, and then experiment with the execution plans.

ORACLE
======

In Oracle, you can do this with the "set_table_stats" procedure in the dbms_stats package.

Let us create a table "t" with 10 records and look at the plan to confirm that the optimizer estimates 10 rows.



SQL> drop table t ;

Table dropped.

SQL> Create table t as select * from all_objects where rownum <= 10 ;

Table created.

SQL> select count(*) from t ;

  COUNT(*)
----------
        10

SQL> exec dbms_stats.gather_table_stats(user , 'T') ;

PL/SQL procedure successfully completed.

SQL>  explain plan for select * from t;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |  1000 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |    10 |  1000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------


Now, let us tell the optimizer that this table has 9876540 records and look at the plan.
As shown below, the estimated row count is 9876K.


SQL> exec dbms_stats.set_table_stats(ownname =>user , tabname => 'T' , numrows=>9876540) ;

PL/SQL procedure successfully completed.

SQL> explain plan for select * from t;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9876K|   941M|   129  (98)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |  9876K|   941M|   129  (98)| 00:00:01 |
--------------------------------------------------------------------------
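Notice that the cost in the faked plan is still small and almost all CPU (98%). Most likely this is because only the row count was changed, while the block count still describes the 10-row table. Here is a minimal sketch of faking the block count as well, using the numblks parameter of set_table_stats. The value 150000 is purely an illustrative assumption, not a number from any real table.

```sql
-- Hypothetical block count: 150000 is made up for illustration.
-- Pick a value that matches the size of your production table.
exec dbms_stats.set_table_stats(ownname => user, tabname => 'T', numrows => 9876540, numblks => 150000);

explain plan for select * from t;

select * from table(dbms_xplan.display);
```

With a larger block count, the full table scan should become more expensive on the I/O side too, which is closer to what a real table of that size would show.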

To set it back to the original, we can gather the table statistics again with dbms_stats.gather_table_stats.
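A small sketch of that reset, with a before and after check against user_tables so you can confirm the faked value is gone. It assumes the table is still called T and lives in your own schema.

```sql
-- Before: the dictionary should still show the manually set row count.
select num_rows, blocks, last_analyzed from user_tables where table_name = 'T';

-- Gather real statistics; this overwrites the values set by set_table_stats.
exec dbms_stats.gather_table_stats(user, 'T');

-- After: num_rows should match the actual contents of the table again.
select num_rows, blocks, last_analyzed from user_tables where table_name = 'T';
```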


SQL SERVER
==========


In SQL Server, we can use 'update statistics' with the rowcount and pagecount options.

Let us create a table with 10 records and get the plan. 


C:\>sqlcmd -W
1> use testdb
2> go
Changed database context to 'testdb'.
1> Drop table t;
2> go
1> Select top 10 * into t from INFORMATION_SCHEMA.TABLES ;
2> go

(10 rows affected)



In the execution plan for the newly created table, the optimizer estimates that the table has 10 records.
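If you are working in sqlcmd and want to see the estimate as text, one way is SET SHOWPLAN_ALL. This is only a sketch of how the plan could be pulled up, not necessarily how it was captured for this post. The EstimateRows column in the output is the optimizer's row estimate.

```sql
-- SET SHOWPLAN_ALL must be the only statement in its batch.
SET SHOWPLAN_ALL ON;
GO
-- The query is not executed; only the estimated plan is returned.
SELECT * FROM t;
GO
SET SHOWPLAN_ALL OFF;
GO
```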


Now, let us fake the record count with the command below and look at the execution plan again.

1> update statistics t
2> with rowcount = 9876540 , pagecount = 587456;
3> go
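To check that the faked count was recorded, you can look at the row count stored in the catalog. A minimal sketch, assuming the table t is a heap or has a clustered index (index_id 0 or 1):

```sql
-- Row count recorded in the metadata for table t.
-- After the faked UPDATE STATISTICS, this is expected to show the faked value.
SELECT p.index_id, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('t')
  AND p.index_id IN (0, 1);
GO
```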





To set it back to the original record count, we can run the following.

1> dbcc updateusage(testdb , 't' ) with count_rows ;
2> go
DBCC UPDATEUSAGE: Usage counts updated for table 't' (index 't', partition 1):
        DATA pages (In-row Data): changed from (587456) to (2) pages.
        ROWS count: changed from (9876540) to (10) rows.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


Comments welcome. 


Monday, January 12, 2015

APPROX_COUNT_DISTINCT - New Oracle 12c Function



Oracle 12c has introduced an aggregate function, 'APPROX_COUNT_DISTINCT', that returns the approximate count of distinct values.

If you can live with an approximate count, this is the better choice when the underlying dataset is large.

In my test database, I have a table called 'whlog' that has around 4 million records.
When I used this function, Oracle was able to use the index and come up with the result a lot more quickly.

The difference between this function and the traditional count(distinct ...) was pretty significant: 0.23 seconds versus 4.62 seconds in the run below.

As always, your mileage will vary.

Here are the examples:
----------------------

SQL> select count(*) from whlog;

  COUNT(*)
----------
   4244970


SQL> Select count(distinct member_id) from  whlog ;

COUNT(DISTINCTMEMBER_ID)
------------------------
                   48090

Elapsed: 00:00:04.62


SQL>  SELECT  APPROX_COUNT_DISTINCT(member_id) from whlog ;

APPROX_COUNT_DISTINCT(MEMBER_ID)
--------------------------------
                           48057

Elapsed: 00:00:00.23
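To judge whether the approximation is good enough for your data, you can put the two counts side by side. This is a minimal sketch against the same whlog table; the aliases exact_cnt, approx_cnt and pct_diff are just names picked for illustration.

```sql
-- Compare the exact and approximate distinct counts and show the gap as a percentage.
select e.cnt as exact_cnt,
       a.cnt as approx_cnt,
       round(100 * abs(e.cnt - a.cnt) / e.cnt, 2) as pct_diff
from (select count(distinct member_id) as cnt from whlog) e,
     (select approx_count_distinct(member_id) as cnt from whlog) a;
```

For the counts shown above (48090 exact, 48057 approximate), the gap is 33, or about 0.07 percent.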