ZAHIR MOHIDEEN: January 2015

Wednesday, January 28, 2015

Playing with the Optimizer Statistics.

Testing query performance in the development environment can be challenging at times. In the dev environment , you may / will not have all the records in the table to simulate the production scenario.

Ideally , performance tuning has to start from the inception of the development ; not as afterthought.

The easier way is that to tell / fake the optimizer that you have more records and experiment with the execution plans.

ORACLE
======

In Oracle , you can do this by using "set_table_stats" procedure in dbms_stats.

Let us create a table "t" with 10 records and look at the plan to make sure it gets 10 records.

SQL> drop table t ;

Table dropped.

SQL> Create table t as select * from all_objects where rownum <= 10 ;

Table created.

SQL> select count(*) from t ;

COUNT(*)
----------
10

SQL> exec dbms_stats.gather_table_stats(user , 'T') ;

PL/SQL procedure successfully completed.

SQL> explain plan for select * from t;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 1000 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 10 | 1000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------

Now , let us tell the optmizer that this table has 9876540 records and look at the plan .
As shown below , the estimated records is 9876K.

SQL> exec dbms_stats.set_table_stats(ownname =>user , tabname => 'T' , numrows=>9876540) ;

PL/SQL procedure successfully completed.

SQL> explain plan for select * from t;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9876K| 941M| 129 (98)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T | 9876K| 941M| 129 (98)| 00:00:01 |
--------------------------------------------------------------------------

To set it back to the original , we can gather table statistics by using dbms_stats.gather_table_stats

SQL SERVER
==========

In SQL Server , we can use 'update statistics' with rowcount / page count.

Let us create a table with 10 records and get the plan.

C:\>sqlcmd -W
1> use testdb
2> go
Changed database context to 'testdb'.
1> Drop table t;
2> go
1> Select top 10 * into t from INFORMATION_SCHEMA.TABLES ;
2> go

(10 rows affected)

Here is the execution plan for the newly created table. As you can see the from the plan , the optimizer thinks that table has 10 records.

Now , let us fake the record count with the command below and look at the execution plan again .

1> update statistics t

2> with rowcount = 9876540 , pagecount = 587456;

3> go

To set it back to the original record count , we can run the following .

1> dbcc updateusage(testdb , 't' ) with count_rows ;

2> go

DBCC UPDATEUSAGE: Usage counts updated for table 't' (index 't', partition 1):

DATA pages (In-row Data): changed from (587456) to (2) pages.

ROWS count: changed from (9876540) to (10) rows.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Comments welcome.

Monday, January 12, 2015

APPROX_COUNT_DISTINCT - New Oracle 12c Function

Oracle 12c has introduced a aggregate function 'APPROX_COUNT_DISTINCT' that produces approximate count of discount records.

If you can live with the approximate count , then this will be better choice if the underlying dataset is large.

In my test database , I have table called 'whlog' that has around 4 million records .
When I used this funcion , Oracle was able to use the index and comeup with the result a lot quickly .

The difference between this function and the traditional ( count of discount ) was pretty significant .

As always , your mileage will vary.

Here are the examples:
----------------------

SQL> select count(*) from whlog;

COUNT(*)
----------
4244970

SQL> Select count(distinct member_id) from whlog ;

COUNT(DISTINCTMEMBER_ID)
------------------------
48090

Elapsed: 00:00:04.62

SQL> SELECT APPROX_COUNT_DISTINCT(member_id) from whlog ;

APPROX_COUNT_DISTINCT(MEMBER_ID)
--------------------------------
48057

Elapsed: 00:00:00.23