
Wednesday, January 28, 2015

Playing with the Optimizer Statistics.



Testing query performance in the development environment can be challenging at times. In dev, you often do not have enough records in the table to simulate the production scenario.

Ideally, performance tuning should start at the inception of development, not as an afterthought.

The easy way is to fake the optimizer into believing that you have more records, and then experiment with the execution plans.

ORACLE
======

In Oracle, you can do this by using the "set_table_stats" procedure in dbms_stats.

Let us create a table "t" with 10 records and look at the plan to confirm that the optimizer estimates 10 rows.



SQL> drop table t ;

Table dropped.

SQL> Create table t as select * from all_objects where rownum <= 10 ;

Table created.

SQL> select count(*) from t ;

  COUNT(*)
----------
        10

SQL> exec dbms_stats.gather_table_stats(user , 'T') ;

PL/SQL procedure successfully completed.

SQL>  explain plan for select * from t;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |  1000 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |    10 |  1000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------


Now, let us tell the optimizer that this table has 9876540 records and look at the plan again.
As shown below, the estimated row count is now 9876K.


SQL> exec dbms_stats.set_table_stats(ownname =>user , tabname => 'T' , numrows=>9876540) ;

PL/SQL procedure successfully completed.

SQL> explain plan for select * from t;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9876K|   941M|   129  (98)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T    |  9876K|   941M|   129  (98)| 00:00:01 |
--------------------------------------------------------------------------

To set it back to the original values, we can simply regather the table statistics with dbms_stats.gather_table_stats, as shown below.
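
A minimal sketch, reusing the same calls shown earlier in this post (deleting the faked statistics first is optional):

SQL> exec dbms_stats.delete_table_stats(user , 'T') ;
SQL> exec dbms_stats.gather_table_stats(user , 'T') ;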


SQL SERVER
==========


In SQL Server, we can use 'update statistics' with the rowcount / pagecount options.

Let us create a table with 10 records and get the plan. 


C:\>sqlcmd -W
1> use testdb
2> go
Changed database context to 'testdb'.
1> Drop table t;
2> go
1> Select top 10 * into t from INFORMATION_SCHEMA.TABLES ;
2> go

(10 rows affected)



Here is the execution plan for the newly created table. As you can see from the plan, the optimizer thinks the table has 10 records.
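
If you want a text version of the estimated plan from sqlcmd, one option is SET SHOWPLAN_ALL; it returns the plan rows, including an EstimateRows column, without actually executing the query:

1> set showplan_all on ;
2> go
1> select * from t ;
2> go
1> set showplan_all off ;
2> go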


Now, let us fake the row and page counts with the command below and look at the execution plan again.

1> update statistics t
2> with rowcount = 9876540 , pagecount = 587456;
3> go
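
To double-check what the storage engine now has on record, you can query the catalog views; sys.partitions and sys.allocation_units should show the faked row and page counts (the same counts that dbcc updateusage reports correcting below):

1> select p.rows , a.total_pages
2> from sys.partitions p
3> join sys.allocation_units a on a.container_id = p.hobt_id
4> where p.object_id = object_id('t') ;
5> go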





To set it back to the original row and page counts, we can run the following.

1> dbcc updateusage(testdb , 't' ) with count_rows ;
2> go
DBCC UPDATEUSAGE: Usage counts updated for table 't' (index 't', partition 1):
        DATA pages (In-row Data): changed from (587456) to (2) pages.
        ROWS count: changed from (9876540) to (10) rows.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


Comments welcome. 


Monday, August 22, 2011

Histograms Vs Extended Statistics

Oracle 11g introduced "extended statistics", which let us create statistics on a combination of columns that are usually queried together. In theory, with extended statistics in place, the optimizer will generate a better plan, with more accurate selectivity.


Let us look at an example.


SQL> desc course_ext_stats
 Name                                      Null?    Type
 ----                                      -------  --------
 ID                                                 NUMBER
 COURSE_YEAR                               NOT NULL NUMBER
 COURSE_TERM                               NOT NULL NUMBER
 COURSE_NO                                 NOT NULL VARCHAR2(5)




SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.


SQL> select column_name , num_distinct , density , sample_size , histogram from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';

COLUMN_NAME                    NUM_DISTINCT    DENSITY SAMPLE_SIZE HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID                                   432879 2.3101E-06      432879 NONE
COURSE_YEAR                              16      .0625      432879 NONE
COURSE_TERM                               4        .25      432879 NONE
COURSE_NO                             82656 .000012098      432879 NONE


SQL> set autotrace on
SQL> select * from course_ext_stats where course_year = 2011 and course_term = 5 ;

.....
....
....

3568 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 185418477

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  6764 |   118K|   390   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| COURSE_EXT_STATS |  6764 |   118K|   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COURSE_YEAR"=2011 AND "COURSE_TERM"=5)

Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1615  consistent gets
          0  physical reads
          0  redo size
      77978  bytes sent via SQL*Net to client
       2971  bytes received via SQL*Net from client
        239  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3568  rows processed
 

Looking at the plan, the optimizer estimated 6764 rows; in fact, the query fetched only 3568 rows.
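
With only per-column statistics, the optimizer assumes the two predicates are independent and multiplies their selectivities; using the numbers from user_tab_col_statistics above (num_rows matches the 432879 sample size), that works out to the estimate shown in the plan:

    432879 * (1/16) * (1/4) = 432879 * 0.0625 * 0.25 ~ 6764 rows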


Let us create extended statistics on these columns (course_year and course_term).

SQL> Select  dbms_stats.create_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)')  from dual ;

DBMS_STATS.CREATE_EXTENDED_STATS(USER,'COURSE_EXT_STATS','(COURSE_YEAR,COURSE_TERM)')
----------------------------------------------------------------------------------------------------------------------
SYS_STUNAZ4NUJOSOZ98MCGXG90RF7      

While creating the extended statistics, Oracle created a hidden virtual column for the column group, as shown above. After the statistics are regathered, the optimizer comes up with a noticeably better estimate, as the next plan shows.
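
As a side note, the extension itself can be checked later in the user_stat_extensions dictionary view with something like this:

SQL> select extension_name , extension from user_stat_extensions where table_name = 'COURSE_EXT_STATS' ;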

SQL>  exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 1' );
PL/SQL procedure successfully completed.

SQL>select * from course_ext_stats where course_year = 2011 and course_term = 5 ;
.....


--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  3928 | 70704 |   390   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| COURSE_EXT_STATS |  3928 | 70704 |   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------



Now, let us generate histograms with up to 254 buckets for all columns, re-run the same query, and see the output.

SQL> exec dbms_stats.delete_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.


SQL> exec dbms_stats.drop_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)') ;
PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 254' );
PL/SQL procedure successfully completed.


SQL> select column_name , num_distinct , density , sample_size , histogram , num_buckets from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';

COLUMN_NAME                    NUM_DISTINCT    DENSITY SAMPLE_SIZE HISTOGRAM       NUM_BUCKETS
------------------------------ ------------ ---------- ----------- --------------- -----------
ID                                   432879 2.3186E-06        5480 HEIGHT BALANCED         254
COURSE_YEAR                              16 1.1593E-06        5480 FREQUENCY                16
COURSE_TERM                               4 1.1593E-06        5480 FREQUENCY                 4
COURSE_NO                             82656 .000026582        5480 HEIGHT BALANCED         254



Execution Plan
----------------------------------------------------------
Plan hash value: 185418477

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  3264 | 58752 |   390   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| COURSE_EXT_STATS |  3264 | 58752 |   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------
 

Now, the optimizer estimated yet another number for the expected rows.
So, which one is better?


I guess the answer is "it depends" :-)


Also, I would need to poke around a bit more.