ZAHIR MOHIDEEN: August 2011

Oracle 11g introduced "extended statistics " , by which we can create statistics on combination on columns that you usually query with . In theory , by generating extended statistics , optimizer will generate better plan , with the correct selectivity .

Let us look at an example .

SQL> desc course_ext_stats
Name                                      Null?    Type
----                                      ------- --------
ID                                                 NUMBER
COURSE_YEAR                               NOT NULL NUMBER
COURSE_TERM                               NOT NULL NUMBER
COURSE_NO                                 NOT NULL VARCHAR2(5)

SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.

SQL> select column_name , num_distinct , density , sample_size , histogram from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';

COLUMN_NAME                    NUM_DISTINCT    DENSITY SAMPLE_SIZE HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID                                   432879 2.3101E-06      432879 NONE
COURSE_YEAR                              16      .0625      432879 NONE
COURSE_TERM                               4        .25      432879 NONE
COURSE_NO                             82656 .000012098      432879 NONE

SQL> set autotrace on
SQL> select * from course_ext_stats where course_year = 2011 and course_term = 5 ;

.....
....
....

3568 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 185418477

--------------------------------------------------------------------------------------
| Id | Operation         | Name             | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  | 6764 |   118K|   390   (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 6764 |   118K|   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COURSE_YEAR"=2011 AND "COURSE_TERM"=5)

Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing

Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
       1615 consistent gets
          0 physical reads
          0 redo size
      77978 bytes sent via SQL*Net to client
       2971 bytes received via SQL*Net from client
        239 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
       3568 rows processed

By looking at the plan , the optimizer estimated 6764 rows , in fact , the query fetched only 3568 rows.

Let us create extended statistics on these columns ( course_year and course_term ) .

SQL> Select  dbms_stats.create_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)')  from dual ;

DBMS_STATS.CREATE_EXTENDED_STATS(USER,'COURSE_EXT_STATS','(COURSE_YEAR,COURSE_TERM)')

----------------------------------------------------------------------------------------------------------------------

SYS_STUNAZ4NUJOSOZ98MCGXG90RF7

While creating extended statistics , Oracle created another column as shown above. The optimizer gives relatively better number.

SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 1' );
PL/SQL procedure successfully completed.

SQL>select * from course_ext_stats where course_year = 2011 and course_term = 5 ;
.....

--------------------------------------------------------------------------------------
| Id | Operation         | Name             | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  | 3928 | 70704 |   390   (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 3928 | 70704 |   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------

Now , let us generate histograms up to 254 for all columns and see the output.

SQL> exec dbms_stats.delete_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.

SQL> exec dbms_stats.drop_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)') ;
PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 254' );
PL/SQL procedure successfully completed.

SQL> select column_name , num_distinct , density , sample_size , histogram , num_buckets from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';

COLUMN_NAME                    NUM_DISTINCT    DENSITY SAMPLE_SIZE HISTOGRAM       NUM_BUCKETS
------------------------------ ------------ ---------- ----------- --------------- -----------
ID                                   432879 2.3186E-06        5480 HEIGHT BALANCED         254
COURSE_YEAR                              16 1.1593E-06        5480 FREQUENCY                16
COURSE_TERM                               4 1.1593E-06        5480 FREQUENCY                 4
COURSE_NO                             82656 .000026582        5480 HEIGHT BALANCED         254

Execution Plan
----------------------------------------------------------
Plan hash value: 185418477

--------------------------------------------------------------------------------------
| Id | Operation         | Name             | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  | 3264 | 58752 |   390   (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 3264 | 58752 |   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------

Now , the optimizer estimated yet another number for expected rows.
So , which one is better ?

I guess , the answer is "it depends" :-)

Also , I would need to poke around a bit.

ZAHIR MOHIDEEN

Monday, August 22, 2011

Histograms Vs Extended Statistics