Oracle 11g introduced "extended statistics " , by which we can create statistics on combination on columns that you usually query with . In theory , by generating extended statistics , optimizer will generate better plan , with the correct selectivity .
Let us look at an example .
SQL> desc course_ext_stats
Name Null? Type
---- ------- --------
ID NUMBER
COURSE_YEAR NOT NULL NUMBER
COURSE_TERM NOT NULL NUMBER
COURSE_NO NOT NULL VARCHAR2(5)
SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.
SQL> select column_name , num_distinct , density , sample_size , histogram from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';
COLUMN_NAME NUM_DISTINCT DENSITY SAMPLE_SIZE HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID 432879 2.3101E-06 432879 NONE
COURSE_YEAR 16 .0625 432879 NONE
COURSE_TERM 4 .25 432879 NONE
COURSE_NO 82656 .000012098 432879 NONE
SQL> set autotrace on
SQL> select * from course_ext_stats where course_year = 2011 and course_term = 5 ;
.....
....
....
3568 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 185418477
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6764 | 118K| 390 (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 6764 | 118K| 390 (3)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COURSE_YEAR"=2011 AND "COURSE_TERM"=5)
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1615 consistent gets
0 physical reads
0 redo size
77978 bytes sent via SQL*Net to client
2971 bytes received via SQL*Net from client
239 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3568 rows processed
By looking at the plan , the optimizer estimated 6764 rows , in fact , the query fetched only 3568 rows.
Let us create extended statistics on these columns ( course_year and course_term ) .
While creating extended statistics , Oracle created another column as shown above. The optimizer gives relatively better number.
SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 1' );
PL/SQL procedure successfully completed.
SQL>select * from course_ext_stats where course_year = 2011 and course_term = 5 ;
.....
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3928 | 70704 | 390 (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 3928 | 70704 | 390 (3)| 00:00:01 |
--------------------------------------------------------------------------------------
Now , let us generate histograms up to 254 for all columns and see the output.
SQL> exec dbms_stats.delete_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.drop_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)') ;
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 254' );
PL/SQL procedure successfully completed.
SQL> select column_name , num_distinct , density , sample_size , histogram , num_buckets from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';
COLUMN_NAME NUM_DISTINCT DENSITY SAMPLE_SIZE HISTOGRAM NUM_BUCKETS
------------------------------ ------------ ---------- ----------- --------------- -----------
ID 432879 2.3186E-06 5480 HEIGHT BALANCED 254
COURSE_YEAR 16 1.1593E-06 5480 FREQUENCY 16
COURSE_TERM 4 1.1593E-06 5480 FREQUENCY 4
COURSE_NO 82656 .000026582 5480 HEIGHT BALANCED 254
Execution Plan
----------------------------------------------------------
Plan hash value: 185418477
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3264 | 58752 | 390 (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 3264 | 58752 | 390 (3)| 00:00:01 |
--------------------------------------------------------------------------------------
Now , the optimizer estimated yet another number for expected rows.
So , which one is better ?
I guess , the answer is "it depends" :-)
Also , I would need to poke around a bit.
Let us look at an example .
SQL> desc course_ext_stats
Name Null? Type
---- ------- --------
ID NUMBER
COURSE_YEAR NOT NULL NUMBER
COURSE_TERM NOT NULL NUMBER
COURSE_NO NOT NULL VARCHAR2(5)
SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.
SQL> select column_name , num_distinct , density , sample_size , histogram from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';
COLUMN_NAME NUM_DISTINCT DENSITY SAMPLE_SIZE HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID 432879 2.3101E-06 432879 NONE
COURSE_YEAR 16 .0625 432879 NONE
COURSE_TERM 4 .25 432879 NONE
COURSE_NO 82656 .000012098 432879 NONE
SQL> set autotrace on
SQL> select * from course_ext_stats where course_year = 2011 and course_term = 5 ;
.....
....
....
3568 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 185418477
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6764 | 118K| 390 (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 6764 | 118K| 390 (3)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COURSE_YEAR"=2011 AND "COURSE_TERM"=5)
Note
-----
- automatic DOP: skipped because of IO calibrate statistics are missing
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1615 consistent gets
0 physical reads
0 redo size
77978 bytes sent via SQL*Net to client
2971 bytes received via SQL*Net from client
239 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3568 rows processed
By looking at the plan , the optimizer estimated 6764 rows , in fact , the query fetched only 3568 rows.
Let us create extended statistics on these columns ( course_year and course_term ) .
SQL> Select dbms_stats.create_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)') from dual ;
DBMS_STATS.CREATE_EXTENDED_STATS(USER,'COURSE_EXT_STATS','(COURSE_YEAR,COURSE_TERM)')
----------------------------------------------------------------------------------------------------------------------
SYS_STUNAZ4NUJOSOZ98MCGXG90RF7
DBMS_STATS.CREATE_EXTENDED_STATS(USER,'COURSE_EXT_STATS','(COURSE_YEAR,COURSE_TERM)')
----------------------------------------------------------------------------------------------------------------------
SYS_STUNAZ4NUJOSOZ98MCGXG90RF7
While creating extended statistics , Oracle created another column as shown above. The optimizer gives relatively better number.
SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 1' );
PL/SQL procedure successfully completed.
SQL>select * from course_ext_stats where course_year = 2011 and course_term = 5 ;
.....
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3928 | 70704 | 390 (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 3928 | 70704 | 390 (3)| 00:00:01 |
--------------------------------------------------------------------------------------
Now , let us generate histograms up to 254 for all columns and see the output.
SQL> exec dbms_stats.delete_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.drop_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)') ;
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 254' );
PL/SQL procedure successfully completed.
SQL> select column_name , num_distinct , density , sample_size , histogram , num_buckets from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';
COLUMN_NAME NUM_DISTINCT DENSITY SAMPLE_SIZE HISTOGRAM NUM_BUCKETS
------------------------------ ------------ ---------- ----------- --------------- -----------
ID 432879 2.3186E-06 5480 HEIGHT BALANCED 254
COURSE_YEAR 16 1.1593E-06 5480 FREQUENCY 16
COURSE_TERM 4 1.1593E-06 5480 FREQUENCY 4
COURSE_NO 82656 .000026582 5480 HEIGHT BALANCED 254
Execution Plan
----------------------------------------------------------
Plan hash value: 185418477
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3264 | 58752 | 390 (3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| COURSE_EXT_STATS | 3264 | 58752 | 390 (3)| 00:00:01 |
--------------------------------------------------------------------------------------
Now , the optimizer estimated yet another number for expected rows.
So , which one is better ?
I guess , the answer is "it depends" :-)
Also , I would need to poke around a bit.