Showing posts with label Histograms. Show all posts
Showing posts with label Histograms. Show all posts

Tuesday, July 16, 2013

Auto Create Statistics in Oracle 12c.

Exploring Oracle 12c .... 

Prior to Oracle 12c , we usually generate the table statistics  to give it to the Optimizer for generating cost effective plans.
 

In Oracle 12c  , this is done automatically.

Please see the example below .


Oracle 12c
===========

SQL> col product format a50
SQL> col version  format a15

SQL> Select * from product_component_version ;

PRODUCT                                            VERSION         STATUS
-------------------------------------------------- --------------- ---------------------------
NLSRTL                                             12.1.0.1.0      Production
Oracle Database 12c Enterprise Edition             12.1.0.1.0      64bit Production
PL/SQL                                             12.1.0.1.0      Production
TNS for 64-bit Windows:                            12.1.0.1.0      Production

SQL>

SQL> Create table t as Select * from all_objects;

Table created.


SQL> Create table t as Select * from all_objects;

Table created.

SQL> Select table_name , num_rows , last_analyzed from user_tab_statistics where table_name ='T';

TABLE   NUM_ROWS LAST_ANAL
----- ---------- ---------
T          73760 16-JUL-13


SQL> set pagesize 90
SQL> Select table_name , column_name , num_distinct ,    num_buckets ,  histogram , last_analyzed
  2  from user_tab_columns where table_name  ='T';

TABLE COLUMN_NAME          NUM_DISTINCT NUM_BUCKETS HISTOGRAM       LAST_ANAL
----- -------------------- ------------ ----------- --------------- ---------
T     ORACLE_MAINTAINED               2           1 NONE            16-JUL-13
T     EDITIONABLE                     2           1 NONE            16-JUL-13
T     SHARING                         3           1 NONE            16-JUL-13
T     EDITION_NAME                    0           0 NONE            16-JUL-13
T     NAMESPACE                       7           1 NONE            16-JUL-13
T     SECONDARY                       1           1 NONE            16-JUL-13
T     GENERATED                       2           1 NONE            16-JUL-13
T     TEMPORARY                       2           1 NONE            16-JUL-13
T     STATUS                          1           1 NONE            16-JUL-13
T     TIMESTAMP                    1133           1 NONE            16-JUL-13
T     LAST_DDL_TIME                1037           1 NONE            16-JUL-13
T     CREATED                      1061           1 NONE            16-JUL-13
T     OBJECT_TYPE                    23           1 NONE            16-JUL-13
T     DATA_OBJECT_ID                 86           1 NONE            16-JUL-13
T     OBJECT_ID                   73760           1 NONE            16-JUL-13
T     SUBOBJECT_NAME                  0           0 NONE            16-JUL-13
T     OBJECT_NAME                 40132           1 NONE            16-JUL-13
T     OWNER                          17           1 NONE            16-JUL-13

18 rows selected.




Oracle 11g
==========

SQL> set linesize 1000
SQL> set pagesize 90
SQL> col product format a50
SQL> col version  format a15
SQL> Select * from product_component_version ;

PRODUCT                                            VERSION         STATUS
-------------------------------------------------- --------------- ------------------
NLSRTL                                             11.2.0.3.0      Production
Oracle Database 11g Enterprise Edition             11.2.0.3.0      64bit Production
PL/SQL                                             11.2.0.3.0      Production
TNS for 64-bit Windows:                            11.2.0.3.0      Production

SQL> Create table t as Select * from all_objects;

Table created.


SQL> Select table_name , num_rows , last_analyzed from user_tab_statistics where table_name ='T';

TABLE_NAME                       NUM_ROWS LAST_ANAL
------------------------------ ---------- ---------
T

SQL> col table_name format a5
SQL> col column_name format a20
SQL> Select table_name , column_name , num_distinct ,    num_buckets ,  histogram , last_analyzed
  2  from user_tab_columns where table_name  ='T';

TABLE COLUMN_NAME          NUM_DISTINCT NUM_BUCKETS HISTOGRAM       LAST_ANAL
----- -------------------- ------------ ----------- --------------- ---------
T     OWNER                                         NONE
T     OBJECT_NAME                                   NONE
T     SUBOBJECT_NAME                                NONE
T     OBJECT_ID                                     NONE
T     DATA_OBJECT_ID                                NONE
T     OBJECT_TYPE                                   NONE
T     CREATED                                       NONE
T     LAST_DDL_TIME                                 NONE
T     TIMESTAMP                                     NONE
T     STATUS                                        NONE
T     TEMPORARY                                     NONE
T     GENERATED                                     NONE
T     SECONDARY                                     NONE
T     NAMESPACE                                     NONE
T     EDITION_NAME                                  NONE

15 rows selected.

SQL> exec dbms_stats.gather_table_stats ( user , 'T' );

PL/SQL procedure successfully completed.

SQL> Select table_name , num_rows , last_analyzed from user_tab_statistics where table_name ='T';

TABLE   NUM_ROWS LAST_ANAL
----- ---------- ---------
T          56358 16-JUL-13

SQL> Select table_name , column_name , num_distinct ,    num_buckets ,  histogram , last_analyzed
  2  from user_tab_columns where table_name  ='T';

TABLE COLUMN_NAME          NUM_DISTINCT NUM_BUCKETS HISTOGRAM       LAST_ANAL
----- -------------------- ------------ ----------- --------------- ---------
T     OWNER                          16           1 NONE            16-JUL-13
T     OBJECT_NAME                 31094           1 NONE            16-JUL-13
T     SUBOBJECT_NAME                 12           1 NONE            16-JUL-13
T     OBJECT_ID                   56358           1 NONE            16-JUL-13
T     DATA_OBJECT_ID                 92           1 NONE            16-JUL-13
T     OBJECT_TYPE                    29           1 NONE            16-JUL-13
T     CREATED                      1122           1 NONE            16-JUL-13
T     LAST_DDL_TIME                1459           1 NONE            16-JUL-13
T     TIMESTAMP                    1429           1 NONE            16-JUL-13
T     STATUS                          2           1 NONE            16-JUL-13
T     TEMPORARY                       2           1 NONE            16-JUL-13
T     GENERATED                       2           1 NONE            16-JUL-13
T     SECONDARY                       1           1 NONE            16-JUL-13
T     NAMESPACE                      10           1 NONE            16-JUL-13
T     EDITION_NAME                    0           0 NONE            16-JUL-13

15 rows selected.

Monday, August 22, 2011

Histograms Vs Extended Statistics

Oracle 11g introduced "extended statistics "  , by which we can create statistics on combination on columns  that you usually query with .  In theory , by generating extended statistics , optimizer  will generate better plan , with the correct selectivity .


Let us look at an example .


SQL> desc course_ext_stats
 Name                                      Null?    Type
 ----                                      -------  --------
 ID                                                 NUMBER
 COURSE_YEAR                               NOT NULL NUMBER
 COURSE_TERM                               NOT NULL NUMBER
 COURSE_NO                                 NOT NULL VARCHAR2(5)




SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.


SQL> select column_name , num_distinct , density , sample_size , histogram from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';

COLUMN_NAME                    NUM_DISTINCT    DENSITY SAMPLE_SIZE HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID                                   432879 2.3101E-06      432879 NONE
COURSE_YEAR                              16      .0625      432879 NONE
COURSE_TERM                               4        .25      432879 NONE
COURSE_NO                             82656 .000012098      432879 NONE


SQL> set autotrace on
SQL> select * from course_ext_stats where course_year = 2011 and course_term = 5 ;

.....
....
....

3568 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 185418477

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  6764 |   118K|   390   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| COURSE_EXT_STATS |  6764 |   118K|   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COURSE_YEAR"=2011 AND "COURSE_TERM"=5)

Note
-----
   - automatic DOP: skipped because of IO calibrate statistics are missing


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1615  consistent gets
          0  physical reads
          0  redo size
      77978  bytes sent via SQL*Net to client
       2971  bytes received via SQL*Net from client
        239  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3568  rows processed
 

By looking at the plan , the optimizer estimated 6764 rows , in fact ,  the query fetched only 3568 rows.       


Let us create extended statistics on these columns (  course_year and course_term ) .

SQL> Select  dbms_stats.create_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)')  from dual ;

DBMS_STATS.CREATE_EXTENDED_STATS(USER,'COURSE_EXT_STATS','(COURSE_YEAR,COURSE_TERM)')
----------------------------------------------------------------------------------------------------------------------
SYS_STUNAZ4NUJOSOZ98MCGXG90RF7      

While  creating extended statistics , Oracle created another column as shown above.  The optimizer gives relatively better number.

SQL>  exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 1' );
PL/SQL procedure successfully completed.

SQL>select * from course_ext_stats where course_year = 2011 and course_term = 5 ;
.....


--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  3928 | 70704 |   390   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| COURSE_EXT_STATS |  3928 | 70704 |   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------



Now , let us generate histograms up to 254 for all columns and see the output.

SQL> exec dbms_stats.delete_table_stats(user , 'COURSE_EXT_STATS');
PL/SQL procedure successfully completed.


SQL> exec dbms_stats.drop_extended_stats(user , 'COURSE_EXT_STATS' , '(COURSE_YEAR, COURSE_TERM)') ;
PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user , 'COURSE_EXT_STATS' , method_opt=>'for all columns size 254' );
PL/SQL procedure successfully completed.


SQL> select column_name , num_distinct , density , sample_size , histogram , num_buckets from user_tab_col_statistics where table_name ='COURSE_EXT_STATS';

COLUMN_NAME                    NUM_DISTINCT    DENSITY SAMPLE_SIZE HISTOGRAM       NUM_BUCKETS
------------------------------ ------------ ---------- ----------- --------------- -----------
ID                                   432879 2.3186E-06        5480 HEIGHT BALANCED         254
COURSE_YEAR                              16 1.1593E-06        5480 FREQUENCY                16
COURSE_TERM                               4 1.1593E-06        5480 FREQUENCY                 4
COURSE_NO                             82656 .000026582        5480 HEIGHT BALANCED         254



Execution Plan
----------------------------------------------------------
Plan hash value: 185418477

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                  |  3264 | 58752 |   390   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| COURSE_EXT_STATS |  3264 | 58752 |   390   (3)| 00:00:01 |
--------------------------------------------------------------------------------------
 

Now , the optimizer estimated yet another number for expected rows.
So , which one  is better ?


I guess ,  the answer is "it depends" :-) 


Also , I would need to poke around a bit.