Experience: is what you get soon after you need it.

Experience: is what you get soon after you need it.

Rasul Allah (sal Allahu alaihi wa sallam) said: "Restore the trusts of those who trust you, and deal not falsely with him who deals falsely with you." [Abu Dawud, Tirmidhi]

Search This Blog

Sunday, August 16, 2015

Oracle Deferred Statistics, Dynamic Sampling & Extended Statistics:

Deferred Statistics, Dynamic Sampling & Extended Statistics:



Lets explore how the deferred statistics works and what impact does the dynamic sampling has on the optimizer when the statistics are missing.


SQL> show user
USER is "ARCHIVE"

SQL> create table stats (c1 number,c2 number,c3 number,c4 number,c5 number,
 2  c6 number,c7 number,c8 number,c9 number);

Table created.

SQL> begin
   for a in 1..5
    loop
      for b in 1..100 loop
        insert into stats values (a,a,a,a,a,b,b,b,b);
      end loop;
   end loop;
   end;
   /

PL/SQL procedure successfully completed.

SQL> select count(*) from stats;

 COUNT(*)
----------
      500

SQL> commit;

Commit complete.

SQL> @sql.sql                 
SQL> set lines 100
SQL> set linesize 100

SQL> select table_name,last_analyzed,sample_size,num_rows from user_tables where table_name='STATS';

TABLE_NAME              LAST_ANAL SAMPLE_SIZE   NUM_ROWS
------------------------------ --------- ----------- ----------
STATS

SQL> select index_name from user_indexes where table_name='STATS';

no rows selected

SQL> desc user_tab_pending_stats
Name                              Null?    Type
----------------------------------------------------- -------- ------------------------------------
TABLE_NAME                           VARCHAR2(30)
PARTITION_NAME                        VARCHAR2(30)
SUBPARTITION_NAME                       VARCHAR2(30)
NUM_ROWS                           NUMBER
BLOCKS                            NUMBER
AVG_ROW_LEN                           NUMBER
SAMPLE_SIZE                           NUMBER
LAST_ANALYZED                           DATE

SQL> select table_name,last_analyzed,sample_size,num_rows from user_tab_pending_stats;

no rows selected

Default method_opt options doesn’t create the histograms on the columns:

SQL> exec dbms_stats.gather_table_stats('ARCHIVE','STATS');

PL/SQL procedure successfully completed.

SQL> select table_name,last_analyzed,sample_size,num_rows from user_tables where table_name='STATS';

TABLE_NAME              LAST_ANAL SAMPLE_SIZE   NUM_ROWS
------------------------------     ---------        -----------         ----------
STATS                  16-AUG-15             500            500

SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';

TABLE_NAME              COLUMN_NAME             LAST_ANAL SAMPLE_SIZE HISTOGRAM
------------------------------ ------------------------------ --------- ----------- ---------------
STATS                  C1                 16-AUG-15     500 NONE
                 C2                 16-AUG-15     500 NONE
                 C3                 16-AUG-15     500 NONE
                 C4                 16-AUG-15     500 NONE
                 C5                 16-AUG-15     500 NONE
                 C6                 16-AUG-15     500 NONE
                 C7                 16-AUG-15     500 NONE
                 C8                 16-AUG-15     500 NONE
                 C9                 16-AUG-15     500 NONE


9 rows selected.

SQL> select table_name,column_name,last_analyzed,sample_size from user_col_pending_stats where table_name='STATS';

no rows selected


Delete the stats:
SQL> exec dbms_stats.delete_table_stats('ARCHIVE','STATS');

PL/SQL procedure successfully completed.

SQL> select table_name,column_name,last_analyzed,sample_size from user_col_pending_stats where table_name='STATS';

no rows selected

SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';

no rows selected



SQL> select dbms_stats.get_prefs('publish','archive','stats') publish from dual;

PUBLISH
--------------------------------------------------------------------------------
TRUE

SQL> exec dbms_stats.set_table_prefs('ARCHIVE','STATS','PUBLISH','FALSE');

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_prefs('publish','archive','stats') publish from dual;

PUBLISH
--------------------------------------------------------------------------------
FALSE

SQL> exec dbms_stats.gather_table_stats('ARCHIVE','stats',method_opt=>'for all columns size 1 for columns c1 size 254 for columns c2 size 254');

PL/SQL procedure successfully completed.

SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';

no rows selected

SQL> select table_name,column_name,last_analyzed,sample_size from user_col_pending_stats where table_name='STATS';

TABLE_NAME              COLUMN_NAME             LAST_ANAL SAMPLE_SIZE
------------------------------ ------------------------------ --------- -----------
STATS                  C1                 16-AUG-15     500
STATS                  C2                 16-AUG-15     500
STATS                  C3                 16-AUG-15     500
STATS                  C4                 16-AUG-15     500
STATS                  C5                 16-AUG-15     500
STATS                  C6                 16-AUG-15     500
STATS                  C7                 16-AUG-15     500
STATS                  C8                 16-AUG-15     500
STATS                  C9                 16-AUG-15     500

9 rows selected.

SQL> show parameter pending

NAME                    TYPE     VALUE
------------------------------------ ----------- ------------------------------
optimizer_use_pending_statistics     boolean     FALSE
SQL>


SQL> show parameter dynamic

NAME                    TYPE     VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling         integer     2


SQL> alter session set optimizer_dynamic_sampling=0;

Session altered.

With dynamic sampling disabled and without any statistics on the table, optimizer makes a wrong guess on the number of rows.

Actual number of rows that satisfy the condition c1=1 & c2=1 are 100:

SQL> set autot on exp

SQL> select count(*) from stats where c1=1 and c2=1;

 COUNT(*)
----------
      100


Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213

----------------------------------------------------------------------------
| Id  | Operation       | Name  | Rows  | Bytes | Cost (%CPU)| Time       |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    26 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |    26 |       |       |
|*  2 |   TABLE ACCESS FULL| STATS |     1 |    26 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("C1"=1 AND "C2"=1)


Turn on Dynamic sampling:

Optimizer guessed the number of rows correctly when using the dynamic sampling:


SQL> alter session set optimizer_dynamic_sampling=2;

Session altered.

SQL> select count(*) from stats where c1=1 and c2=1;

 COUNT(*)
----------
      100


Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213

----------------------------------------------------------------------------
| Id  | Operation       | Name  | Rows  | Bytes | Cost (%CPU)| Time       |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    26 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |    26 |       |       |
|*  2 |   TABLE ACCESS FULL| STATS |   100 |  2600 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("C1"=1 AND "C2"=1)

Note
-----
  - dynamic sampling used for this statement (level=2)


Turn off dynamic sampling:
& Turn on optimizer_use_pending_statistics:
Optimizer guessed the number of rows wrong even with the pending statistics:


SQL> alter session set optimizer_dynamic_sampling=0;

Session altered.

SQL> alter session set optimizer_use_pending_statistics=true;

Session altered.




SQL> select count(*) from stats where c1=1 and c2=1;

 COUNT(*)
----------
      100


Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213

----------------------------------------------------------------------------
| Id  | Operation       | Name  | Rows  | Bytes | Cost (%CPU)| Time       |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |     6 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |     6 |       |       |
|*  2 |   TABLE ACCESS FULL| STATS |    20 |   120 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("C1"=1 AND "C2"=1)


How the dynamic sampling affects the optimizer and its options:

Level
When Dynamic Sampling will be used
Sample size (blocks)
0
Switches off dynamic sampling
N/A
1
At least one non-partitioned table in the statement has no statistics
32
2 (default)
One or more tables in the statement have no statistics
64
3
Any statement that meets level 2 criteria and any statement that has one or more expressions used in the where clause predicates e.g. Where substr(CUSTLASTNAME,1,3) or Where a + b =5
64
4
Any statement that meets level 3 criteria and any statement that has complex predicates. An OR or AND operator between multiple predicates on the same table
64
5
Any statement that meets level 4 criteria
128
6
Any statement that meets level 4 criteria
256
7
Any statement that meets level 4 criteria
512
8
Any statement that meets level 4 criteria
1024
9
Any statement that meets level 4 criteria
4086
10
All statements
All Blocks


Create extended statistics:

We know that c1 and c2 are corelated and has the same data so lets create extended statistics on these two columns so that optimizer can guess and report the num of rows efficiently.

Extended statistics  can help the Optimizer improve the accuracy of cardinality estimates for queries that as predicates with a function(e.g. UPPER(LastName)) or multiple columns from the same table used in filter predicates,

By creating extended statistics on a group of columns, the Optimizer can determine accurate cardinality estimate when the columns are used together as the predicates.You can use DBMS_STATS.CREATE_EXTENDED_STATS to define the column group you want to have statistics gathered on as a whole. Once the group has been established Oracle will automatically maintain the statistics on that column group when statistics are gathered on the table.



SQL> SELECT dbms_stats.create_extended_stats('ARCHIVE','STATS','(C1,C2)') FROM DUAL;

DBMS_STATS.CREATE_EXTENDED_STATS('ARCHIVE','STATS','(C1,C2)')
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SYS_STUF3GLKIOP5F4B0BTTCFTMX0W



SQL> select * from user_stat_extensions;

TABLE_NAME              EXTENSION_NAME             EXTENSION    CREATO DRO
------------------------------ ------------------------------ -------------------------------------------------------------------------------- ------ ---
STATS                  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W ("C1","C2")       USER   YES

SQL> exec dbms_stats.gather_table_stats('ARCHIVE','STATS',method_opt=>'for all columns size 1 for columns(c1,c2) size 254');

PL/SQL procedure successfully completed.

SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';

no rows selected

SQL> select table_name,column_name,last_analyzed,sample_size from user_col_pending_stats where table_name='STATS';

TABLE_NAME              COLUMN_NAME             LAST_ANAL SAMPLE_SIZE
------------------------------ ------------------------------ --------- -----------
STATS                  C2                 16-AUG-15     500
STATS                  C3                 16-AUG-15     500
STATS                  C4                 16-AUG-15     500
STATS                  C5                 16-AUG-15     500
STATS                  C6                 16-AUG-15     500
STATS                  C7                 16-AUG-15     500
STATS                  C8                 16-AUG-15     500
STATS                  C9                 16-AUG-15     500
STATS                  C1                 16-AUG-15     500
STATS                  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 16-AUG-15     500

10 rows selected.


SQL> select count(*) from stats where c1=1 and c2=1;

 COUNT(*)
----------
      100


Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213

----------------------------------------------------------------------------
| Id  | Operation       | Name  | Rows  | Bytes | Cost (%CPU)| Time       |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |     6 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |     6 |       |       |
|*  2 |   TABLE ACCESS FULL| STATS |   100 |   600 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("C1"=1 AND "C2"=1)


Woow.. Optimizer picked up the correct cardinality now...

Let’s publish the stats:

SQL> select dbms_stats.get_prefs('publish','archive','stats') publish from dual;

PUBLISH
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TRUE


SQL> exec dbms_stats.set_table_prefs('ARCHIVE','STATS','PUBLISH','TRUE');

PL/SQL procedure successfully completed.


SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';

no rows selected

SQL> exec dbms_stats.gather_table_stats('ARCHIVE','STATS');

PL/SQL procedure successfully completed.

SQL> select table_name,column_name,last_analyzed,sample_size,histogram from user_tab_col_statistics where table_name='STATS';

TABLE_NAME              COLUMN_NAME             LAST_ANAL SAMPLE_SIZE HISTOGRAM
------------------------------ ------------------------------ --------- ----------- ---------------
STATS                  C1                 16-AUG-15     500 FREQUENCY
STATS                  C2                 16-AUG-15     500 FREQUENCY
STATS                  C3                 16-AUG-15     500 NONE
STATS                  C4                 16-AUG-15     500 NONE
STATS                  C5                 16-AUG-15     500 NONE
STATS                  C6                 16-AUG-15     500 NONE
STATS                  C7                 16-AUG-15     500 NONE
STATS                  C8                 16-AUG-15     500 NONE
STATS                  C9                 16-AUG-15     500 NONE
STATS                  SYS_STUF3GLKIOP5F4B0BTTCFTMX0W 16-AUG-15     500 FREQUENCY

10 rows selected.


Now the stats are published and will be picked up by the optimizer


Let’s reset the dynamic sampling value back to the default value:

SQL> alter session set optimizer_dynamic_sampling=2;

Session altered.

SQL> set autot on exp
SQL> select count(*) from stats where c1=1 and c2=1;

 COUNT(*)
----------
      100


Execution Plan
----------------------------------------------------------
Plan hash value: 2395854213

----------------------------------------------------------------------------
| Id  | Operation       | Name  | Rows  | Bytes | Cost (%CPU)| Time       |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |     6 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |     6 |       |       |
|*  2 |   TABLE ACCESS FULL| STATS |   100 |   600 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("C1"=1 AND "C2"=1)


Good article on method_opt:
https://blogs.oracle.com/optimizer/entry/how_does_the_method_opt


No comments: