Experience: is what you get soon after you need it.

Experience: is what you get soon after you need it.

Rasul Allah (sal Allahu alaihi wa sallam) said: "Restore the trusts of those who trust you, and deal not falsely with him who deals falsely with you." [Abu Dawud, Tirmidhi]

Search This Blog

Monday, October 5, 2015

Use multi column statistics


Using multi column statistics

MultiColumn Statistics

When multiple columns from a single table are used together in the where clause of a query (multiple single column predicates), the relationship between the columns can strongly affect the combined selectivity for the column group.

By default, Oracle creates column groups for a table, based on the workload analysis, similar to how it is done for histograms.
You can also create column groups manually by using the DBMS_STATS package. You can use this package to create a column group, get the name of a column group, or delete a column group from a table.

**** The optimizer will only use MultiColumn statistics with equality predicates *****
Creating a Column Group
Use the create_extended_statistics function to create a column group. The create_extended_statistics function returns the system-generated name of the newly created column group

Parameter
Description
owner
Schema owner. NULL indicates current schema.
tab_name
Name of the table to which the column group is being added.
extension
Columns in the column group.


Getting a Column Group
Use the show_extended_stats_name function to obtain the name of the column group for a given set of columns

Parameter
Description
owner
Schema owner. NULL indicates current schema.
tab_name
Name of the table to which the column group belongs.
extension
Name of the column group.

Dropping a Column Group
Use the drop_extended_stats function to delete a column group from a table.


Parameter
Description
owner
Schema owner. NULL indicates current schema.
tab_name
Name of the table to which the column group belongs.
extension
Name of the column group to be deleted.



Gathering Statistics on Column Groups
The METHOD_OPT argument of the DBMS_STATS package enables you to gather statistics on column groups. If you set the value of this argument to FOR ALL COLUMNS SIZE AUTO, the optimizer will gather statistics on all the existing column groups. To collect statistics on a new column group, specify the group using FOR COLUMNS. The column group will be automatically created as part of statistic gathering.


DBMS_STATS package "DBMS_STATS.SEED_COL_USAGE" is used to identify candidate columns for Extended Statistics. This procedure generates candidates for extended statistics when a representative dataload is being played in database.

A second procedure DBMS_STATS.CREATE_EXTENDED_STATS() can be used to define the relationship between the columns that have been identified.

DBMS_STATS.SEED_COL_USAGE can be executed few times in peak load. This will identify all the columns which need Extended Statistics.


DEMO:
====

SHAIKDB>create table ext_columns as select * from dba_tables;

Table created.

SHAIKDB>select count(*) from ext_columns;

 COUNT(*)
----------
     2795

SHAIKDB>select count(*),segment_created from dba_tables group by segment_created;

 COUNT(*) SEG
---------- ---
      826 NO
     1900 YES
   70 N/A



SHAIKDB>exec dbms_stats.gather_table_stats('test1','ext_columns');

PL/SQL procedure successfully completed.

SHAIKDB>exec dbms_stats.gather_index_stats('test1','ext_columns_idx');

PL/SQL procedure successfully completed.


SHAIKDB>set autot on exp
SHAIKDB>select count(*) from ext_columns where num_rows=0 and dropped='YES';

 COUNT(*)
----------
    0


Execution Plan
----------------------------------------------------------
Plan hash value: 1088373390

------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |              |     1 |     6 |     4     (0)| 00:00:01 |
|   1 |  SORT AGGREGATE          |              |     1 |     6 |        |           |
|*  2 |   TABLE ACCESS BY INDEX ROWID| EXT_COLUMNS     |     1 |     6 |     4     (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN         | EXT_COLUMNS_IDX |     7 |       |     1     (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("DROPPED"='YES')
  3 - access("NUM_ROWS"=0)

SHAIKDB>select count(*) from ext_columns where num_rows=0 and dropped='NO';

 COUNT(*)
----------
     1704


Execution Plan
----------------------------------------------------------
Plan hash value: 1088373390

------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |              |     1 |     6 |     4     (0)| 00:00:01 |
|   1 |  SORT AGGREGATE          |              |     1 |     6 |        |           |
|*  2 |   TABLE ACCESS BY INDEX ROWID| EXT_COLUMNS     |     7 |    42 |     4     (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN         | EXT_COLUMNS_IDX |     7 |       |     1     (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("DROPPED"='NO')
  3 - access("NUM_ROWS"=0)


In the above plan it shows the number of rows that matches the criteria is 7 for index_range_Scan and for the select statement “2”

In reality the selective count is far more greater than that.






SHAIKDB>select count(*) from ext_columns where num_rows=0 and dropped='NO';

 COUNT(*)
----------
     1704


SHAIKDB>select count(*) from ext_columns where num_rows=0 and dropped='YES';

 COUNT(*)
----------
    0

SHAIKDB>select count(*) from ext_columns where num_rows=0;

 COUNT(*)
----------
     1704


Above NUM_ROWS and DROPPED columns are related.



SHAIKDB>Select dbms_stats.create_extended_stats('TEST1','EXT_COLUMNS','(NUM_ROWS,DROPPED)') FROM DUAL;

DBMS_STATS.CREATE_EXTENDED_STATS('TEST1','EXT_COLUMNS','(NUM_ROWS,DROPPED)')
----------------------------------------------------------------------------------------------------
SYS_STUEXUYKWMBP9B#CB$NYP#P8I5



SHAIKDB>col EXTENSION_NAME for a30
SHAIKDB>col EXTENSION for a30
SHAIKDB>/

SHAIKDB>SELECT extension_name, extension FROM dba_stat_extensions WHERE table_name = 'EXT_COLUMNS';


EXTENSION_NAME              EXTENSION
------------------------------ ------------------------------
SYS_STUEXUYKWMBP9B#CB$NYP#P8I5 ("NUM_ROWS","DROPPED")


After the extension creation it shows the ROWS as 7.


SHAIKDB>set autot on exp
SHAIKDB>select count(*) from ext_columns where num_rows=0 and dropped='NO';

 COUNT(*)
----------
     1704


Execution Plan
----------------------------------------------------------
Plan hash value: 1088373390

------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |              |     1 |     6 |     4     (0)| 00:00:01 |
|   1 |  SORT AGGREGATE          |              |     1 |     6 |        |           |
|*  2 |   TABLE ACCESS BY INDEX ROWID| EXT_COLUMNS     |     7 |    42 |     4     (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN         | EXT_COLUMNS_IDX |     7 |       |     1     (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("DROPPED"='NO')
  3 - access("NUM_ROWS"=0)


Gather the stats now.


SHAIKDB>exec dbms_stats.gather_table_stats('test1','ext_columns');

PL/SQL procedure successfully completed.

SHAIKDB>select count(*) from ext_columns where num_rows=0 and dropped='NO';

 COUNT(*)
----------
     1704


Execution Plan
----------------------------------------------------------
Plan hash value: 3344132040

----------------------------------------------------------------------------------
| Id  | Operation       | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     1 |     6 |    31   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |        |     1 |     6 |          |      |
|*  2 |   TABLE ACCESS FULL| EXT_COLUMNS |  1660 |  9960 |    31   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  2 - filter("NUM_ROWS"=0 AND "DROPPED"='NO')


Reflects the correct number of rows.





Documentation:

Oracle® Database Performance Tuning Guide
11g Release 2 (11.2)
Part Number E16638-06
Chapter 13 Managing Optimizer Statistics
Section 13.3.1.5 Extended Statistics

Oracle® Database PL/SQL Packages and Types Reference
11g Release 2 (11.2)
Part Number E25788-04
Chater 141 DBMS_STATS
Extended Statistics

No comments: