微课sql优化(7)、统计信息收集(5)-关于直方图
1、直方图介绍
你可以为列统计直方图。这些直方图为列数据的分布情况提供准确的估算。当列数据倾斜时,直方图提供更为优化的选择性估算,为数据分布不均匀的情况提供最优的执行计划。
Oracle Database为提供2种类别的列统计信息直方图:
- Height-Balanced Histograms
- Frequency Histograms
数据库存储直方图信息*TAB_COL_STATISTICS视图(用户和DBA)。列值范围:HEIGHTBALANCED, FREQUENCY, or NONE。
2、Height-Balanced Histograms
在height-balanced histogram中,列值被划分为桶,使得每个桶包含大致相同数量的行。直方图显示端点在值范围内的位置。
考虑一个my_col值为1到100之间的列,以及一个10个桶的直方图。如果数据my_col均匀分布,则直方图看起来与图13-1类似,其中数字是端点值。例如,第七个桶具有值在60到70之间的行。
图13-1具有均匀分布的高度平衡直方图
每个桶中的行数为总行数的10%。在这个均匀分布的例子中,40%的行的值在60到100之间。
如果数据不均匀分布,则直方图可能如图13-2所示。在这种情况下,大多数行的列的值为5。只有10%的行的值在60到100之间。
图13-2具有非均匀分布的高度平衡直方图
您可以使用USER_TAB_HISTOGRAMS表格查看高度平衡的直方图,如示例13-1所示。
- BEGIN
- DBMS_STATS . GATHER_table_STATS (
- OWNNAME = > 'OE' ,
- TABNAME = > 'INVENTORIES' ,
- METHOD_OPT = > 'FOR COLUMNS SIZE 10 quantity_on_hand' ) ;
- END ;
- /
- SELECT COLUMN_NAME , NUM_DISTINCT , NUM_BUCKETS , HISTOGRAM
- FROM USER_TAB_COL_STATISTICS
- WHERE TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'QUANTITY_ON_HAND' ;
- COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
- ------------------------------ ------------ ----------- ---------------
- QUANTITY_ON_HAND 237 10 HEIGHT BALANCED
- SELECT ENDPOINT_NUMBER , ENDPOINT_VALUE
- FROM USER_TAB_HISTOGRAMS
- WHERE TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'QUANTITY_ON_HAND'
- ORDER BY ENDPOINT_NUMBER ;
- ENDPOINT_NUMBER ENDPOINT_VALUE
- --------------- --------------
- 0 0
- 1 27
- 2 42
- 3 57
- 4 74
- 5 98
- 6 123
- 7 149
- 8 175
- 9 202
- 10 353
在示例13-1查询输出中,一行(1-10)对应于直方图中的每个桶。Oracle数据库向该直方图添加了特殊的第0个数据桶,因为第1个数据桶(27)中的值不是quantity_on_hand列的最小值。第0个桶的最小值为0 quantity_on_hand。
3、 frequency histogra m
在
frequency histogram中,列的每个值对应于直方图的单个桶。每个桶包含此单个值的出现次数。例如,假设36行包含列的值1 warehouse_id。端点值1具有端点号36。
数据库在以下条件下自动创建频率直方图,而不是高度平衡的直方图:
- 不同值的数量小于或等于指定的直方图桶数(最多254个)。
- 每个列值重复一次。
您可以使用USER_TAB_HISTOGRAMS视图查看频率直方图,如示例13-2所示。
- BEGIN
- DBMS_STATS . GATHER_TABLE_STATS (
- OWNNAME = > 'OE' ,
- TABNAME = > 'INVENTORIES' ,
- METHOD_OPT = > 'FOR COLUMNS SIZE 20 warehouse_id' ) ;
- END ;
- /
- SELECT COLUMN_NAME , NUM_DISTINCT , NUM_BUCKETS , HISTOGRAM
- FROM USER_TAB_COL_STATISTICS
- WHERE TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'WAREHOUSE_ID' ;
- COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
- ------------------------------ ------------ ----------- ---------------
- WAREHOUSE_ID 9 9 FREQUENCY
- SELECT ENDPOINT_NUMBER , ENDPOINT_VALUE
- FROM USER_TAB_HISTOGRAMS
- WHERE TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'WAREHOUSE_ID'
- ORDER BY ENDPOINT_NUMBER ;
- ENDPOINT_NUMBER ENDPOINT_VALUE
- --------------- --------------
- 36 1
- 213 2
- 261 3
- 370 4
- 484 5
- 692 6
- 798 7
- 984 8
- 1112 9
在例13-2中,第一个桶为warehouse_id1。该值在表中显示36次,如以下查询所证实:
oe@PROD> SELECT COUNT(*) FROM inventories WHERE warehouse_id = 1;
COUNT(*)
----------
36
5、 练习4 、直方图优化练习
统计已销户用户数量,请优化以下语句
select count(1) from ht.c_cons where status='close';
- SQL > select status , count ( 1 ) from ht . c_cons group by status ;
- STATUS COUNT ( 1 )
- ------------------------------------------------------------ ----------
- close 19
- open 9519
- creating 462
- SQL > create index ht . idx_c_cons_status on ht . c_cons ( status ) ;
- SQL > col owner for a10
- col table_name for a20
- col column_name for a20
- col data_type for a30
- col histogram for a20
- select owner , table_name , column_name , data_type ,
- column_id , num_distinct , histogram , NUM_NULLS , LAST_ANALYZED from
- dba_tab_columns where table_name = 'C_CONS' and owner = 'HT'
- order by column_id ; SQL > SQL > SQL > SQL > SQL > 2 3 4
- OWNER TABLE_NAME COLUMN_NAME DATA_TYPE COLUMN_ID NUM_DISTINCT HISTOGRAM NUM_NULLS LAST_ANALYZED
- ---------- -------------------- -------------------- ------------------------------ ---------- ------------ -------------------- ---------- ------------------------------
- HT C_CONS CONS_NO NUMBER 1 10000 NONE 0 20 - AUG - 17
- HT C_CONS CONS_NAME VARCHAR2 2 5057 NONE 0 20 - AUG - 17
- HT C_CONS ORG_NAME VARCHAR2 3 12 NONE 0 20 - AUG - 17
- HT C_CONS BUILD_DATE DATE 4 10000 NONE 0 20 - AUG - 17
- HT C_CONS STATUS VARCHAR2 5 3 NONE 0 20 - AUG - 17
- SQL > exec DBMS_STATS . GATHER_TABLE_STATS ( ownname = > 'HT' , tabname = > 'C_CONS' , estimate_percent = > 30 , method_opt = > 'for columns size 50 status' , no_invalidate = > FALSE , degree = > 4 , cascade = > TRUE ) ;
- PL / SQL procedure successfully completed .
- SQL > col owner for a10
- SQL > col table_name for a20
- col column_name for a20
- col data_type for a30
- col histogram for a20
- select owner , table_name , column_name , data_type ,
- column_id , num_distinct , histogram , NUM_NULLS , LAST_ANALYZED from
- dba_tab_columns where table_name = 'C_CONS' and owner = 'HT'
- order by column_id ; SQL > SQL > SQL > SQL > 2 3 4
- OWNER TABLE_NAME COLUMN_NAME DATA_TYPE COLUMN_ID NUM_DISTINCT HISTOGRAM NUM_NULLS LAST_ANALYZED
- ---------- -------------------- -------------------- ------------------------------ ---------- ------------ -------------------- ---------- ------------------------------
- HT C_CONS CONS_NO NUMBER 1 10000 NONE 0 20 - AUG - 17
- HT C_CONS CONS_NAME VARCHAR2 2 5057 NONE 0 20 - AUG - 17
- HT C_CONS ORG_NAME VARCHAR2 3 12 NONE 0 20 - AUG - 17
- HT C_CONS BUILD_DATE DATE 4 10000 NONE 0 20 - AUG - 17
- HT C_CONS STATUS VARCHAR2 5 3 FREQUENCY 0 20 - AUG - 17
- SQL > select count ( 1 ) from ht . c_cons where status = 'open' ;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value : 2016425671
- -------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost ( % CPU ) | Time |
- -------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 6 | 8 ( 0 ) | 00 : 00 : 01 |
- | 1 | SORT AGGREGATE | | 1 | 6 | | |
- | * 2 | INDEX FAST FULL SCAN | IDX_C_CONS_STATUS | 9639 | 57834 | 8 ( 0 ) | 00 : 00 : 01 |
- -------------------------------------------------------------------------------------------
- Predicate Information ( identified by operation id ) :
- ---------------------------------------------------
- 2 - filter ( "STATUS" = 'open' )
- Statistics
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 28 consistent gets
- 0 physical reads
- 0 redo size
- 527 bytes sent via SQL * Net to client
- 523 bytes received via SQL * Net from client
- 2 SQL * Net roundtrips to / from client
- 0 sorts ( memory )
- 0 sorts ( disk )
- 1 rows processed
- SQL >
- SQL > select count ( 1 ) from ht . c_cons where status = 'close' ;
- Execution Plan
- ----------------------------------------------------------
- Plan hash value : 2292286995
- ---------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost ( % CPU ) | Time |
- ---------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 1 | 6 | 1 ( 0 ) | 00 : 00 : 01 |
- | 1 | SORT AGGREGATE | | 1 | 6 | | |
- | * 2 | INDEX RANGE SCAN | IDX_C_CONS_STATUS | 24 | 144 | 1 ( 0 ) | 00 : 00 : 01 |
- ---------------------------------------------------------------------------------------
- Predicate Information ( identified by operation id ) :
- ---------------------------------------------------
- 2 - access ( "STATUS" = 'close' )
- Statistics
- ----------------------------------------------------------
- 1 recursive calls
- 0 db block gets
- 2 consistent gets
- 0 physical reads
- 0 redo size
- 526 bytes sent via SQL * Net to client
- 523 bytes received via SQL * Net from client
- 2 SQL * Net roundtrips to / from client
- 0 sorts ( memory )
- 0 sorts ( disk )
- 1 rows processed
当前文章:微课sql优化(7)、统计信息收集(5)-关于直方图
路径分享:http://cdiso.cn/article/ipdppo.html