新年来临之际,血的教训让大家都知道了新年来临之前要去检查分区表的新年分区是否创建的问题。本篇文章再介绍一个新年容易出现的性能问题。
客户的问题是发生在当1月1日,自动收集完统计信息之后,部分sql的执行计划由之前的INDEX SCAN变为了TFS,NESTLOOP变成了HASH JOIN。
问题临时使用sql profile绑定了执行计划,同时让客户收集了sqlhc以分析原因。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 |
SQL Text SELECT c.LOTNAME, c.CARRIERNAME, c.PROCESSOPERATION, c.TIMEKEY FROM ct_defectflagproduct c, lot L,CT_IMRVREFERENCE PP WHERE c.TIMEKEY >= to_char(sysdate - interval '2' day, 'yyyymmddhh24miss') || '000000' AND c.TIMEKEY <= to_char(sysdate, 'yyyymmddhh24miss') || '999999' AND c.CARRIERNAME = L.CARRIERNAME AND c.LOTNAME = L.LOTNAME AND PP.PREPROCESSOPERATIONNAME = c.PROCESSOPERATION AND PP.PRODUCTSPECNAME = L.PRODUCTSPECNAME AND L.PROCESSOPERATIONNAME = PP.PROCESSOPERATIONNAME GROUP BY c.LOTNAME, c.CARRIERNAME, c.PROCESSOPERATION, c.TIMEKEY Current Execution Plans (last execution) Captured while still in memory. Metrics below are for the last execution of each child cursor. If STATISTICS_LEVEL was set to ALL at the time of the hard-parse then A-Rows column is populated. Inst: 2 Child: 0 Plan hash value: 1922672948 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | | 373K(100)| | | 1 | HASH GROUP BY | | 533 | 57564 | | 373K (1)| 00:00:15 | |* 2 | FILTER | | | | | | | |* 3 | HASH JOIN | | 533 | 57564 | 3344K| 373K (1)| 00:00:15 | |* 4 | HASH JOIN | | 46812 | 2788K| | 370K (1)| 00:00:15 | | 5 | TABLE ACCESS FULL| CT_IMRVREFERENCE | 2720 | 68000 | | 11 (0)| 00:00:01 | |* 6 | TABLE ACCESS FULL| LOT | 1718K| 59M| | 370K (1)| 00:00:15 | |* 7 | TABLE ACCESS FULL | CT_DEFECTFLAGPRODUCT | 472K| 21M| | 1622 (2)| 00:00:01 | ------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 5 - SEL$1 / PP@SEL$1 6 - SEL$1 / L@SEL$1 7 - SEL$1 / C@SEL$1 Outline Data ------------- /*+ BEGIN_OUTLINE_DATA IGNORE_OPTIM_EMBEDDED_HINTS OPTIMIZER_FEATURES_ENABLE('12.2.0.1') DB_VERSION('12.2.0.1') OPT_PARAM('_optimizer_use_feedback' 'false') ALL_ROWS OUTLINE_LEAF(@"SEL$1") FULL(@"SEL$1" "PP"@"SEL$1") FULL(@"SEL$1" "L"@"SEL$1") FULL(@"SEL$1" "C"@"SEL$1") LEADING(@"SEL$1" "PP"@"SEL$1" "L"@"SEL$1" "C"@"SEL$1") USE_HASH(@"SEL$1" "L"@"SEL$1") USE_HASH(@"SEL$1" "C"@"SEL$1") USE_HASH_AGGREGATION(@"SEL$1") END_OUTLINE_DATA */ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(TO_CHAR(SYSDATE@!,'yyyymmddhh24miss')||'999999'>=TO_CHAR(SYSDATE@!-INTERVAL'+02 00:00:00' DAY(2) TO SECOND(0),'yyyymmddhh24miss')||'000000') 3 - access("C"."CARRIERNAME"="L"."CARRIERNAME" AND "C"."LOTNAME"="L"."LOTNAME" AND "PP"."PREPROCESSOPERATIONNAME"="C"."PROCESSOPERATION") 4 - access("PP"."PRODUCTSPECNAME"="L"."PRODUCTSPECNAME" AND "L"."PROCESSOPERATIONNAME"="PP"."PROCESSOPERATIONNAME") 6 - filter("L"."CARRIERNAME" IS NOT NULL) 7 - filter(("C"."TIMEKEY">=TO_CHAR(SYSDATE@!-INTERVAL'+02 00:00:00' DAY(2) TO SECOND(0),'yyyymmddhh24miss')||'000000' AND "C"."TIMEKEY"<=TO_CHAR(SYSDATE@!,'yyyymmddhh24miss' )||'999999')) |
稍微有一点CBO基础的人都能看出,造成这样的原因是统计信息变更导致card评估偏高了。查看LOT的谓词发现c.TIMEKEY >= to_char(sysdate – interval ‘2’ day, ‘yyyymmddhh24miss’) || ‘000000’ AND c.TIMEKEY <= to_char(sysdate, ‘yyyymmddhh24miss’) || ‘999999’,用了一个varchar2字段去存储时间字段,并且sql为通过sysdate查询2天内的数据,card评估返回了1718K rows。
cards的评估是与字段统计信息紧密相关的,查看TIMEKEY 字段统计信息无直方图。
# | Indexes | Col ID |
Column Name | Data Type |
Num Rows |
Num Nulls |
Sample Size |
Perc | Num Distinct |
Low Value | High Value | Last Analyzed | Avg Col Len |
Density | Num Buckets |
Histogram | Global Stats |
User Stats |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3 | 1 | PRODUCTNAME | VARCHAR2 | 532100 | 0 | 5371 | 1.0 | 452989 | 41353343353031543233 | 41353443543031413031 | 2025-01-01/17:17:38 | 11 | 2.207559e-06 | 1 | NONE | YES | NO |
2 | 3 | 4 | LOTNAME | VARCHAR2 | 532100 | 0 | 5371 | 1.0 | 25198 | 413533433330335A4E4836 | 41353443533034474E3139 | 2025-01-01/17:17:38 | 12 | 3.968569e-05 | 1 | NONE | YES | NO |
3 | 2 | 2 | PROCESSOPERATION | VARCHAR2 | 532100 | 0 | 5371 | 1.0 | 8 | 4131333030 | 4D33333030 | 2025-01-01/17:17:38 | 6 | 9.309253e-07 | 8 | FREQUENCY | YES | NO |
4 | 1 | 3 | MACHINENAME | VARCHAR2 | 532100 | 0 | 5371 | 1.0 | 13 | 413550494C30313030 | 413550494C31333030 | 2025-01-01/17:17:38 | 10 | 9.309253e-07 | 13 | FREQUENCY | YES | NO |
5 | 1 | 5 | CARRIERNAME | VARCHAR2 | 532100 | 0 | 5371 | 1.0 | 3200 | 41354130303031 | 41354631343230 | 2025-01-01/17:17:38 | 8 | 3.125000e-04 | 1 | NONE | YES | NO |
6 | 1 | 8 | TIMEKEY | VARCHAR2 | 532100 | 0 | 5321 | 1.0 | 532100 | 3230323430313137313131373237393035383439 | 3230323530313031313235393339373939353330 | 2025-01-01/17:17:38 | 21 | 1.879346e-06 | 1 | NONE | YES | NO |
无直方图的card公式翻了一下CBO那本经典书籍,在忽略null的情况下:
- >: Selectivity = (high_value – limit) / (high_value – low_value)
- >=:Selectivity = (high_value – limit) / (high_value – low_value) + 1/num_distinct
- <: (limit – low_value) / (high_value – low_value)
- <=:(limit – low_value) / (high_value – low_value)+ 1/num_distinct
- between and:(limit_high – limit_low) / (high_value – low_value)+2/num_distinct
而对于varchar2类型的加减法是要先转换为raw来计算的。所以当varchar2的’2024’变成’2025’时候high_value会大幅增加导致选择率大幅变高,card评估变大。
以>=为例,Selectivity = (high_value – limit) / (high_value – low_value) + 1/num_distinct=1-(limit-low_value)/(high_value – low_value)+1/num_distinct,当收集统计信息low_value不变,high_value大幅增加,Selectivity也会大幅增加,随着high_value单调递增。
从下面可以看到varchar2从‘20241231235959999999’增加到‘20250101000001000000’增量是非常大的,因为varchar2可以存任何字符。
1 2 3 4 5 6 7 8 9 10 11 |
SQL> select to_number(UTL_RAW.CAST_TO_RAW('20250101000001000000'))-to_number(UTL_RAW.CAST_TO_RAW('20241231235959999999')) from dual; TO_NUMBER(UTL_RAW.CAST_TO_RAW('20250101000001000000'))-TO_NUMBER(UTL_RAW.CAST_TO -------------------------------------------------------------------------------- 98989699979694909491909090909091 SQL> select to_number(UTL_RAW.CAST_TO_RAW('20241231235959999999'))-to_number(UTL_RAW.CAST_TO_RAW('20241230235959999999')) from dual; TO_NUMBER(UTL_RAW.CAST_TO_RAW('20241231235959999999'))-TO_NUMBER(UTL_RAW.CAST_TO -------------------------------------------------------------------------------- 1000000000000000000000000 |
可以自己做一个小案例来模拟该问题:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
CREATE TABLE test1( t_date TIMESTAMP, v_date varchar2(100) ); BEGIN FOR i IN 1..10000000 LOOP INSERT INTO test1 (t_date) VALUES ( TO_TIMESTAMP( TO_CHAR( TO_DATE('2021-01-01', 'YYYY-MM-DD') + DBMS_RANDOM.VALUE(0, (TO_DATE('2025-01-02', 'YYYY-MM-DD') - TO_DATE('2021-01-01', 'YYYY-MM-DD'))), 'YYYY-MM-DD HH24:MI:SS' ) || '.' || LPAD(ROUND(DBMS_RANDOM.VALUE(0, 1000000)), 6, '0'), 'YYYY-MM-DD HH24:MI:SS.FF6' \ ) ); END LOOP; COMMIT; END; / SQL> update test1 set v_date=to_char(t_date,'yyyymmddhh24missff6'); 10000000 rows updated. SQL> commit; Commit complete. SQL> EXEC dbms_stats.gather_table_stats(ownname => 'SYS',tabname =>'TEST1', estimate_percent => dbms_stats.auto_sample_size, CASCADE => true, method_opt => 'for all columns size 1', no_invalidate => false); PL/SQL procedure successfully completed. |
对timestamp和varchar2执行等价同样的sql,rows评估差距非常大。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
SQL> explain plan for select * from test1 where t_date>=to_date('20241231','yyyymmdd'); Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 4122059633 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 13680 | 427K| 24504 (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST1 | 13680 | 427K| 24504 (1)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- 1 - filter("T_DATE">=TIMESTAMP' 2024-12-31 00:00:00') SQL> explain plan for select * from test1 where v_date>='20241231'; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 4122059633 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2217K| 67M| 24511 (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST1 | 2217K| 67M| 24511 (1)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- 1 - filter("V_DATE">='20241231') |
从13680 变成了2217K,这就是为什么不建议varchar2存储时间类型字段的一个原因。可以称之为元旦炸弹,当跨年时收集统计信息就极有可能触发该问题。
总结一下,触发该问题的条件
- 使用了varchar2存储date/timestamp,并且查询条件为类似v_date>=to_char(sysdate-n)这样的条件.
- 当跨年的1月1日,插入了新年的数据并且收集了统计信息,使得字段统计信息的high_value升级(如从‘2024’到‘2025’),对于varchar2来说并不知道存储的是时间类型,oracle认为可能还存在’2024AAAA’这样的值,并且varchar2的加减法是需要转成raw之后计算的,所以’2025’和’2024’差距是非常大的,特别是字符长度很多的情况,如本案例中varchar2存储的是时间戳长度高达22位。
- 并且to_char(sysdate-n)这个条件还停留在前一年(如‘2024’),假如to_char(sysdate-n)也升到了2025,也不会有问题了。
解决方法,最好是从从根本上解决,用date/timestamp去存储时间类型字段,不要用varchar2。
如果没注意该现象的生产环境,也有解决方法。
- 少量的sql触发该问题,可以临时使用profile绑定。过几天其实自己就会变好因为sysdate-2很快就会跨过2024年。
- 大量sql触发该问题,绑定执行计划会很多很麻烦,可以考虑使用等高或者混合直方图,选择率评估公式会发生改变,偏差会大幅减少。
收集直方图后:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
SQL> explain plan for select * from test1 where v_date>='20241231'; Explained. SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 4122059633 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 37875 | 1183K| 24511 (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST1 | 37875 | 1183K| 24511 (1)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- 1 - filter("V_DATE">='20241231') |