本案例来自西区某客户,数据库版本为11.2.0.4,客户反馈应用异常缓慢,几乎处于不可用的状态。
查看历史活动会话信息发现,从2022-05-01 09:43开始,活动会话开始异常增长。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
2022-05-01 09:40 1 39 2022-05-01 09:40 2 83 2022-05-01 09:41 1 38 2022-05-01 09:41 2 75 2022-05-01 09:42 1 50 2022-05-01 09:42 2 199 2022-05-01 09:43 1 22 2022-05-01 09:43 2 1130 2022-05-01 09:44 1 20 2022-05-01 09:44 2 1766 2022-05-01 09:45 1 26 2022-05-01 09:45 2 2027 2022-05-01 09:46 1 88 2022-05-01 09:46 2 3462 2022-05-01 09:47 1 97 2022-05-01 09:47 2 9765 2022-05-01 09:48 1 103 2022-05-01 09:48 2 12402 2022-05-01 09:49 1 78 2022-05-01 09:49 2 17489 2022-05-01 09:50 1 96 2022-05-01 09:50 2 16880 2022-05-01 09:51 1 101 2022-05-01 09:51 2 19659 2022-05-01 09:52 1 126 2022-05-01 09:52 2 18511 2022-05-01 09:53 1 124 2022-05-01 09:53 2 15975 2022-05-01 09:54 1 100 2022-05-01 09:54 2 17142 2022-05-01 09:55 1 94 2022-05-01 09:55 2 20933 2022-05-01 09:56 1 115 2022-05-01 09:56 2 17159 2022-05-01 09:57 1 72 2022-05-01 09:57 2 19454 2022-05-01 09:58 1 33 2022-05-01 09:58 2 15774 2022-05-01 09:59 1 28 2022-05-01 09:59 2 20397 |
大量异常的活动会话几乎全是等待resmgr:cpu quantum
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
2022-05-01 09:38 2 214 2022-05-01 09:39 2 84 2022-05-01 09:40 2 80 2022-05-01 09:41 2 69 2022-05-01 09:42 2 192 2022-05-01 09:43 resmgr:cpu quantum 2 424 2022-05-01 09:43 2 689 2022-05-01 09:44 resmgr:cpu quantum 2 810 2022-05-01 09:44 2 922 2022-05-01 09:45 resmgr:cpu quantum 2 1114 2022-05-01 09:45 2 880 2022-05-01 09:46 resmgr:cpu quantum 2 2679 2022-05-01 09:46 2 757 2022-05-01 09:47 resmgr:cpu quantum 2 8831 2022-05-01 09:47 2 901 2022-05-01 09:48 resmgr:cpu quantum 2 11566 2022-05-01 09:48 2 802 2022-05-01 09:49 resmgr:cpu quantum 2 16565 2022-05-01 09:49 2 887 2022-05-01 09:50 resmgr:cpu quantum 2 16118 2022-05-01 09:50 2 737 2022-05-01 09:51 resmgr:cpu quantum 2 18701 2022-05-01 09:51 2 926 2022-05-01 09:52 resmgr:cpu quantum 2 17589 2022-05-01 09:52 2 895 2022-05-01 09:53 resmgr:cpu quantum 2 15183 2022-05-01 09:53 2 770 2022-05-01 09:54 resmgr:cpu quantum 2 16357 2022-05-01 09:54 2 752 2022-05-01 09:55 control file sequential read 2 11 2022-05-01 09:55 resmgr:cpu quantum 2 19959 2022-05-01 09:55 2 926 2022-05-01 09:56 resmgr:cpu quantum 2 16361 2022-05-01 09:56 2 765 2022-05-01 09:57 control file sequential read 2 12 2022-05-01 09:57 latch: cache buffers chains 2 2734 2022-05-01 09:57 resmgr:cpu quantum 2 15686 2022-05-01 09:57 2 985 2022-05-01 09:58 control file sequential read 2 11 2022-05-01 09:58 resmgr:cpu quantum 2 14989 2022-05-01 09:58 2 755 2022-05-01 09:59 control file sequential read 2 12 2022-05-01 09:59 resmgr:cpu quantum 2 19440 2022-05-01 09:59 2 910 |
resmgr:cpu quantum等待事件的含义在WAITEVENT: “resmgr:cpu quantum” Reference Note (Doc ID 2097889.1) 中有说明
1 2 |
Event 'resmgr: cpu quantum' is a standard event used by resource manager to control the allocation of CPU to processes. When a session waits for 'resmgr: cpu quantum' that session is waiting to be allocated a quantum of CPU time. This wait occurs when the resource manager is enabled and is throttling CPU consumption. To reduce the occurrence of this wait event, increase the CPU allocation for the session's current consumer group. |
简单来说就是,当resource manager启用对CPU的限制时,进程对应消费组所占用的CPU达到限额时,该进程将以等待resmgr: cpu quantum的形式进入等待,以保证该消费组的cpu消耗不超过限额。
从osw的vmstat可以看到刚刚出现大量resmgr: cpu quantum的时段cpu使用率仅仅为50%。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
zzz ***Sun May 1 09:40:59 CST 2022 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 39 0 0 315695840 72296 21418404 0 0 624 164 0 0 2 1 96 0 0 12 0 0 315666016 72304 21419448 0 0 67 14298 96858 95762 8 2 91 0 0 7 0 0 315674624 72312 21418280 0 0 83 16846 89062 98593 6 1 93 0 0 zzz ***Sun May 1 09:41:29 CST 2022 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 17 0 0 315677088 72696 21420468 0 0 624 164 0 0 2 1 96 0 0 12 0 0 315655424 72704 21421572 0 0 8436 213 78322 83804 4 1 95 0 0 32 0 0 315662432 72736 21421272 0 0 91 8558 110771 99331 17 1 82 0 0 zzz ***Sun May 1 09:41:59 CST 2022 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 23 1 0 315660384 73020 21422720 0 0 624 164 0 0 2 1 96 0 0 12 1 0 315610208 73028 21419544 0 0 76604 4974 171824 162508 9 3 87 1 0 15 2 0 315620288 73036 21423504 0 0 5859 17278 126332 131118 8 2 90 0 0 zzz ***Sun May 1 09:42:29 CST 2022 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 100 2 0 315660832 73400 21424484 0 0 624 164 0 0 2 1 96 0 0 70 2 0 315639616 73408 21425508 0 0 41699 9383 198482 143324 48 2 49 1 0 86 3 0 315636544 73424 21425092 0 0 43388 17133 193937 144246 47 2 50 1 0 |
查看参数resource_manager_plan,发现启用了DEFAULT_MAINTENANCE_PLAN
1 2 3 4 5 6 |
SQL> show parameter resource_manager_plan NAME TYPE VALUE ------------------------------------ ---------- ------------------------------ resource_manager_plan string SCHEDULER[0x32DF]:DEFAULT_MAIN TENANCE_PLAN |
看到DEFAULT_MAINTENANCE_PLAN应该非常熟悉,这就是11g自动任务维护时间窗口默认将会启用的resource mangager plan。
1 2 3 4 5 6 7 8 9 10 11 |
WINDOW_NAME REPEAT_INTERVAL DURATION ENABL ACTIV -------------------- ---------------------------------------------------------------------- -------------------- ----- ----- MONDAY_WINDOW freq=daily;byday=MON;byhour=22;byminute=0; bysecond=0 +000 04:00:00 TRUE FALSE TUESDAY_WINDOW freq=daily;byday=TUE;byhour=22;byminute=0; bysecond=0 +000 04:00:00 TRUE FALSE WEDNESDAY_WINDOW freq=daily;byday=WED;byhour=22;byminute=0; bysecond=0 +000 04:00:00 TRUE FALSE THURSDAY_WINDOW freq=daily;byday=THU;byhour=22;byminute=0; bysecond=0 +000 04:00:00 TRUE FALSE FRIDAY_WINDOW freq=daily;byday=FRI;byhour=22;byminute=0; bysecond=0 +000 04:00:00 TRUE FALSE SATURDAY_WINDOW freq=daily;byday=SAT;byhour=6;byminute=0; bysecond=0 +000 20:00:00 TRUE FALSE SUNDAY_WINDOW freq=daily;byday=SUN;byhour=6;byminute=0; bysecond=0 +000 20:00:00 TRUE TRUE WEEKNIGHT_WINDOW freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=22;byminute=0; bysecond=0 +000 08:00:00 FALSE FALSE WEEKEND_WINDOW freq=daily;byday=SAT;byhour=0;byminute=0;bysecond=0 +002 00:00:00 FALSE FALSE |
今天51劳动节,正好是周日,该库时间窗口启用时间为早上6点,从alert也可以看到6点时启用了该时间窗口的resource manager plan
1 2 3 |
2022-05-01T06:00:00.111668+08:00 Setting Resource Manager plan SCHEDULER[0x4D52]:DEFAULT_MAINTENANCE_PLAN via scheduler window Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter |
DEFAULT_MAINTENANCE_PLAN对应消费组以及限制在dba_rsrc_plan_directives中查看
1 2 3 4 5 6 |
GROUP_OR_SUBPLAN TYPE MGMT_P1 MGMT_P2 MGMT_P3 COMMENTS MAX_UTILIZATION_LIMIT ------------------------------ ------------------------------ ---------- ---------- ---------- -------------------------------------------------- --------------------- ORA$AUTOTASK_SUB_PLAN PLAN 0 25 0 Directive for automated maintenance tasks 90 ORA$DIAGNOSTICS CONSUMER_GROUP 0 5 0 Directive for automated diagnostic tasks 90 SYS_GROUP CONSUMER_GROUP 75 0 0 Directive for system operations OTHER_GROUPS CONSUMER_GROUP 0 70 0 Directive for all other operations |
自动任务维护属于ORA$AUTOTASK_SUB_PLAN,SYS_GROUP和OTHER_GROUPS分别代表SYS/SYSTEM会话消费组和业务会话消费组。CPU限额按照MGMT_P*定义的优先级进行百分比分配,该限额是会动态调整的。假如此时几乎只有ORA$AUTOTASK_SUB_PLAN在运行,ORA$AUTOTASK_SUB_PLAN的限额可能会被调高,最大限制是MAX_UTILIZATION_LIMIT默认90%,而对应真正运行业务的OTHER_GROUPS就会降低限额。
在周日的早上6点开启的自动任务维护窗口,很可能几乎只有自动维护任务的相关session会被执行,所以有可能会提高ORA$AUTOTASK_SUB_PLAN,并且降低OTHER_GROUPS的限额。如果假设9点开始业务开始增加,那么就很容易导致resmgr: cpu quantum。
此外发现当时等待resmgr: cpu quantum的sql基本是同一个sql。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 |
2022-05-01 09:40 b81avsvs9ccpp 2 76 2022-05-01 09:41 b81avsvs9ccpp 2 66 2022-05-01 09:42 b81avsvs9ccpp 2 181 2022-05-01 09:43 3t74kg8jmw4dg 2 14 2022-05-01 09:43 b81avsvs9ccpp 2 1087 2022-05-01 09:44 3t74kg8jmw4dg 2 14 2022-05-01 09:44 b81avsvs9ccpp 2 1692 2022-05-01 09:44 fm8d2gv7gz1bq 2 12 2022-05-01 09:44 2 17 2022-05-01 09:45 3t74kg8jmw4dg 2 22 2022-05-01 09:45 b81avsvs9ccpp 2 1945 2022-05-01 09:45 fm8d2gv7gz1bq 2 12 2022-05-01 09:45 2 18 2022-05-01 09:46 3t74kg8jmw4dg 2 40 2022-05-01 09:46 b81avsvs9ccpp 2 3351 2022-05-01 09:46 2 23 2022-05-01 09:47 3t74kg8jmw4dg 2 48 2022-05-01 09:47 aunqkadqdumxj 2 16 2022-05-01 09:47 b81avsvs9ccpp 2 9598 2022-05-01 09:47 by3xu361c3mfy 2 12 2022-05-01 09:47 2 31 2022-05-01 09:48 3t74kg8jmw4dg 2 48 2022-05-01 09:48 4z322qbnykw3q 2 13 2022-05-01 09:48 b81avsvs9ccpp 2 12238 2022-05-01 09:48 by3xu361c3mfy 2 13 2022-05-01 09:48 2 33 2022-05-01 09:49 3t74kg8jmw4dg 2 77 2022-05-01 09:49 4z322qbnykw3q 2 13 2022-05-01 09:49 6zd1tmy35rf7b 2 11 2022-05-01 09:49 b81avsvs9ccpp 2 17280 2022-05-01 09:49 by3xu361c3mfy 2 23 2022-05-01 09:49 2 34 2022-05-01 09:50 3t74kg8jmw4dg 2 47 2022-05-01 09:50 4z322qbnykw3q 2 16 2022-05-01 09:50 aunqkadqdumxj 2 16 2022-05-01 09:50 b81avsvs9ccpp 2 16688 2022-05-01 09:50 by3xu361c3mfy 2 28 2022-05-01 09:50 2 16 2022-05-01 09:51 3t74kg8jmw4dg 2 59 2022-05-01 09:51 4z322qbnykw3q 2 46 2022-05-01 09:51 6zd1tmy35rf7b 2 13 2022-05-01 09:51 b81avsvs9ccpp 2 19415 2022-05-01 09:51 by3xu361c3mfy 2 28 2022-05-01 09:51 2 30 2022-05-01 09:52 3t74kg8jmw4dg 2 62 2022-05-01 09:52 4z322qbnykw3q 2 43 2022-05-01 09:52 b81avsvs9ccpp 2 18304 2022-05-01 09:52 by3xu361c3mfy 2 22 2022-05-01 09:52 fm8d2gv7gz1bq 2 12 2022-05-01 09:52 2 13 2022-05-01 09:53 3t74kg8jmw4dg 2 57 2022-05-01 09:53 4z322qbnykw3q 2 21 2022-05-01 09:53 b81avsvs9ccpp 2 15802 2022-05-01 09:53 by3xu361c3mfy 2 21 2022-05-01 09:54 3t74kg8jmw4dg 2 102 2022-05-01 09:54 4z322qbnykw3q 2 16 2022-05-01 09:54 b81avsvs9ccpp 2 16922 2022-05-01 09:54 by3xu361c3mfy 2 26 2022-05-01 09:54 fm8d2gv7gz1bq 2 14 2022-05-01 09:55 3t74kg8jmw4dg 2 106 2022-05-01 09:55 4z322qbnykw3q 2 31 2022-05-01 09:55 6zd1tmy35rf7b 2 15 2022-05-01 09:55 b81avsvs9ccpp 2 20637 2022-05-01 09:55 by3xu361c3mfy 2 38 2022-05-01 09:55 fm8d2gv7gz1bq 2 18 2022-05-01 09:55 2 31 2022-05-01 09:56 3t74kg8jmw4dg 2 92 2022-05-01 09:56 4z322qbnykw3q 2 16 2022-05-01 09:56 b81avsvs9ccpp 2 16934 2022-05-01 09:56 by3xu361c3mfy 2 32 2022-05-01 09:56 fm8d2gv7gz1bq 2 15 2022-05-01 09:56 2 20 2022-05-01 09:57 3t74kg8jmw4dg 2 111 2022-05-01 09:57 4z322qbnykw3q 2 34 2022-05-01 09:57 b81avsvs9ccpp 2 19171 2022-05-01 09:57 by3xu361c3mfy 2 30 2022-05-01 09:57 fm8d2gv7gz1bq 2 18 2022-05-01 09:57 2 34 2022-05-01 09:58 3t74kg8jmw4dg 2 104 2022-05-01 09:58 4z322qbnykw3q 2 21 2022-05-01 09:58 b81avsvs9ccpp 2 15548 |
怀疑有sql的并发的比之前的要高的因素存在
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
**************************************************************************************** PLAN STAT FROM ASH **************************************************************************************** SQL_ID b81avsvs9ccpp, child number 4 ------------------------------------- select a.offer_code,T.FIELD_VALUE FIELD_VALUE from pm_ext_cha t, pm_offer a where t.from_table_name = 'TD_B_DISCNT' and t.field_name = 'DISCNT_LABLE' and t.object_id = a.offer_id and a.offer_type = 'D' and a.OFFER_CODE =:1 order by A.EXPIRE_DATE desc Plan hash value: 14354385 --------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 1126 (100)| |latch: cache buffers chai(1)(0%) | | |CPU(68)(.01%) | | |SQL*Net message to client(1)(0%) | | |CPU(1)(0%) | | 1 | SORT ORDER BY | | 1 | 70 | 1126 (1)| 00:00:01 |CPU(10)(0%) | | 2 | NESTED LOOPS | | 1 | 70 | 1125 (1)| 00:00:01 |CPU(39)(0%) | | 3 | NESTED LOOPS | | 485 | 70 | 1125 (1)| 00:00:01 |CPU(125)(.02%) | |* 4 | MAT_VIEW ACCESS FULL | MV_PM_EXT_CHA | 485 | 22310 | 349 (1)| 00:00:01 |CPU(5816)(.72%) | | |resmgr:cpu quantum(656886)(81.55%) | | |resmgr:internal state cha(5)(0%) | | |latch: cache buffers chai(332)(.04%) | |* 5 | INDEX UNIQUE SCAN | SYS_C_SNAP$_202PK_PM_OFFER | 1 | | 1 (0)| 00:00:01 |latch: cache buffers chai(57227)(7.1%| | |wait list latch free(25)(0%) | | |CPU(79731)(9.9%) | |* 6 | MAT_VIEW ACCESS BY INDEX ROWID| MV_PM_OFFER | 1 | 24 | 2 (0)| 00:00:01 |latch: cache buffers chai(485)(.06%) | | |CPU(4787)(.59%) | --------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter(("T"."FIELD_NAME"='DISCNT_LABLE' AND "T"."FROM_TABLE_NAME"='TD_B_DISCNT')) 5 - access("T"."OBJECT_ID"="A"."OFFER_ID") 6 - filter(("A"."OFFER_CODE"=TO_NUMBER(:1) AND "A"."OFFER_TYPE"='D')) PL/SQL procedure successfully completed. +------------------------------------------------------------------------+ | information from awr sysdate-7 | +------------------------------------------------------------------------+ PLAN GET DISK WRITE ROWS ROWS USER_IO(MS) ELA(MS) CPU(MS) CLUSTER(MS) PLSQL END_TI I NAME HASH VALUE EXEC PRE EXEC PRE EXEC PER EXEC ROW_P PRE EXEC PRE FETCH PER EXEC PRE EXEC PRE EXEC PER EXEC PER EXEC ------ - --------------- ------------- ---------- ------------ -------- -------- ----- ----------- --------- ----------- -------- -------- ----------- -------- 01 00 2 UOP_PARAM 14354385 88.W 1,225 0 0 22.W 0 1 0 ######## 172,251 0 0 01 00 2 UOP_PARAM 14354385 59.W 0 0 0 20.W 0 0 0 191,574 23,961 0 0 01 01 2 UOP_PARAM 14354385 19.W 10,877 0 0 4.8W 0 0 0 8,776 8,277 0 0 01 01 2 UOP_PARAM 14354385 31.W 0 0 0 8.0W 0 0 0 9,409 8,907 0 0 01 02 2 UOP_PARAM 14354385 10.W 10,893 0 0 2.6W 0 0 0 8,394 7,901 0 0 01 02 2 UOP_PARAM 14354385 13.W 0 0 0 3.3W 0 0 0 8,466 7,968 0 0 01 03 2 UOP_PARAM 14354385 7.5W 10,899 0 0 1.9W 0 0 0 8,314 7,824 0 0 01 03 2 UOP_PARAM 14354385 6.6W 10,910 0 0 1.6W 0 0 0 8,188 7,697 0 0 01 04 2 UOP_PARAM 14354385 6.7W 10,898 0 0 2.0W 0 0 0 8,156 7,661 0 0 01 04 2 UOP_PARAM 14354385 6.4W 0 0 0 1.7W 0 0 0 8,183 7,693 0 0 01 05 2 UOP_PARAM 14354385 9.3W 10,894 0 0 2.5W 0 0 0 8,342 7,844 0 0 01 05 2 UOP_PARAM 14354385 8.3W 10,900 0 0 2.4W 0 0 0 8,271 7,778 0 0 01 06 2 UOP_PARAM 14354385 10.W 10,897 0 0 2.8W 0 0 0 10,480 9,976 0 0 01 06 2 UOP_PARAM 14354385 12.W 0 0 0 3.1W 0 0 0 8,416 7,911 0 0 01 07 2 UOP_PARAM 14354385 28.W 10,881 0 0 7.6W 0 0 0 11,449 10,956 0 0 01 07 2 UOP_PARAM 14354385 35.W 0 0 0 9.9W 0 0 0 9,500 9,010 0 0 01 08 2 UOP_PARAM 14354385 41.W 441 0 0 11.W 0 0 0 9,562 9,071 0 0 01 08 2 UOP_PARAM 14354385 54.W 0 0 0 15.W 0 0 0 10,766 10,276 0 0 01 09 2 UOP_PARAM 14354385 101W 0 0 0 27.W 0 0 0 91,885 52,828 0 0 01 09 2 UOP_PARAM 14354385 59.W 3,687 0 0 16.W 0 0 0 11,327 10,832 0 0 01 10 2 UOP_PARAM 14354385 141W 1,843 0 0 38.W 0 1 0 ######## 209,143 40 0 01 10 2 UOP_PARAM 14354385 109W 0 0 0 29.W 0 0 0 ######## 136,456 0 0 01 11 2 UOP_PARAM 14354385 72.W 0 0 0 20.W 0 0 0 278,573 113,357 86 0 01 11 2 UOP_PARAM 14354385 118W 4 0 0 34.W 0 0 0 194,557 85,951 1 0 01 12 2 UOP_PARAM 14354385 98.W 0 0 0 30.W 0 0 0 44,274 32,320 0 0 01 12 2 UOP_PARAM 14354385 111W 3,182 0 0 33.W 0 0 0 110,326 57,733 0 0 01 13 2 UOP_PARAM 14354385 101W 0 0 0 30.W 0 0 0 17,081 16,574 0 0 29 08 2 UOP_PARAM 14354385 28.W 10,850 0 0 9.5W 0 0 0 9,262 8,763 0 0 29 08 2 UOP_PARAM 14354385 40.W 208 0 0 13.W 0 0 0 9,965 9,464 0 0 29 09 2 UOP_PARAM 14354385 42.W 833 0 0 13.W 0 0 0 9,988 9,487 0 0 29 09 2 UOP_PARAM 14354385 55.W 0 0 0 17.W 0 0 0 10,919 10,412 0 0 29 10 2 UOP_PARAM 14354385 64.W 0 0 0 21.W 0 0 0 11,627 11,119 1 0 29 10 2 UOP_PARAM 14354385 58.W 3,417 0 0 18.W 0 0 0 11,142 10,639 0 0 29 11 2 UOP_PARAM 14354385 73.W 0 0 0 23.W 0 0 0 12,627 12,118 1 0 29 11 2 UOP_PARAM 14354385 70.W 0 0 0 22.W 0 0 0 12,308 11,795 0 0 29 12 2 UOP_PARAM 14354385 70.W 0 0 0 22.W 0 0 0 12,261 11,749 0 0 29 12 2 UOP_PARAM 14354385 70.W 4,754 0 0 22.W 0 0 0 12,485 11,970 0 0 29 13 2 UOP_PARAM 14354385 65.W 0 0 0 22.W 0 0 0 11,713 11,202 0 0 29 13 2 UOP_PARAM 14354385 70.W 0 0 0 22.W 0 0 0 12,102 11,591 |
通过对比可以看到故障期间的sql并发确实比之前要多1倍,并且对于执行次数如此之高的sql,sql性能并不算太好,因为对于一个非聚合的sql来说,平均返回行数非常少的情况下,消耗了几千甚至10000多的逻辑读,这是不合理的。假如该sql足够优化,可能也不会导致此次故障。
我们知道对于resource manager的cpu限制,是根据cpu_count参数来进行百分比配额的,而该库的cpu_count为136,实际的lcpu数量为160,并且该主机只有1个实例,这也是为何主机的cpu使用率并没有100%使用完,始终在85-90%之间的原因。虽然cpu_count与lcpu差距不大但是也或多或少算是此次故障的另一个因素。
最终建议:
- 假如周末白天也有业务的系统,建议调整自动维护任务时间窗口的时间,比如都放在晚上22点进行。
- 禁用自动维护窗口默认启用的resource manager plan
- 优化该执行次数如此高但并不够优化的sql
- 建议在确认主机只有一个实例存在的情况下,cpu_count与实际的lcpu保持一致。该参数参与了大量资源与后台进程个数的运算。如LMS、LMD、ADG pr进程等等。