本案例来自南区某制造业客户环境,数据库版本为rac 11.2.0.4。在2节点频繁的出现ORA-04031,具体alert如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 |
DDE: Problem Key 'ORA 4031' was completely flood controlled (0x6) Further messages for this problem key will be suppressed for up to 10 minutes Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=211373): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=211374): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sat Mar 04 11:30:58 2023 Sweep [inc][211374]: completed Sweep [inc][211373]: completed Sat Mar 04 11:31:02 2023 Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=211375): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=211376): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=211377): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=211378): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sat Mar 04 11:31:12 2023 Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=211379): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=211380): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226244): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226245): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sat Mar 04 11:31:23 2023 Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226246): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226247): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226248): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226249): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sat Mar 04 11:31:34 2023 Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226250): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226251): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226252): ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/cqsmtx/CQSMT2/trace/CQSMT2_m000_319672.trc (incident=226253): ORA-04031: unable to allocate ORA-04031: unable to allocate 4160 bytes of shared memory ("shared pool","unknown object","sga heap(6,0)","modification ") bytes of shared memory ("","","","") Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. |
可以看到所有的报错都是来自subpool 6 duration 0。
通常ORA-04031的问题分析,都会去看看具体的trace文件,是否生成了heapdump,如果没有的话需要在业务停歇时段去采集一下heap dump,这里注意一下,dump heap的操作会长时间的持有shared pool latch,可能会导致数据库hang。或者可以查询X$KSMSP,但是对于大的shared pool,查询X$KSMSP也可能会导致数据库hang。
1 2 3 4 5 |
oradebug setmypid; oradebug unlimit; oradebug dump heapdump 536870914; oradebug tracefile_name; oradebug close_trace; |
Level Description:
- 1 PGA summary
- 2 SGA summary
- 4 UGA summary
- 8 Callheap (Current)
- 16 Callheap (User)
- 32 Large pool
- 64 Streams pool
- 128 Java pool
- 1025 PGA with contents
- 2050 SGA with contents
- 4100 UGA with contents
- 8200 Callheap with contents (Current)
- 16400 Callheap with contents (User)
- 32800 Large pool with contents
- 65600 Streams pool with contents
- 131200 Java pool with contents
为什么这里是level 536870914呢?因为536870914=0x20000002
- 0x2=SGA summary
- 0x20000000=所有heap的top sub heap信息(LARGEST SUB HEAPS)
拿到heapdump之后,可以使用tp大神的heapdump_analyzer脚本或者oracle的heap.awk去进行分析。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
---> HEAP DUMP heap name="sga heap(6,0)" desc=0x6008bf00 Type Count Sum Average ~~~~ ~~~~~ ~~~ ~~~~~~~ " 1 0 0.00 perm 6 81833176 13638862.67 "perm 4 0 0.00 free 5 8336 1667.20 R-stopper 6 288 48.00 R-free 3 40267144 13422381.33 BreakDown ~~~~~~~~~ Type Count Sum Average Percent ~~~~ ~~~~~ ~~~ ~~~~~~~ ~~~~~~~ Free(heap.awk) 9 40275480 4475053.33 32.98 perm 10 81833176 8183317.60 67.02 reserved stoppe 6 288 48.00 0.00 Total = 122108944 bytes 119247.02k 116.45MB Total PERMANENT memory actually used is 1033934808 Largest CONTIGUOUS free memory (NORMAL) was 217653160 (212551.91k) Largest CONTIGUOUS free memory (RESERVED) was 13422408 ( 13107.82k) |
可以看到4031报错的subpool 6 duration 0几乎被perm类型的chunk占满。free 32%的是保留池的chunk。
1 2 3 4 5 6 7 8 9 10 11 12 |
Total_size #Chunks Chunk_size, From_heap, Chunk_type, Alloc_reason ------------ ------- ------------ ----------------- ----------------- ----------------- 29618736 1 29618736 , sga heap(6,0), perm, perm 50782208 1 50782208 , sga heap(6,0), perm, perm 658976 1 658976 , sga heap(6,0), perm, perm 772280 1 772280 , sga heap(6,0), perm, perm 80 1 80 , sga heap(6,0), perm, perm 896 1 896 , sga heap(6,0), perm, perm 0 4 0 , sga heap(6,0), "perm, perm 80 1 80 , sga heap(6,1), perm, perm 80 1 80 , sga heap(6,2), perm, perm 80 1 80 , sga heap(6,3), perm, perm |
并且几乎所有的perm类型都分配到了每个subpool的duration 0当中。
当启用了AMM/ASMM时,并且开启了duration的情况下,几乎所有perm类型的chunk都会发配到duration 0当中,并且perm chunk一旦分配给了某个shared pool组件的heap则不会释放,也不能被shared pool的其他组件使用。所以就很容易发生duration 0的ORA-04031。
尤其是rac环境比较大的db cache,会分配大量的perm chunk给rac特有的shared pool组件,如gcs shadows/gcs resources/ges resources等等。通常rac环境如果db cache比较大都会是实际情况增加shared pool。
由于客户的环境shared pool分配已经是合理的大小了,所以这里我给到的建议是关闭duration。
1 |
alter system set "_enable_shared_pool_durations"=fasle scope=spfile; |