本案例来自东区某客户,db版本为19.12,业务应用程序频繁报ORA-04030导致业务中断。
1 2 3 4 5 6 7 |
[TOC00000] Jump to table of contents Dump continued from file: /oracle/app/oracle/diag/rdbms/hsuatdb/hsuatdb1/trace/hsuatdb1_ora_208220.trc [TOC00001] ORA-04030: out of process memory when trying to allocate 538800 bytes (KSIPC Top Loca,ksipc pga chnk) [TOC00001-END] |
继续分析trace查看报错的heap
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
[TOC00003] ----- Beginning of Customized Incident Dump(s) ----- Dumping MGA handle on ORA-4030 failure. Alloc chunk = ksipc pga chnk, req size = 538800, min size = 538800 ---------------------------------------------------- ksm_mga_dump_stats : dumping type stats ---------------------------------------------------- Type = Default namespaces = 2 total segs = 10 total size = 2684354560 1. pagesize( 4k) = 2684354560 ---------------------------------------------------- ksm_mga_dump_stats : dumping namespace type stats ---------------------------------------------------- Dumping stats for KSIPC_MGA_NAMESPACE namespaces = 2 total segs = 10 total size = 2684354560 1. pagesize( 4k) = 2684354560 ksm_mga_dump_stats : dumped 1 namespace type stats ---------------------------------------------------- ksm_mga_dump_stats : dumping namespace stats ---------------------------------------------------- Dumping stats for KSIPC_MGA_NMSPC_1_0 namespaces = 1 total segs = 10 total size = 2684354560 1. pagesize( 4k) = 2684354560 -------------------------------------------------- Dumping name space heap : [ KSIPC_MGA_NMSPC_1_0 ] [ sctx = 0x1caff50c0 ] -------------------------------------------------- heap [0] : heapds = 0x1868e3008 : heaplt = 0x1caff5330 : ltobj = 0x1caff53d0 ****************************************************** |
报错与MGA有关,MGA是12的新特性,mos文档MGA (Managed Global Area) Reference Note (Doc ID 2638904.1)中有比较详细的描述。本文就不作过多的介绍。
About MGA (Managed Global Area):
The MGA allows a “smaller set” of processes (or even all processes) to share an address space for the duration (typically a query). The MGA is made up of namespaces which contain segments and heaps with space management either directly or through a heap manager (KGH).
Processes can attach to their namespace for the duration they require. For e.g., in a parallel query environment, PQs participating with the QC can share the namespace from QC to share the results. Once done, they will detach from the same and the namespace can be tiered down.
The MGA is allocated dynamically which gives us more flexibility to create, control and share. The sharing is dynamic in the sense that processes coordinate before sharing. So in that sense, MGA sits between the SGA and PGA. In addition, since the MGA is a shared memory area, a latch (latch: MGA) is used to control access and protect it.
Difference between MGA/PGA/SGA:
The MGA is not the SGA or PGA. The PGA is private and the SGA is completely shared by all processes and is not elastic. The MGA is elastic (processes can create and drop their MGA segments dynamically), shared between a set of processes or all processes, and counted under the PGA target/limit values (in v$PGASTAT). The SGA is typically created once while the MGA is created on demand.
There can be multiple MGAs in an instance while we have just one SGA.
MGA Clients and Use Cases:
The MGA is a general feature which has multiple consumers and the benefit is based on the consumers. MGA consumers are: IPC, MGA hash join, IMCDT and PQ in current releases. PQ is not a client by default. When a query uses features like MGA hash join or IMCDT, PQs internally attach to the namespace. (In other words, MGA hash join and IMCDT are actually executed by PQs with MGA being shared.)
从trace中可以看到总共2个namespace,总共分配了10个mga segments,对于MGA内存可以看到并没有使用大页,仍然使用的4k页面,关于MGA的segment大小与是否使用大页都有参数来控制
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SQL> @sp mga -- show parameter by sp -- show hidden parameter by sp old 3: where x.indx=y.indx and ksppinm like '_%&p%' new 3: where x.indx=y.indx and ksppinm like '_%mga%' NAME VALUE DESC ---------------------------------------- ---------- ------------------------------------------------------------------------------------------ _ksipc_mga_segment_size 268435456 KSIPC MGA Segment Size _use_large_pages_for_mga FALSE MGA largepage enabled _use_fallocate_for_mga FALSE MGA fallocate enabled _mga_large_page_path large page path |
可以看到每个mga segment大小都是一致的,都是256M,由参数_ksipc_mga_segment_size控制,参数_use_large_pages_for_mga控制MGA是否使用大页,默认是关闭的。这10个mga segment都能从trace的process map中看到,位于/dev/shm下。
1 2 3 4 5 6 7 8 9 10 |
400000000000-400010000000 rw-s 00000000 00:13 1819687176 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_0.dat 400010000000-400020000000 rw-s 00000000 00:13 1819710049 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_1.dat 400020000000-400030000000 rw-s 00000000 00:13 1819876463 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_2.dat 400030000000-400040000000 rw-s 00000000 00:13 1820690206 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_3.dat 400040000000-400050000000 rw-s 00000000 00:13 1824880267 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_4.dat 400050000000-400060000000 rw-s 00000000 00:13 1834118750 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_5.dat 400060000000-400070000000 rw-s 00000000 00:13 1939490055 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_6.dat 400070000000-400080000000 rw-s 00000000 00:13 2076262097 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_7.dat 400080000000-400090000000 rw-s 00000000 00:13 2096523313 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_8.dat 400090000000-4000a0000000 rw-s 00000000 00:13 2117674979 /dev/shm/ora_ffc2eb0_1cffa824e_1_KSIPC_MGA_NMSPC_1_0_9.dat |
从heapdump上看,基本TOP heap都是MGA。
1 2 3 4 5 6 7 8 |
Heapdump Analyzer v1.03 by Tanel Poder(https://blog.tanelpoder.com Total size #Chunks Chunk_size, _heap, Chunk_type, Alloc_reason 1267680024 1209 1048536 , KSIPC MGA NMSP, recreate, KSIPC Top Loca 1234277528 1177 1048664 KSIPC MGA NMSP, freeable, KSIPC Top Loca 262934400 488 538800 KSIPC Top Loca, freeable, ksipc pga chnk 233373448 457 510664 KSIPC Top Loca, perm, ... ... |
官方文档中介绍如果启用了MGA,那么PGA_AGGREGATE_LIMIT的大小需要考虑到processes,大概的公式为:
1 |
PGA_AGGREGATE_LIMIT = (original PGA_AGGREGATE_LIMIT value) + ((maximum number of connected processes) * 4M) |
继续分析trace,发现MGA大概分配到PGA_AGGREGATE_LIMIT的20%时,就会报出ORA-04030
1 2 3 4 5 6 |
going to retry allocation ksipc_memallocb: retry allocation on alt nmspc(dflt) 0 pref 0 KSIPC MGA segment precreate succeeded: pga_aggr_limit 16106127360 IPC limit:40% curr allocation:2684354560 new allocation 2952790016 segsize 268435456 hdl 0x7ffcbfae3bf8 MGA heap extent alloc failed : req = 8388552, min = 8388552 Actual req size = 8388552, min size = 8388552, sctx = 0x1caff50c0, segsz = 268435456 |
发现匹配BUG:32521805 – ORA-4030 IN MGA, MGA 20% OF PGA_AGGREGATE_LIMIT,从bug文档中并没有找到该bug的fixed版本。解决方法有两种都比较消极:
- 加大PGA_AGGREGATE_LIMIT
- 禁用MGA(“_ksipc_service_mask”=0)