该案例来自南区某客户的standby环境,数据库版本为11.2.0.4,凌晨实例发生了crash。
1 2 3 4 5 6 7 8 9 10 11 12 13 |
Sun Apr 24 04:40:48 2022 Errors in file /opt/app/oracle/diag/rdbms/cqwmsprd22/cqwmsprd/trace/cqwmsprd_lgwr_16079.trc: ORA-04021: timeout occurred while waiting to lock object Sun Apr 24 04:40:48 2022 MRP0: Background Media Recovery terminated with error 449 Errors in file /opt/app/oracle/diag/rdbms/cqwmsprd22/cqwmsprd/trace/cqwmsprd_pr00_16241.trc: ORA-00449: background process 'LGWR' unexpectedly terminated with error 4021 LGWR (ospid: 16079): terminating the instance due to error 4021 Managed Standby Recovery not using Real Time Apply Sun Apr 24 04:40:48 2022 System state dump requested by (instance=1, osid=16079 (LGWR)), summary=[abnormal instance termination]. System State dumped to trace file /opt/app/oracle/diag/rdbms/cqwmsprd22/cqwmsprd/trace/cqwmsprd_diag_16063_20220424044048.trc Instance terminated by LGWR, pid = 16079 |
lgwr由于ORA-04021而终止了实例。diag进程收集了SystemState dump。
1 2 3 4 |
[oracle@rac1 ~]$ oerr ora 4021 04021, 00000, "timeout occurred while waiting to lock object %s%s%s%s%s" // *Cause: While waiting to lock a library object, a timeout occurred. // *Action: Retry the operation later. |
分析LGWR的process和session的state dump
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 |
PROCESS 14: LGWR ---------------------------------------- SO: 0x15fa9b778, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x15fa9b778, name=process, file=ksu.h LINE:12721, pg=0 (process) Oracle pid:14, ser:1, calls cur/top: 0x15467a718/0x15467a718 flags : (0x6) SYSTEM flags2: (0x0), flags3: (0x10) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 0 0 26 last post received-location: ksa2.h LINE:285 ID:ksasnd last process to post me: 0x15aab64b8 1 0 last post sent: 0 0 259 last post sent-location: kgl.h LINE:8873 ID:kgllkdl: post after freeing latch last process posted by me: 0x15faa0b10 1 0 (latch info) wait_event=0 bits=0x0 Process Group: DEFAULT, pseudo proc: 0x158d3c5e8 O/S info: user: oracle, term: UNKNOWN, ospid: 16079 OSD pid info: Unix process pid: 16079, image: oracle@redprairie-chongqing-db03 (LGWR) SO: 0x156622740, type: 4, owner: 0x15fa9b778, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x15fa9b778, name=session, file=ksu.h LINE:12729, pg=0 (session) sid: 2647 ser: 1 trans: (nil), creator: 0x15fa9b778 flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x409) -/-/INC DID: , short-term DID: txn branch: (nil) edition#: 0 oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS ksuxds FALSE at location: 0 service name: SYS$BACKGROUND Current Wait Stack: Not in wait; last wait ended 1.320661 sec ago There are 14 sessions blocked by this session. Dumping one waiter: inst: 1, sid: 191, ser: 3 wait event: 'library cache lock' p1: 'handle address'=0x15ee73a58 p2: 'lock address'=0x15da21970 p3: '100*mode+namespace'=0x2004a0002 row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0 min_blocked_time: 896 secs, waiter_cache_ver: 59364 Wait State: fixed_waits=0 flags=0x21 boundary=(nil)/-1 Session Wait History: elapsed time of 1.320716 sec since last wait 0: waited for 'library cache lock' handle address=0x15ee73a58, lock address=0x15cf510c8, 100*mode+namespace=0x2004a0003 wait_id=9474667 seq_num=37628 snap_id=1 wait times: snap=15 min 0 sec, exc=15 min 0 sec, total=15 min 0 sec wait times: max=15 min 0 sec wait counts: calls=302 os=302 occurred after 0.000005 sec of elapsed time 1: waited for 'library cache: mutex X' idn=0xc489f911, value=0x0, where=0x40 wait_id=9474666 seq_num=37627 snap_id=1 wait times: snap=0.000119 sec, exc=0.000119 sec, total=0.000119 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 1 min 56 sec of elapsed time 2: waited for 'rdbms ipc message' timeout=0x12c, =0x0, =0x0 wait_id=9474665 seq_num=37626 snap_id=1 wait times: snap=2.158775 sec, exc=2.158775 sec, total=2.158775 sec wait times: max=3.000000 sec wait counts: calls=1 os=1 occurred after 0.000040 sec of elapsed time 3: waited for 'rdbms ipc message' timeout=0x12c, =0x0, =0x0 wait_id=9474664 seq_num=37625 snap_id=1 wait times: snap=3.000774 sec, exc=3.000774 sec, total=3.000774 sec wait times: max=3.000000 sec wait counts: calls=1 os=1 occurred after 0.000036 sec of elapsed time 4: waited for 'rdbms ipc message' timeout=0x12c, =0x0, =0x0 wait_id=9474663 seq_num=37624 snap_id=1 wait times: snap=3.005809 sec, exc=3.005809 sec, total=3.005809 sec wait times: max=3.000000 sec wait counts: calls=1 os=1 occurred after 0.000036 sec of elapsed time 5: waited for 'rdbms ipc message' timeout=0x12c, =0x0, =0x0 wait_id=9474662 seq_num=37623 snap_id=1 wait times: snap=3.000757 sec, exc=3.000757 sec, total=3.000757 sec wait times: max=3.000000 sec wait counts: calls=1 os=1 occurred after 0.000045 sec of elapsed time 6: waited for 'rdbms ipc message' timeout=0x12c, =0x0, =0x0 wait_id=9474661 seq_num=37622 snap_id=1 wait times: snap=3.001774 sec, exc=3.001774 sec, total=3.001774 sec wait times: max=3.000000 sec wait counts: calls=1 os=1 occurred after 0.000038 sec of elapsed time 7: waited for 'rdbms ipc message' timeout=0x12c, =0x0, =0x0 wait_id=9474660 seq_num=37621 snap_id=1 wait times: snap=3.000749 sec, exc=3.000749 sec, total=3.000749 sec wait times: max=3.000000 sec wait counts: calls=1 os=1 occurred after 0.000037 sec of elapsed time 8: waited for 'rdbms ipc message' timeout=0x12c, =0x0, =0x0 wait_id=9474659 seq_num=37620 snap_id=1 wait times: snap=3.000751 sec, exc=3.000751 sec, total=3.000751 sec wait times: max=3.000000 sec wait counts: calls=1 os=1 occurred after 0.000040 sec of elapsed time 9: waited for 'rdbms ipc message' timeout=0x12c, =0x0, =0x0 wait_id=9474658 seq_num=37619 snap_id=1 wait times: snap=3.001728 sec, exc=3.001728 sec, total=3.001728 sec wait times: max=3.000000 sec wait counts: calls=1 os=1 occurred after 0.000041 sec of elapsed time |
LGWR长时间等待library cache lock,从namespace的p3分析可以看到lgwr正在以X模式请求dbinstance(这也是standby的一大特色),由于长时间(长达15分钟)无法请求到所以导致了ORA-04021从而crash实例。
通常standby的lgwr会在下列情况下去请求X模式的dbinstance library cache lock
1 2 3 4 5 6 7 8 9 10 11 12 |
Typical items that require an instancelevel library cache lock in the standby before 18c and can cause these symptoms: 1.DDL on primary 2.Statistics gathering on primary 3.Grants / revokes on primary 4.Logon on primary (the resulting updates to last login time / user metadata in standby require it) 5.EditionBased Redefinition changes on primary 6.Query parsing, specifically hard or failed parses (focus of this note) 7.Advancement of query SCN 8.Apply/Recovery 9.Administration(Start/Cancellation) of Managed Recovery |
关于standby中由于LGWR长时间以X模式持有dbinstance引起故障的问题非常多,比如anbob的blog中的这篇文章https://www.anbob.com/archives/5126.html,供参考。但是本case恰好相反,是LGWR长时间请求不到dbinstance的library cache lock。
我们知道library cache lock和row cache lock都属于是一种enqueue,S会阻塞X,X也会阻塞S。所以肯定有进程长时间持有S模式的lc lock,或者阻塞队列过长。
从trace中我们可发现有session以S模式持有了handle为0x15ee73a58的lock,并且能够定位到该session的SO:0x15646da20
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SO: 0x15ecb4090, type: 78, owner: 0x14af0a590, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x15aabfb30, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8751, pg=0 LibraryObjectLock: Address=0x15ecb4090 Handle=0x15ee73a58 Mode=S CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0 User=0x15646da20 Session=0x15646da20 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=2 LibraryHandle: Address=0x15ee73a58 Hash=c489f911 LockMode=S PinMode=0 LoadLockMode=0 Status=0 ObjectName: Name=SYS.cqwmsprd FullHashValue=cba69cb3d98177886e595a37c489f911 Namespace=DBINSTANCE(74) Type=CURSOR(00) Identifier=2 OwnerIdn=0 Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=0 ActiveLocks=6 TotalLockCount=537229 TotalPinCount=0 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=0 Version=0 BucketInUse=4458 HandleInUse=4458 HandleReferenceCount=0 Concurrency: DependencyMutex=0x15ee73b08(0, 0, 0, 0) Mutex=0x15ee73b98(0, 1106247, 111, 0) |
搜索0x15646da20可以找到该session
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 |
SO: 0x15646da20, type: 4, owner: 0x15aabfb30, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x15aabfb30, name=session, file=ksu.h LINE:12729, pg=0 (session) sid: 2085 ser: 15009 trans: (nil), creator: 0x15aabfb30 flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x9) -/-/INC DID: , short-term DID: txn branch: (nil) edition#: 0 oct: 3, prv: 0, sql: 0x15d9e9a28, psql: (nil), user: 0/SYS ksuxds FALSE at location: 0 service name: SYS$USERS client details: O/S info: user: root, term: pts/0, ospid: 14273 machine: lwp-web-api-cfab8-5c7c6ffff5-tvvwj program: python3@lwp-web-api-cfab8-5c7c6ffff5-tvvwj (TNS application name: python3@lwp-web-api-cfab8-5c7c6ffff5-tvvwj (TNS , hash value=3770380047 Current Wait Stack: Not in wait; last wait ended 3.125650 sec ago There are 148 sessions blocked by this session. Dumping one waiter: inst: 1, sid: 1710, ser: 5839 wait event: 'library cache load lock' p1: 'object address'=0x1377e3440 p2: 'lock address'=0x137921060 p3: '100*mask+namespace'=0x450003 row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0 min_blocked_time: 896 secs, waiter_cache_ver: 59364 Wait State: fixed_waits=0 flags=0x21 boundary=(nil)/-1 Session Wait History: elapsed time of 3.125710 sec since last wait 0: waited for 'db file sequential read' file#=0x1, block#=0xaa9, blocks=0x1 wait_id=29 seq_num=32 snap_id=1 wait times: snap=0.016092 sec, exc=0.016092 sec, total=0.016092 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000073 sec of elapsed time 1: waited for 'read by other session' file#=0x1, block#=0xb02, class#=0x1 wait_id=28 seq_num=31 snap_id=1 wait times: snap=0.009356 sec, exc=0.009356 sec, total=0.009356 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.000009 sec of elapsed time 2: waited for 'latch: cache buffers chains' address=0x155f89310, number=0xb1, tries=0x0 wait_id=27 seq_num=30 snap_id=1 wait times: snap=0.000043 sec, exc=0.000043 sec, total=0.000043 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000028 sec of elapsed time 3: waited for 'read by other session' file#=0x1, block#=0xb01, class#=0x1 wait_id=26 seq_num=29 snap_id=1 wait times: snap=0.000301 sec, exc=0.000301 sec, total=0.000301 sec wait times: max=infinite wait counts: calls=1 os=1 occurred after 0.030419 sec of elapsed time 4: waited for 'latch: row cache objects' address=0x1475525f8, number=0x135, tries=0x0 wait_id=25 seq_num=28 snap_id=1 wait times: snap=0.027819 sec, exc=0.027819 sec, total=0.027819 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000029 sec of elapsed time 5: waited for 'db file sequential read' file#=0x1, block#=0x7950, blocks=0x1 wait_id=24 seq_num=27 snap_id=1 wait times: snap=0.002374 sec, exc=0.002374 sec, total=0.002374 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000081 sec of elapsed time 6: waited for 'db file sequential read' file#=0x1, block#=0x794b, blocks=0x1 wait_id=23 seq_num=26 snap_id=1 wait times: snap=0.029142 sec, exc=0.029142 sec, total=0.029142 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.089584 sec of elapsed time 7: waited for 'row cache lock' cache id=0x3, mode=0x0, request=0x3 wait_id=21 seq_num=25 snap_id=2 wait times: snap=0.000000 sec, exc=11 min 48 sec, total=11 min 48 sec wait times: max=infinite wait counts: calls=237 os=237 occurred after 0.000000 sec of elapsed time 8: waited for 'latch: row cache objects' address=0x1475525f8, number=0x135, tries=0x0 wait_id=22 seq_num=24 snap_id=1 wait times: snap=0.010451 sec, exc=0.010451 sec, total=0.010451 sec wait times: max=infinite wait counts: calls=0 os=0 occurred after 0.000000 sec of elapsed time 9: waited for 'row cache lock' cache id=0x3, mode=0x0, request=0x3 wait_id=21 seq_num=23 snap_id=1 wait times: snap=11 min 48 sec, exc=11 min 48 sec, total=11 min 48 sec wait times: max=infinite wait counts: calls=237 os=237 occurred after 0.004089 sec of elapsed time |
从wait history可以看到会话2085在持有dbinstance的library cache lock的情况下,长时间请求不到s模式的row cache lock,使得LGWR请求dbinstance的library cache lock超时。row cache lock对应的dc为dc_rollback_segs。
当时肯定有session以X模式持有row cache lock,但是由于这是wait history,所以无法从此时的SystemState dump中去寻找到哪个session以X模式去请求访问dc_rollback_segs。
由于这是一个standby,是一个只读的数据库,为何还会有S/X模式去请求dc_rollback_segs的row cache lock呢?我也没想明白,之前处理过相关问题的同事发给我这样一段描述
1 2 3 4 5 6 7 8 9 10 11 |
On a primary with high transaction-througput but low transaction-concurrency (few undo segments) we would end up cycling transction slots more often. Since we have non-logged cleanouts kicking in on the primary, this may not affect adversely queries on the primary. However, since non-logged cleanouts do not appear on the standby, we would end up having to perform more cleanouts (for consistent read) on a standby than on the primary. This in itself could lead to more undo header gets. Additionally we could end up doing a lot of transaction table rollbacks. There is no fix for this issue. Only workarounds can be provided. As total no of undo segments are not too high, we can ask them to set _rollback_segment_count=1000 |
似乎与备库没有redo也没有non-logged cleanouts有关。
关于non-logged cleanouts其实就是Fast Block Cleanout on Commit
1 2 3 4 5 6 7 8 9 10 11 12 |
Fast Block Cleanout on Commit Cleanout at commit time is attempted by function kcbnlc() when the transaction, after committing, deletes its List of Blocks SOs (ktldbl). For each block found in the transaction’s BL State Object, the Oracle server tries a «no-logging cleanout» by calling kcbnlc, passing it the relevant block context. Then: • The statistic “commit cleanouts” is incremented. • If cleanout is successful, the statistic “commit cleanouts successfully completed” is incremented. The ratio between the two statistics allows to measure the success rate of No-logging cleanout. After 3 failures to perform cleanout on a specific BL State Object (possibly because the block entry’s information is too old), the Oracle server will proceed to another state object (for the same transaction) if exists. |
此外还有一种情况就是dblink了,我们知道dblink会在本地产生事务
1 2 3 4 5 6 7 8 9 10 11 |
SQL> select 1 from dual@to_test; 1 ---------- 1 SQL> select status,xid from v$transaction where ses_addr in (select saddr from v$session where sid in (select sid from v$mystat where rownum=1)); STATUS XID ---------------- ---------------- ACTIVE 8803010002000000 |
并且阻塞LGWR的session,正好也正在持有handle为0x1377e3440,namespace为dblink的library cache lock,并且长时间的X模式持有handle 0x1377e3440的library cache load lock。
1 2 3 4 5 6 7 |
User=0x15646da20 Session=0x15646da20 ReferenceCount=0 Flags=[0000] SavepointNum=2 Flags=[0000] SavepointNum=2 LibraryHandle: Address=0x1377e3440 Hash=3b1dfeba LockMode=S PinMode=0 LoadLockMode=X Status=0 ObjectName: Name=.RECO.ORACLE.COM FullHashValue=022bfb39389939832aaa659c3b1dfeba Namespace=DBLINK(69) Type=CURSOR(00) Identifier=0 OwnerIdn=0 Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=0 ActiveLocks=134 TotalLockCount=134 TotalPinCount=0 |
会话2085对应的sql为:
1 2 3 4 5 6 7 8 9 10 |
LibraryObjectLock: Address=0x15ecb3f90 Handle=0x15d9e9a28 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0 User=0x15646da20 Session=0x155445ec0 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=626460cb LibraryHandle: Address=0x15d9e9a28 Hash=592bf3ba LockMode=N PinMode=0 LoadLockMode=0 Status=VALD ObjectName: Name=select host,userid,password,flag,authusr,authpwd, passwordx, authpwdx from link$ where owner#=:1 and name=:2 FullHashValue=45b5ea77f78a8282b1fdfd0f592bf3ba Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1496052666 OwnerIdn=0 Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=2 ActiveLocks=1 TotalLockCount=1 TotalPinCount=1 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0 Concurrency: DependencyMutex=0x15d9e9ad8(0, 1, 0, 0) Mutex=0x15d9e9b68(0, 20, 0, 0) |
正在查询link$,并没有找到带dblink的语句,说明这是一个以standby为源端的dblink查询。并且也阻塞了很多dblink操作的session,event为library cache load lock
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 |
================================================================================================================================================= Process Waiter Details Wait Event Time Waited Blocker Details ------- --------------- ---------------- ----------- -------------- PROCESS 30: sid: 2648 ser: 9645 library cache load lock 15 min 1 sec inst: 1, sid: 2085, ser: 15009 PROCESS 35: sid: 570 ser: 17 library cache load lock 14 min 49 sec inst: 1, sid: 2085, ser: 15009 PROCESS 56: sid: 1519 ser: 18263 library cache load lock 15 min 2 sec inst: 1, sid: 2085, ser: 15009 PROCESS 57: sid: 1710 ser: 5839 library cache load lock 15 min 2 sec inst: 1, sid: 2085, ser: 15009 PROCESS 58: sid: 1898 ser: 3237 library cache load lock 15 min 2 sec inst: 1, sid: 2085, ser: 15009 PROCESS 64: sid: 2 ser: 15153 library cache load lock 15 min 3 sec inst: 1, sid: 2085, ser: 15009 PROCESS 66: sid: 388 ser: 14487 library cache load lock 15 min 3 sec inst: 1, sid: 2085, ser: 15009 PROCESS 81: sid: 195 ser: 15563 library cache load lock 15 min 3 sec inst: 1, sid: 2085, ser: 15009 PROCESS 82: sid: 386 ser: 33757 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 83: sid: 576 ser: 14539 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 84: sid: 764 ser: 8677 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 85: sid: 953 ser: 10021 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 86: sid: 1142 ser: 4645 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 87: sid: 1332 ser: 4175 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 88: sid: 1518 ser: 4243 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 89: sid: 1708 ser: 8511 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 90: sid: 1895 ser: 11751 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 91: sid: 2087 ser: 421 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 92: sid: 2274 ser: 10251 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 94: sid: 2654 ser: 10465 library cache load lock 15 min 4 sec inst: 1, sid: 2085, ser: 15009 PROCESS 95: sid: 2841 ser: 2665 library cache load lock 14 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 96: sid: 7 ser: 813 library cache load lock 14 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 97: sid: 197 ser: 3 library cache load lock 14 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 98: sid: 385 ser: 7461 library cache load lock 14 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 99: sid: 571 ser: 25 library cache load lock 14 min 50 sec inst: 1, sid: 2085, ser: 15009 PROCESS 100: sid: 765 ser: 1 library cache load lock 14 min 50 sec inst: 1, sid: 2085, ser: 15009 PROCESS 101: sid: 947 ser: 10833 library cache load lock 14 min 50 sec inst: 1, sid: 2085, ser: 15009 PROCESS 102: sid: 1144 ser: 7 library cache load lock 14 min 50 sec inst: 1, sid: 2085, ser: 15009 PROCESS 103: sid: 1335 ser: 1 library cache load lock 14 min 50 sec inst: 1, sid: 2085, ser: 15009 PROCESS 104: sid: 1522 ser: 467 library cache load lock 14 min 50 sec inst: 1, sid: 2085, ser: 15009 PROCESS 105: sid: 1711 ser: 35 library cache load lock 14 min 50 sec inst: 1, sid: 2085, ser: 15009 PROCESS 106: sid: 1896 ser: 6687 library cache load lock 14 min 50 sec inst: 1, sid: 2085, ser: 15009 PROCESS 107: sid: 2088 ser: 1 library cache load lock 14 min 45 sec inst: 1, sid: 2085, ser: 15009 PROCESS 108: sid: 2278 ser: 1 library cache load lock 14 min 45 sec inst: 1, sid: 2085, ser: 15009 PROCESS 109: sid: 2464 ser: 187 library cache load lock 14 min 45 sec inst: 1, sid: 2085, ser: 15009 PROCESS 110: sid: 2653 ser: 1099 library cache load lock 14 min 43 sec inst: 1, sid: 2085, ser: 15009 PROCESS 111: sid: 2844 ser: 1 library cache load lock 13 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 112: sid: 5 ser: 1275 library cache load lock 13 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 113: sid: 198 ser: 1 library cache load lock 13 min 28 sec inst: 1, sid: 2085, ser: 15009 PROCESS 114: sid: 387 ser: 79 library cache load lock 12 min 12 sec inst: 1, sid: 2085, ser: 15009 PROCESS 115: sid: 577 ser: 849 library cache load lock 10 min 57 sec inst: 1, sid: 2085, ser: 15009 PROCESS 116: sid: 766 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 117: sid: 954 ser: 39 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 118: sid: 1145 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 119: sid: 1336 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 120: sid: 1521 ser: 35 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 121: sid: 1709 ser: 9721 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 122: sid: 1899 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 123: sid: 2089 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 124: sid: 2279 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 125: sid: 2467 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 126: sid: 2655 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 127: sid: 2845 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 128: sid: 10 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 129: sid: 199 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 130: sid: 389 ser: 97 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 131: sid: 578 ser: 1 library cache load lock 10 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 132: sid: 767 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 133: sid: 955 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 134: sid: 1146 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 135: sid: 1337 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 136: sid: 1523 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 137: sid: 1712 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 138: sid: 1900 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 139: sid: 2090 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 140: sid: 2280 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 141: sid: 2466 ser: 7 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 142: sid: 2656 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 143: sid: 2846 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 144: sid: 11 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 145: sid: 200 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 146: sid: 390 ser: 1 library cache load lock 9 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 147: sid: 579 ser: 1 library cache load lock 9 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 148: sid: 768 ser: 1 library cache load lock 9 min 42 sec inst: 1, sid: 2085, ser: 15009 PROCESS 149: sid: 956 ser: 1 library cache load lock 8 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 150: sid: 1147 ser: 1 library cache load lock 8 min 51 sec inst: 1, sid: 2085, ser: 15009 PROCESS 151: sid: 1338 ser: 1 library cache load lock 8 min 27 sec inst: 1, sid: 2085, ser: 15009 PROCESS 152: sid: 1524 ser: 1 library cache load lock 7 min 11 sec inst: 1, sid: 2085, ser: 15009 PROCESS 153: sid: 1713 ser: 1 library cache load lock 5 min 55 sec inst: 1, sid: 2085, ser: 15009 PROCESS 154: sid: 1901 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 155: sid: 2091 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 156: sid: 2281 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 157: sid: 2469 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 158: sid: 2657 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 159: sid: 2847 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 160: sid: 12 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 161: sid: 201 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 162: sid: 391 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 163: sid: 580 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 164: sid: 769 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 165: sid: 957 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 166: sid: 1148 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 167: sid: 1339 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 168: sid: 1525 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 169: sid: 1714 ser: 1 library cache load lock 5 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 170: sid: 1902 ser: 1 library cache load lock 4 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 171: sid: 2092 ser: 1 library cache load lock 4 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 172: sid: 2282 ser: 1 library cache load lock 4 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 173: sid: 2468 ser: 7 library cache load lock 4 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 174: sid: 2658 ser: 1 library cache load lock 4 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 175: sid: 2848 ser: 1 library cache load lock 4 min 52 sec inst: 1, sid: 2085, ser: 15009 PROCESS 176: sid: 13 ser: 1 library cache load lock 4 min 52 sec inst: 1, sid: 2085, ser: 15009 |
猜测当时也有很多的dblink查询操作。但是从trace中发现实例启动以来并没有dblink完整执行。应该都还在解析阶段。
1 2 3 4 5 6 7 |
0 bytes sent via SQL*Net to dblink 0 bytes received via SQL*Net from dblink 0 SQL*Net roundtrips to/from dblink 0 bytes via SQL*Net vector to client 664395087840 bytes via SQL*Net vector from client 0 bytes via SQL*Net vector to dblink 0 bytes via SQL*Net vector from dblink |
由于是standby并且已经重启,无法使用v$sysstat、v$active_session_history等视图来分析,也给分析带来了不少难度。
综上分析,结合mos文档LGWR (ospid: Xxx): Terminating The Instance Due To Error 4021 (Doc ID 2177473.1),我给的建议是:
1 2 |
ALTER SYSTEM set "_adg_parselock_timeout"=3000 scope=both; ALTER SYSTEM set "_rollback_segment_count"=1000 scope=both; |
关于_adg_parselock_timeout的算法,文章前推荐的anbob的那篇文章https://www.anbob.com/archives/5126.html中有详细描述。增大_adg_parselock_timeout只是增加了15分钟的timeout crash时间,如果长时间LGWR还是无法请求到X模式dbinstance的library cache lock,也可以尝试手动kill blocker。