本案例来自东区某客户,数据库版本为rac 19.7。周日业务反应有大量业务阻塞,通过一线同事发来的wait chain信息可以看到大量的row cache lock(CID=16)和library cache lock,并且final blocking event是row cache lock。final block session是not wait的。
首先row cache lock在之前的一篇row cache lock分析中已经介绍过了基本分析思路,这里就不多说明了,直接进入正题http://www.minniebaby.tech/2021/10/27/%e6%a1%88%e4%be%8b%ef%bc%9a%e4%b8%80%e6%ac%a1row-cache-lock%e5%bc%82%e5%b8%b8%e5%a4%84%e7%90%86/
ash分析:
1 2 3 4 5 6 7 8 9 |
select instance_NUMBER,program,event,sql_id,sql_exec_start,min(sample_time),max(sample_time),seq#,count(*)*10 from dba_hist_active_sess_history where event='row cache lock' and p3=5 and sample_time>trunc(sysdate-1) group by instance_NUMBER,program,event,sql_id,sql_exec_start,seq# order by sql_exec_start; INSTANCE_NUMBER PROGRAM EVENT SQL_ID SQL_EXEC_START MIN(SAMPLE_TIME) MAX(SAMPLE_TIME) SEQ# COUNT(*)*10 --------------- ------------------------------ -------------------- ------------- ------------------- ------------------------------ ------------------------------ ---------- ----------- 2 oracle@prdres2 (J003) row cache lock b6usrg82hwsa3 2023-05-07 10:07:45 07-MAY-23 10.09.03.948 AM 07-MAY-23 04.14.06.719 PM 1239 21890 2 oracle@prdres2 (J003) row cache lock b6usrg82hwsa3 2023-05-07 10:07:45 07-MAY-23 04.14.16.729 PM 07-MAY-23 06.31.25.760 PM 1241 8230 |
final block session(INSTANCE_NUMBER=2、SID=2018)执行的是自动统计信息任务。以X模式请求dc_histogram_defs的row cache lock,并且持续了非常长时间的请求。一直持续到客户重启数据库。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 |
*** 2023-05-07T10:55:08.609919+08:00 ((1)) (session) sid: 2018 ser: 26444 trans: 0x2b8ee7da0, creator: 0x310d9c5a8 flags: (0x8210041) USR/- flags2: (0x40009) -/-/INC flags_idl: (0x1) status: BSY/-/-/- kill: -/-/-/- DID: 0002-00B0-0003F5E70002-00B0-0003F5E6, short-term DID: txn branch: (nil) con_id/con_uid/con_name: 3/2615539819/PDBRES con_logonuid: 2615539819 con_logonid: 3 con_scuid: 1 con_scid: 1 edition#: 134 user#/name: 0/SYS oct: 170, prv: 0, sql: 0x19a891ae8, psql: 0x48a3146c0 stats: 0x1bdfa4d10, PX stats: 0x12f05d04 service name: SYS$USERS client details: O/S info: user: oracle, term: UNKNOWN, ospid: 392 machine: prdres2 program: oracle@prdres2 (J003) application name: DBMS_SCHEDULER, hash value=2478762354 action name: ORA$AT_OS_OPT_SY_5210, hash value=513032857 Current Wait Stack: 0: waiting for 'row cache lock' cache id=0x10, mode=0x0, request=0x5 pdb uid=0x9be5f86b, master hint=0x0 wait_id=1230 seq_num=1239 snap_id=1 wait times: snap=46 min 13 sec, exc=46 min 13 sec, total=46 min 13 sec wait times: max=infinite, heur=46 min 13 sec wait counts: calls=5570 os=5570 in_wait=1 iflags=0x215a2 There are 5 sessions blocked by this session. Dumping one waiter: inst: 2, sid: 3781, ser: 11134 wait event: 'library cache lock' p1: 'handle address'=0x18c056a30 p2: 'lock address'=0x17d3fdf58 p3: '100*mode+namespace'=0x1600010002 row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0 min_blocked_time: 97 secs, waiter_cache_ver: 49043 Wait State: fixed_waits=0 flags=0x22 boundary=(nil)/-1 Session Wait History: elapsed time of 0.000027 sec since current wait 0: waited for 'row cache lock' cache id=0x10, mode=0x0, request=0x5 pdb uid=0x9be5f86b, master hint=0x0 wait_id=1229 seq_num=1238 snap_id=1 wait times: snap=0.000211 sec, exc=0.000211 sec, total=0.000211 sec wait times: max=0.000000 sec wait counts: calls=2 os=2 occurred after 0.000012 sec of elapsed time 1: waited for 'KJC: Wait for msg sends to complete' msg=0x353bf7860, dest|rcvr=0x10000, mtype=0x6a wait_id=1228 seq_num=1237 snap_id=1 wait times: snap=0.000045 sec, exc=0.000045 sec, total=0.000045 sec wait times: max=5 min 1 sec wait counts: calls=2 os=2 occurred after 0.000004 sec of elapsed time 2: waited for 'DLM cross inst call completion' caller instance number=0x2, cluster incarnation number=0x7, request identifier=0x14f735f wait_id=1227 seq_num=1236 snap_id=1 wait times: snap=0.000004 sec, exc=0.000004 sec, total=0.000004 sec wait times: max=0.000000 sec wait counts: calls=0 os=0 occurred after 0.000021 sec of elapsed time 3: waited for 'row cache lock' cache id=0x10, mode=0x0, request=0x5 pdb uid=0x9be5f86b, master hint=0x0 wait_id=1226 seq_num=1235 snap_id=1 wait times: snap=0.000269 sec, exc=0.000269 sec, total=0.000269 sec wait times: max=0.000000 sec wait counts: calls=2 os=2 occurred after 0.000033 sec of elapsed time 4: waited for 'KJC: Wait for msg sends to complete' msg=0x353afe5a8, dest|rcvr=0x10000, mtype=0x6a wait_id=1225 seq_num=1234 snap_id=1 wait times: snap=0.000042 sec, exc=0.000042 sec, total=0.000042 sec wait times: max=5 min 1 sec wait counts: calls=2 os=2 occurred after 0.000013 sec of elapsed time 5: waited for 'DLM cross inst call completion' caller instance number=0x2, cluster incarnation number=0x7, request identifier=0x14f735e wait_id=1224 seq_num=1233 snap_id=1 wait times: snap=0.000011 sec, exc=0.000011 sec, total=0.000011 sec wait times: max=0.000000 sec wait counts: calls=0 os=0 occurred after 0.000064 sec of elapsed time 6: waited for 'row cache lock' cache id=0x10, mode=0x0, request=0x5 pdb uid=0x9be5f86b, master hint=0x0 wait_id=1223 seq_num=1232 snap_id=1 wait times: snap=0.000219 sec, exc=0.000219 sec, total=0.000219 sec wait times: max=0.000000 sec wait counts: calls=2 os=2 occurred after 0.000016 sec of elapsed time 7: waited for 'KJC: Wait for msg sends to complete' msg=0x353f0abe0, dest|rcvr=0x10000, mtype=0x6a wait_id=1222 seq_num=1231 snap_id=1 wait times: snap=0.000057 sec, exc=0.000057 sec, total=0.000057 sec wait times: max=5 min 1 sec wait counts: calls=2 os=2 occurred after 0.000007 sec of elapsed time 8: waited for 'DLM cross inst call completion' caller instance number=0x2, cluster incarnation number=0x7, request identifier=0x14f735d wait_id=1221 seq_num=1230 snap_id=1 wait times: snap=0.000004 sec, exc=0.000004 sec, total=0.000004 sec wait times: max=0.000000 sec wait counts: calls=0 os=0 occurred after 0.000035 sec of elapsed time 9: waited for 'row cache lock' cache id=0x10, mode=0x0, request=0x5 pdb uid=0x9be5f86b, master hint=0x0 wait_id=1220 seq_num=1229 snap_id=1 wait times: snap=0.000372 sec, exc=0.000372 sec, total=0.000372 sec wait times: max=0.000000 sec wait counts: calls=1 os=1 occurred after 0.000752 sec of elapsed time |
10点55分已经等待了46分钟的row cache lock
- P1=16(dc_histogram_defs)
- P3=5(X模式)
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SOC: 0x8cb9e628, type: row cache enqueues (111), map: 0x17bf35d28 state: LIVE (0x99fc), flags: INIT (0x1) row cache enqueue: count=1 session=0x351263950 object=0x197150cc0, request=X flag=03 WAI/TRC/-/-/-/-/-/- savepoint=0x4076e type=MULTI-INSTANCE instance locked=F handle=0x8cb9e6f8 row cache parent object: addr=0x197150cc0 cid=16(dc_histogram_defs) conid=3 conuid=2615539819 hash=a3de4515 typ=61 transaction=(nil) flags=00000002 inc=1, pdbinc=1 version=2 mtx version=15688 own=0x197150d90[0x197150d90,0x197150d90] wat=0x197150da0[0x8cb9e6a8,0x8cb9e6a8] mode=N req=X status=VALID/-/-/-/-/-/-/-/- KGH pinned instance lock=QQ 6a7e84e3 f2908b26 set=0, complete=FALSE set=1, complete=FALSE |
hash=a3de4515这个是关键信息,用于后续分析的匹配和搜索。
那么为什么J003请求不到X模式的row cache lock呢?原因只能是当前有进程正在以S或者X模式持有了该队列。那么在ash blocking为空的情况下如何找出这个进程呢?
我的方法是使用hash=a3de4515去搜搜trace下面的所有trace看看能不能有发现,最终在diag trace中找到了持有的进程。
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SOC: 0x3398f0508, type: row cache enqueues (111), map: 0x19faa3ba8 state: LIVE (0x99fc), flags: INIT (0x1) row cache enqueue: count=1 session=0x1c106b318 object=0x4d9d89fb0, mode=S flag=80 -/-/-/-/-/-/-/ILK savepoint=0x8fe64da type=MULTI-INSTANCE instance locked=T handle=0x3398f05d8 row cache parent object: addr=0x4d9d89fb0 cid=16(dc_histogram_defs) conid=3 conuid=2615539819 hash=a3de4515 typ=61 transaction=(nil) flags=00000002 inc=1, pdbinc=1 version=2 mtx version=698 own=0x4d9d8a080[0x3398f0588,0x3398f0588] wat=0x4d9d8a090[0x4d9d8a090,0x4d9d8a090] mode=S req=N status=VALID/-/-/-/-/-/-/-/- KGH pinned instance lock=QQ 6a7e84e3 f2908b26 set=0, complete=FALSE set=1, complete=FALSE |
hash和instance lock和请求的完全匹配,通过SO和OWNER,往上搜索可以定位到process为GEN1,再通过process的SO可以找到gen1的session state信息。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 |
SO: 0x1bff52c30, type: session (4), map: 0x1c106b318 state: LIVE (0x4532), flags: 0x1 owner: 0x1bfee0180, proc: 0x1bfee0180 link: 0x1bff52c50[0x1bf6fb2a8, 0x1bf6fb228] child list count: 7, link: 0x1bff52ca0[0x19faa3a48, 0x1bd6d1020] conid: 1, conuid: 1, SGA version=(1,0), pg: 0 SOC: 0x1c106b318, type: session (4), map: 0x1bff52c30 state: LIVE (0x99fc), flags: INIT (0x1) (session) sid: 631 ser: 35354 trans: (nil), creator: 0x1b0ea3d88 flags: (0x51) USR/- flags2: (0x409) -/-/INC flags_idl: (0x1) status: BSY/-/-/- kill: -/-/-/- 0001-000A-000000020001-000A-00000003 DID: , short-term DID: txn branch: (nil) con_id/con_uid/con_name: 3/2615539819/PDBRES con_logonuid: 1 con_logonid: 1 con_scuid: 1 con_scid: 1 edition#: 134 user#/name: 0/SYS oct: 3, prv: 0, sql: 0x3398059e8, psql: (nil) stats: 0x1dfe645a0, PX stats: 0x12f05d04 service name: SYS$BACKGROUND Current Wait Stack: 0: waiting for 'gc cr request' file#=0x9, block#=0x8a06, class#=0x1 wait_id=50342261 seq_num=11438 snap_id=1 wait times: snap=0.147084 sec, exc=0.147084 sec, total=0.147084 sec wait times: max=infinite, heur=0.147084 sec wait counts: calls=4 os=4 in_wait=1 iflags=0x5a0 Wait State: fixed_waits=0 flags=0x22 boundary=(nil)/-1 Session Wait History: elapsed time of 0.000099 sec since current wait 0: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342260 seq_num=11437 snap_id=1 wait times: snap=0.512788 sec, exc=0.512788 sec, total=0.512788 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000090 sec of elapsed time 1: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342259 seq_num=11436 snap_id=1 wait times: snap=0.515190 sec, exc=0.515190 sec, total=0.515190 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000062 sec of elapsed time 2: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342258 seq_num=11435 snap_id=1 wait times: snap=0.518847 sec, exc=0.518847 sec, total=0.518847 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000065 sec of elapsed time 3: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342257 seq_num=11434 snap_id=1 wait times: snap=0.517242 sec, exc=0.517242 sec, total=0.517242 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000060 sec of elapsed time 4: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342256 seq_num=11433 snap_id=1 wait times: snap=0.512952 sec, exc=0.512952 sec, total=0.512952 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000060 sec of elapsed time 5: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342255 seq_num=11432 snap_id=1 wait times: snap=0.515456 sec, exc=0.515456 sec, total=0.515456 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000074 sec of elapsed time 6: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342254 seq_num=11431 snap_id=1 wait times: snap=0.515304 sec, exc=0.515304 sec, total=0.515304 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000063 sec of elapsed time 7: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342253 seq_num=11430 snap_id=1 wait times: snap=0.517334 sec, exc=0.517334 sec, total=0.517334 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000057 sec of elapsed time 8: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342252 seq_num=11429 snap_id=1 wait times: snap=0.517039 sec, exc=0.517039 sec, total=0.517039 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000067 sec of elapsed time 9: waited for 'gc cr block lost' =0x9, =0x8a06, =0x1 wait_id=50342251 seq_num=11428 snap_id=1 wait times: snap=0.512986 sec, exc=0.512986 sec, total=0.512986 sec wait times: max=infinite wait counts: calls=5 os=5 occurred after 0.000066 sec of elapsed time Sampled Session History of session 631 serial 35354 --------------------------------------------------- The sampled session history is constructed by sampling the target session every 1 second. The sampling process captures at each sample if the session is in a non-idle wait, an idle wait, or not in a wait. If the session is in a non-idle wait then one interval is shown for all the samples the session was in the same non-idle wait. If the session is in an idle wait or not in a wait for consecutive samples then one interval is shown for all the consecutive samples. Though we display these consecutive samples in a single interval the session may NOT be continuously idle or not in a wait (the sampling process does not know). The history is displayed in reverse chronological order. sample interval: 1 sec, max history 120 sec --------------------------------------------------- [1 sample, 11:12:58] waited for 'gc cr request', seq_num: 11438 p1: 'file#'=0x9 p2: 'block#'=0x8a06 p3: 'class#'=0x1 [1 sample, 11:12:57] waited for 'gc cr block lost', seq_num: 11436 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:56] waited for 'gc cr block lost', seq_num: 11434 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:55] waited for 'gc cr block lost', seq_num: 11432 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:54] waited for 'gc cr block lost', seq_num: 11430 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:53] waited for 'gc cr block lost', seq_num: 11428 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:52] waited for 'gc cr block lost', seq_num: 11426 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:51] waited for 'gc cr block lost', seq_num: 11424 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:50] waited for 'gc cr block lost', seq_num: 11422 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:49] waited for 'gc cr block lost', seq_num: 11420 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:48] waited for 'gc cr block lost', seq_num: 11418 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:47] waited for 'gc cr block lost', seq_num: 11416 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:46] waited for 'gc cr block lost', seq_num: 11414 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:45] waited for 'gc cr block lost', seq_num: 11412 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:44] waited for 'gc cr block lost', seq_num: 11410 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:43] waited for 'gc cr block lost', seq_num: 11409 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:42] waited for 'gc cr block lost', seq_num: 11407 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:41] waited for 'gc cr block lost', seq_num: 11405 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:40] waited for 'gc cr block lost', seq_num: 11403 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:39] waited for 'gc cr block lost', seq_num: 11401 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:38] waited for 'gc cr block lost', seq_num: 11399 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:37] waited for 'gc cr block lost', seq_num: 11397 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:36] waited for 'gc cr block lost', seq_num: 11395 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:35] waited for 'gc cr block lost', seq_num: 11393 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:34] waited for 'gc cr block lost', seq_num: 11391 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:33] waited for 'gc cr block lost', seq_num: 11389 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:32] waited for 'gc cr block lost', seq_num: 11387 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:31] waited for 'gc cr block lost', seq_num: 11385 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:30] waited for 'gc cr block lost', seq_num: 11383 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:29] waited for 'gc cr block lost', seq_num: 11381 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:28] waited for 'gc cr block lost', seq_num: 11380 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:27] waited for 'gc cr block lost', seq_num: 11378 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:26] waited for 'gc cr block lost', seq_num: 11376 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:25] waited for 'gc cr block lost', seq_num: 11375 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:24] waited for 'gc cr block lost', seq_num: 11373 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:23] waited for 'gc cr block lost', seq_num: 11372 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:22] waited for 'gc cr block lost', seq_num: 11370 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:21] waited for 'gc cr block lost', seq_num: 11368 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:20] waited for 'gc cr block lost', seq_num: 11366 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 [1 sample, 11:12:19] waited for 'gc cr block lost', seq_num: 11364 p1: ''=0x9 p2: ''=0x8a06 p3: ''=0x1 |
大量的gc cr block lost,通常出现block lost都与网络丢包有关系。但是我反复查看了oswnetstat,并未发现异常,而且gc cr block lost只在gen1进程的等待中出现。
查看gen1的trace发现异常阶段有大量的如下输出:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
*** 2023-05-07T08:34:12.831763+08:00 (PDBRES(3)) IPCLW:[0.30861]{-}[WAIT]:PROTO: [1683419652831702]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515351735, expecting 544659863 *** 2023-05-07T08:34:13.347836+08:00 (PDBRES(3)) IPCLW:[0.30862]{-}[WAIT]:PROTO: [1683419653347772]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515352003, expecting 544659863 IPCLW:[0.30863]{-}[WAIT]:PROTO: [1683419653863530]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515352455, expecting 544659863 *** 2023-05-07T08:34:14.379122+08:00 (PDBRES(3)) IPCLW:[0.30864]{-}[WAIT]:PROTO: [1683419654379038]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515352766, expecting 544659863 IPCLW:[0.30865]{-}[WAIT]:PROTO: [1683419654896105]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515353098, expecting 544659863 *** 2023-05-07T08:34:15.409055+08:00 (PDBRES(3)) IPCLW:[0.30866]{-}[WAIT]:PROTO: [1683419655408955]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515353384, expecting 544659863 IPCLW:[0.30867]{-}[WAIT]:PROTO: [1683419655926455]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515353701, expecting 544659863 *** 2023-05-07T08:34:16.437884+08:00 (PDBRES(3)) IPCLW:[0.30868]{-}[WAIT]:PROTO: [1683419656437736]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515353931, expecting 544659863 IPCLW:[0.30869]{-}[WAIT]:PROTO: [1683419656951721]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515354273, expecting 544659863 *** 2023-05-07T08:34:17.473483+08:00 (PDBRES(3)) IPCLW:[0.30870]{-}[WAIT]:PROTO: [1683419657473412]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515354587, expecting 544659863 IPCLW:[0.30871]{-}[WAIT]:PROTO: [1683419657986986]ipclw_data_chunk_process:1165 Discarding msg with seq # 3515354878, expecting 544659863 |
时间与gen1开始反复等待gc cr block lost吻合。搜索mos发现与Bug 34482043 : SOME “GC CR MULTI BLOCK MIXED” WAIT OUTLIERS FOLLOWED BY “IPCLW DISCARDING MSG” CAUSE APERFORMANCE REGRESSION IN 19C非常相似。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
@ RELEASE NOTES: @ ]] CR Multi block read performance degradation. @ ]] @ . @ INTERNAL PROBLEM DESCRIPTION: @ Discarding out of order messages in IPCLW causes CR lost block. The block @ will be requested @ again only after receiving the side channel message which by default was @ sent 500 milliseconds later. @ @ . @ INTERNAL FIX DESCRIPTION: @ Deliver out of order messages if they are received within a certain window @ size. |
workaround是升级到19.18。
gc cr block lost只有gen1进程的sql出现过,sql是同一个,我忘记记录了,但是我很清楚的记得与user$.SPARE6有关系,个人猜测可能禁用掉”_disable_last_successful_login_time”特性也可以。
over!!!