今天某交管客户出现登录异常,应用反馈无法登录数据库。数据库版本为11.2.0.4,patch版本没注意,先暂时不关注。使用客户提供的异常用户测试登录发现确实会hang住。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production PL/SQL Release 11.2.0.4.0 - Production CORE 11.2.0.4.0 Production TNS for Linux: Version 11.2.0.4.0 - Production NLSRTL Version 11.2.0.4.0 - Production SQL*Plus: Release 11.2.0.4.0 Production on Wed Oct 27 14:48:25 2021 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options SQL> conn centernew/XXXXXX 。。。hang |
当时等待事件为:
1 2 3 4 5 6 7 8 9 10 11 12 |
INST_ID EVENT COUNT(*) ---------- ---------------------------------------------------------------- ---------- 1 row cache lock 48 1 library cache lock 1 1 PX Deq: Execute Reply 1 1 PX Deq: Execution Msg 1 1 cursor: pin S wait on X 2 1 library cache load lock 5 2 row cache lock 48 2 PX Deq: Execution Msg 1 2 cursor: pin S wait on X 6 2 library cache load lock 5 |
当看到row cache lock和library cache lock一般来说需要分析是row cache的哪个组件发生争用,library cache lock的namespace等等。
进一步分析:
1 2 3 4 5 6 |
USERNAME EVENT P1 FINAL_BLOCKING_INSTANCE FINAL_BLOCKING_SESSION COUNT(*) -------------- ------------------------ ---------- ----------------------- ---------------------- ---------- row cache lock 10 19 row cache lock 10 2 2838 23 CENTERNEW row cache lock 10 29 CENTERNEW row cache lock 10 2 2838 26 |
从11.2之后如果是密码错误导致的密码延迟一般产生的是library cache lock,这里发现有很多session username是空的,这就说明是那些登录异常的session,都是等待row cache lock,通过p1可以判断是dc_users组件。最终阻塞会话为2838,但是尝试去kill 2838时发现并没有解决问题,只是换了一个最终阻塞会话。原因在于row cache lock是一个enqueue,虽然没有TX那么多mode,它只有S(共享)和X(排他),其中S与S兼容,S阻塞X,X阻塞S,X阻塞X。如果这时有一个X模式的请求进入到enqueue中并且没有释放的话,那么会阻塞后续所有S和X的请求。
由于急于恢复业务,kill掉所有local=no的session。但仍然可以通过diag生成的systatedump去分析。
从row cache enqueue state object中可以看到有dc_users的X模式的请求,:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 |
SO: 0x2f863e8230, type: 56, owner: 0x2fe20af948, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2fc14626f0, name=transaction, file=ktccts.h LINE:410, pg=0 (trans) flg = 0x00200001, flg2 = 0x004c0000, flg3 = 0x00000000, prx = (nil), ros = 2147483647, crtses=0x2fe20af948 flg = 0x00200001: ALC DDL flg2 = 0x004c0000: PGA NIP DDID flg3 = 0x00000000: bsn = 0x54d bndsn = 0x54d spn = 0x5a0 efd = 22 rfd = 0 DID: file:opiprs.c lineno:3136 parent xid: 0x0000.000.00000000 env [0x2f863e8648]: (scn: 0x0000.00000000 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: 0x0000.000.00000000 st-scn: 0x0000.00000000 hi-scn: 0x0000.00000000 ma-scn: 0x0000.00000000 flg: 0x00000000) cev: (spc = 0 arsp = (nil) ubkds (ubk:tsn: 0 rdba: 0x00000000 flag:0x0 hdl:(nil) addr:(nil)) useg tsn: 0 rdba: 0x00000000 hwm uba: 0x00000000.0000.00 col uba: 0x00000000.0000.00 num bl: 0 bk list: 0x0) cr opc: 0x0 spc: 0 uba: 0x00000000.0000.00 Begin scn:0x0000.00000000 uba:0x00000000.0000.00 ts:1635318100[10/27/2021 15:01:40] Undo blks: 0 recs: 0 ccbstg: 0x00000000 (enqueue) released enqueue or enqueue in flux lock_flag: 0x0, lock: 0x2f863e82a8, res: 0x2fe25dd508 own: 0x2fe20af948, sess: 0x2fe20af948, prv: 0x2f863e82b8 xga: (nil), heap: UGA tsnl:0x2db343b400 nent:0 nxt:(nil) Trans IMU st: 0 Pool index 65535, Redo pool 0x2f863e89f0, Undo pool 0x2f863e8ad8 Redo pool range [0x7f70cc3a7b18 0x7f70cc3a7b18 0x7f70cc3aa318] Undo pool range [0x7f70cc3a5318 0x7f70cc3a5318 0x7f70cc3a7b18] chnf control flags 0x0 CHNF hwm uba uba: 0x00000000.0000.00 ---------------------------------------- SO: 0x2dcb5b3c48, type: 75, owner: 0x2f863e8230, flag: INIT/-/-/0x00 if: 0x1 c: 0x1 proc=0x2fc14626f0, name=row cache enqueues, file=kqr.h LINE:2075, pg=0 row cache enqueue: count=1 session=0x2fe20af948 object=0x2da7571770, request=X savepoint=0x5a0 row cache parent object: address=0x2da7571770 cid=10(dc_users) hash=59fb0f49 typ=21 transaction=(nil) flags=00000002 own=0x2da7571838[0x2da7571838,0x2da7571838] wat=0x2da7571848[0x2dcb5b3cf0,0x2dd6857b50] mode=S status=VALID/-/-/-/-/-/-/-/- request=X release=FALSE flags=a instance lock=QK 59fb0f49 fa408245 set=0, complete=FALSE set=1, complete=FALSE set=2, complete=FALSE data= 0000005e 45430009 5245544e 0057454e 00000000 00000000 00000000 00000000 00000000 44440010 34344634 43383746 46453045 00004530 00000000 00000000 00000000 45440016 4c554146 4f435f54 4d55534e 475f5245 50554f52 00000000 00000000 00000007 00000003 00000000 00000000 00000001 0a797801 20080e1b 00000000 00000000 00000000 00000000 00000000 00000000 3a53003e 44464633 33414534 35424531 32363839 37463936 37333243 39383533 30394141 42394442 42434436 42453135 37313637 45313330 43443830 43413641 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffffffff 7ffffffc 00000000 00000000 59fb0f49 a7571770 0000002d a7c04590 0000002f a7c04590 0000002f 00000001 54314582 a7571770 0000002d a7c3a920 0000002f a7c3a920 0000002f 00000002 00000000 a7571770 0000002d a7571a78 0000002d a7571a78 0000002d 00000005 00000000 cac1d800 00007fc6 00000000 00000000 00000000 00000000 a7571770 0000002d a7571ab0 0000002d a7571ab0 0000002d a7571ad8 0000002d 0000000a 59fb0f49 fa408245 00000000 00000101 00000000 0000011d 00000000 bbb1ab88 0000002f c9458628 0000002f bfa14a00 0000002d b2bbbe98 0000002d 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 b2976208 0000002f 00000001 54552caa 00050308 08910321 a7571b48 0000002d a7571b48 0000002d 076effe6 00000000 cac1d820 00007fc6 b2976260 0000002f b2976260 0000002f 00000000 00000000 06fc053c 00000000 cac1d820 00007fc6 bbb1ab88 0000002f b2bbbf38 0000002d bfa14aa0 0000002d d9eec508 0000002f 00000002 00000001 a7571bb8 0000002d a7571bb8 0000002d 00000000 00000000 00000000 00000000 a7571bd8 0000002d a7571bd8 0000002d a7571be8 0000002d a7571be8 0000002d c276bf80 0000002f c276bf80 0000002f 00000000 00001282 cac1d808 00007fc6 00000000 00000000 00000000 00000000 a7571c28 0000002d a7571c28 0000002d 00000000 00000000 00000000 00000000 f5d3fb88 00007fff 10600160 08000200 03050001 00000000 06fc053c 00000000 cac1d820 00007fc6 00000000 00000000 7a82ab73 00000002 00000fb2 00000000 a7571838 0000002d |
通过0x2fe20af948去搜索session state object,可以找到对应的sql
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
SO: 0x2fe20af948, type: 4, owner: 0x2fc14626f0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2fc14626f0, name=session, file=ksu.h LINE:12729, pg=0 (session) sid: 4937 ser: 1 trans: 0x2f863e8230, creator: 0x2fc14626f0 flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x48009) -/DDLT2/INC DID: , short-term DID: txn branch: (nil) edition#: 100 oct: 17, prv: 0, sql: 0x2db37f7f88, psql: 0x2dcbb1eda8, user: 94/CENTERNEW ksuxds FALSE at location: 0 service name: qfjsdb client details: O/S info: user: WIN-LH84TQS6F99$, term: unknown, ospid: 1234 machine: WIN-LH84TQS6F99 program: JDBC Thin Client application name: JDBC Thin Client, hash value=2546894660 Current Wait Stack: 0: waiting for 'row cache lock' cache id=0xa, mode=0x0, request=0x5 wait_id=1260 seq_num=1261 snap_id=1 wait times: snap=0.000403 sec, exc=0.000403 sec, total=0.000403 sec wait times: max=3.000000 sec, heur=4 min 2 sec wait counts: calls=1 os=1 in_wait=1 iflags=0x5a2 There is at least one session blocking this session. Dumping first 3 direct blockers: inst: 2, sid: 1262, ser: 3 inst: 2, sid: 3363, ser: 5 inst: 2, sid: 3049, ser: 1 Dumping final blocker: inst: 2, sid: 1262, ser: 3 There are 1 sessions blocked by this session. |
根据0x2db37f7f88去搜索LibraryHandle找到sql文本
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
LibraryHandle: Address=0x2db37f7f88 Hash=f92acd06 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD ObjectName: Name=grant execute any procedure to centernew FullHashValue=689002101ac48dadbb9777a1f92acd06 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=4180331782 OwnerIdn=94 Statistics: InvalidationCount=2 ExecutionCount=3 LoadCount=4 ActiveLocks=1 TotalLockCount=3 TotalPinCount=1 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=2 HandleInUse=2 HandleReferenceCount=0 Concurrency: DependencyMutex=0x2db37f8038(0, 2, 0, 0) Mutex=0x2db37f80c8(0, 52, 0, 0) Flags=RON/PIN/TIM/PN0/DBN/[10012841] WaitersLists: Lock=0x2db37f8018[0x2db37f8018,0x2db37f8018] Pin=0x2db37f7ff8[0x2db37f7ff8,0x2db37f7ff8] LoadLock=0x2db37f8070[0x2db37f8070,0x2db37f8070] Timestamp: Current=10-27-2021 14:07:01 HandleReference: Address=0x2db37f8160 Handle=(nil) Flags=[00] LibraryObject: Address=0x2db37ec100 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000] ChildTable: size='16' Child: id='0' Table=0x2db37ecfb0 Reference=0x2db37ec9f8 Handle=0x2db37fd7a0 NamespaceDump: Parent Cursor: sql_id=br5vrn7wkpm86 parent=0x2db37ec1a0 maxchild=1 plk=y ppn=n |
发现该session执行的sql是grant execute any procedure to centernew ,但是从session state object里面看到program: JDBC Thin Client,一般情况JDBC连接上来的session不会去单独执行grant操作。
搜索0x2fc14626f0可以找到process so,并且可以看到当时的short_stack
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
SO: 0x2fc14626f0, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2fc14626f0, name=process, file=ksu.h LINE:12721, pg=0 (process) Oracle pid:119, ser:1, calls cur/top: 0x2dcae69d40/0x2dcae3d400 flags : (0x0) - flags2: (0x0), flags3: (0x10) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 0 0 80 last post received-location: kji.h LINE:3691 ID:kjata: wake up enqueue owner last process to post me: 0x2fc9458628 2 6 last post sent: 0 0 81 last post sent-location: kji.h LINE:3694 ID:kjatb: wake up enqueue blocker last process posted by me: 0x2fc9458628 2 6 (latch info) wait_event=0 bits=0x0 Process Group: DEFAULT, pseudo proc: 0x2fc173bcb0 O/S info: user: grid, term: UNKNOWN, ospid: 25668 OSD pid info: Unix process pid: 25668, image: oracle@qfjsdb1 Short stack dump: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-kqrigt()+1383<-kqrLockAndPinPo()+886<-kqrpre1()+982<-kqrpre()+19<-kzdugt()+255<-kzppsr()+342<-gradrv()+1419<-opiexe()+21971<-opiosq0()+3932<-opipls()+11961<-opiodr()+917<-rpidrus()+211<-skgmstack()+148<-rpiswu2()+690<-rpidrv()+1327<-psddr0()+473<-psdnal()+457<-pevm_EXIM()+305<-pfrinstr_EXIM()+53<-pfrrun_no_ |
calls cur/top: 0x2dcae69d40/0x2dcae3d400,继续搜索top 0x2dcae3d400看看是什么调用了该sql
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
SO: 0x2dcae3d400, type: 3, owner: 0x2fc14626f0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2fc14626f0, name=call, file=ksu.h LINE:12725, pg=0 (call) sess: cur 2fe20af948, rec 2fe20af948, usr 2fe20af948; flg:38 fl2:1; depth:0 svpt(xcb:(nil) sptn:0x403 uba: 0x00000000.0000.00) ksudlc FALSE at location: 0 SO: 0x2dcb5bd670, type: 79, owner: 0x2dcae3d400, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2fc14626f0, name=LIBRARY OBJECT PIN, file=kgl.h LINE:8755, pg=0 LibraryObjectPin: Address=0x2dcb5bd670 Handle=0x2dcb5e3770 Mode=S Lock=0x2dcb5b3b48 ClusterLock=0x2fcbf13520 Context=0x7f70cc168da8 User=0x2fe20af948 Session=0x2fe20af948 Count=3 Mask=0011 Flags=[08] SavepointNum=0x470 LibraryHandle: Address=0x2dcb5e3770 Hash=dd787fa5 LockMode=N PinMode=S LoadLockMode=0 Status=VALD ObjectName: Name=CENTERNEW.DBT_ALL_FN_TRC_EXETCPU_REQ FullHashValue=5638f5f26d1a229a59246106dd787fa5 Namespace=TABLE/PROCEDURE(01) Type=PROCEDURE(07) Identifier=313238 OwnerIdn=94 Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=1 ActiveLocks=1 TotalLockCount=2 TotalPinCount=2 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=0 Version=0 BucketInUse=10 HandleInUse=10 HandleReferenceCount=0 Concurrency: DependencyMutex=0x2dcb5e3820(0, 1, 0, 0) Mutex=0x2dcb5e38b0(0, 47, 0, 0) |
发现是CENTERNEW.DBT_ALL_FN_TRC_EXETCPU_REQ存储过程调用的,由于systatedump是基于某个时刻,所以最好分析ash查询还有哪些存储过程对dc_users进行x模式请求。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
SQL> select PLSQL_ENTRY_OBJECT_ID,PLSQL_ENTRY_SUBPROGRAM_ID,SQL_ID,COUNT(*) FROM DBA_HIST_ACTIVE_SESS_HISTORY WHERE SAMPLE_TIME>=TRUNC(SYSDATE) AND EVENT='row cache lock' and p1 in (10) and p3=5 group by PLSQL_ENTRY_OBJECT_ID,SQL_ID,PLSQL_ENTRY_SUBPROGRAM_ID; PLSQL_ENTRY_OBJECT_ID PLSQL_ENTRY_SUBPROGRAM_ID SQL_ID COUNT(*) --------------------- ------------------------- ------------- ---------- 6y6skvd1h9rn9 84 313229 1 2kbmwvdzf4vx8 527 313229 1 br5vrn7wkpm86 444 313230 1 2kbmwvdzf4vx8 1865 313230 1 br5vrn7wkpm86 616 313231 1 2kbmwvdzf4vx8 444 313231 1 br5vrn7wkpm86 748 313235 1 2kbmwvdzf4vx8 129 313235 1 br5vrn7wkpm86 601 313238 1 2kbmwvdzf4vx8 1609 313238 1 br5vrn7wkpm86 302 SQL> select owner,object_name,object_type from dba_objects where object_id in (313229,313230,313231,313235,313238) OWNER OBJECT_NAME OBJECT_TYP ------------------------------ ---------------------------------------- ---------- CENTERNEW DBT_ALL_FN_TRC_EXETCPU_REQ PROCEDURE CENTERNEW DBT_ALL_FN_RM_LHBU_REQ PROCEDURE CENTERNEW DBT_ALL_FN_HIGHWAYEXITLIST PROCEDURE CENTERNEW DBT_ALL_FN_HIGHWAYENTRYLIST PROCEDURE CENTERNEW DBT_ALL_FN_ETC_TOLL_BASEINFO PROCEDURE SQL> select sql_text from dba_hist_sqltext where sql_id in ('2kbmwvdzf4vx8','br5vrn7wkpm86'); SQL_TEXT -------------------------------------------------------------------------------- grant execute any pr grant create any tab |
至此此案例分析完毕,从ash可以看到有5个存储过程中包含了grant user的语句,并且反复执行了多次导致了此次故障。