本案例来自某省电信一套11.2.0.4的rac,应用的存储过程调用一直在报ORA-00600: internal error code, arguments: [4097],对于经常搞恢复的人来说,这个错误非常熟悉,都不用分析直接重建undo即可,但是作为一个专业的troubleshooter,还是多少分析一下来龙去脉吧。。。
1 2 3 4 5 6 7 8 9 10 11 12 13 |
Mon Nov 22 19:56:16 2021 Errors in file /u01/oracle/diag/rdbms/gzdb/gzdb2/trace/gzdb2_ora_4166.trc (incident=356497): ORA-00600: internal error code, arguments: [ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], [] ], [], [], [], [], [], [], [], [], [], [], [] ORA-02063: preceding line from GZINTF Incident details in: /u01/oracle/diag/rdbms/gzdb/gzdb2/incident/incdir_356497/gzdb2_ora_4166_i356497.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Mon Nov 22 19:56:17 2021 Dumping diagnostic data in directory=[cdmp_20211122195617], requested by (instance=2, osid=4166), summary=[incident=356497]. Mon Nov 22 19:56:18 2021 Sweep [inc][356497]: completed Sweep [inc2][356497]: completed |
查看trace发现报错的sql是一个dblink插入远程库的sql
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 |
*** 2021-11-22 19:56:16.621 dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0) ----- Current SQL Statement for this session (sql_id=6n9rq3vxmnr36) ----- INSERT INTO INTF_CRM_CODE@GZINTF (CODEID, CODE, STARTCODE, ENDCODE, CATEGORY, AREA_CODE, PROCESS_STATE, INS_DATE, EDIT_DATE, STAFF_CODE ) VALUES (:B7 , :B6 , :B5 , :B4 , :B3 , :B2 , 0, SYSDATE, SYSDATE, :B1 ) ----- PL/SQL Stack ----- ----- PL/SQL Call Stack ----- object line object handle number name 0x6add29468 173 procedure TELEMENT.CODE_PUBLISH_BAT 0x73d9c6898 3 anonymous block ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- skdstdst()+41 call kgdsdst() 000000000 ? 000000000 ? 7FFF3F418560 ? 7FFF3F418638 ? 7FFF3F41D0E0 ? 000000002 ? ksedst1()+103 call skdstdst() 000000000 ? 000000000 ? 7FFF3F418560 ? 7FFF3F418638 ? 7FFF3F41D0E0 ? 000000002 ? ksedst()+39 call ksedst1() 000000000 ? 000000001 ? 7FFF3F418560 ? 7FFF3F418638 ? 7FFF3F41D0E0 ? 000000002 ? dbkedDefDump()+2746 call ksedst() 000000000 ? 000000001 ? 7FFF3F418560 ? 7FFF3F418638 ? 7FFF3F41D0E0 ? 000000002 ? ksedmp()+41 call dbkedDefDump() 000000003 ? 000000002 ? 7FFF3F418560 ? 7FFF3F418638 ? 7FFF3F41D0E0 ? 000000002 ? ksfdmp()+69 call ksedmp() 000000003 ? 000000002 ? 7FFF3F418560 ? 7FFF3F418638 ? 7FFF3F41D0E0 ? 000000002 ? dbgexPhaseII()+1764 call ksfdmp() 000000003 ? 000000002 ? 7FFF3F418560 ? 7FFF3F418638 ? 7FFF3F41D0E0 ? 000000002 ? dbgexProcessError() call dbgexPhaseII() 7FF5719F2730 ? 7FF56E9E3898 ? +2680 7FFF3F421BB8 ? 7FFF3F418638 ? 7FFF3F41D0E0 ? 000000002 ? dbgeExecuteForError call dbgexProcessError() 7FF5719F2730 ? 7FF56E9E3898 ? ()+88 000000001 ? 000000000 ? 7FFF3F41D0E0 ? 000000002 ? dbgePostErrorKGE()+ call dbgeExecuteForError 7FF5719F2730 ? 7FF56E9E3898 ? 2136 () 000000001 ? 000000001 ? 000000000 ? 000000002 ? dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 00C0B22C0 ? 7FF5719FFB00 ? 71 000000258 ? 000000001 ? 000000000 ? 000000002 ? kgesev()+280 call dbkePostKGE_kgsf() 00C0B22C0 ? 7FF5719FFB00 ? 000000258 ? 000000001 ? 000000000 ? 000000002 ? ksesec1()+170 call kgesev() 00C0B22C0 ? 7FF5719FFB00 ? 000000258 ? 000000001 ? 7FFF3F422760 ? 000000002 ? npierr()+878 call ksesec1() 00C0B22C0 ? 000000001 ? 00000004E ? 7FF56E68BDE0 ? 7FFF3F421CC0 ? 00000000A ? kpnerr()+240 call npierr() 7FF56E68A570 ? 000000258 ? 000000000 ? 00000005E ? 7FFF3F421CC0 ? 7FF56E68CE90 ? kpnpst()+276 call kpnerr() 7FF56E68A570 ? 000000258 ? 000000000 ? 00000005E ? 7FFF3F4256E8 ? 7FF56E68CE90 ? upirtrc()+1919 call kpnpst() 7FFF3F425860 ? 000000258 ? 000000005 ? 000000000 ? 7FFF3F4256E8 ? 7FF56E68CE90 ? kpurcsc()+98 call upirtrc() 7FFF3F425860 ? 000000258 ? 000000005 ? 000000000 ? 7FFF3F4256E8 ? 7FF56E88CC20 ? kpuexec()+10790 call kpurcsc() 7FFF3F425860 ? 000000258 ? 000000005 ? 000000000 ? 7FFF3F4256E8 ? 7FF56E88CC20 ? kpnexe()+1517 call kpuexec() 7FFF3F425860 ? 000000258 ? 7FFF3F4256E0 ? 000000000 ? 7FFF3F4256E8 ? 7FFF3F429438 ? opiexe()+32658 call kpnexe() 7FF56E883948 ? 7FF56E88CC20 ? 7FF56E698860 ? 7FF56E9A9950 ? 000000001 ? 400000000 ? opipls()+2164 call opiexe() 000000004 ? 000000005 ? 7FF56E698860 ? 7FF56E9A9950 ? 000000001 ? 400000000 ? opiodr()+917 call opipls() 000000066 ? 000000005 ? 7FF56E698860 ? 7FF56E9A9950 ? 000000001 ? 400000000 ? rpidrus()+211 call opiodr() 000000066 ? 000000007 ? 7FFF3F42C5E0 ? 7FF56E9A9950 ? 000000001 ? 400000000 ? skgmstack()+148 call rpidrus() 7FFF3F42BFB8 ? 000000007 ? 7FFF3F42C5E0 ? 7FF56E9A9950 ? 000000001 ? 400000000 ? rpiswu2()+690 call skgmstack() 7FFF3F42BF90 ? 00C0B1EE0 ? 00000F618 ? 009684840 ? 7FFF3F42BFB8 ? 400000000 ? rpidrv()+1327 call rpiswu2() 735D440A0 ? 000000056 ? 735D44124 ? 000000002 ? 7FFF3F42BFB8 ? 400000000 ? psddr0()+473 call rpidrv() 000000004 ? 000000066 ? 7FFF3F42C5E0 ? 000000038 ? 7FFF3F42BFB8 ? 400000000 ? psdnal()+457 call psddr0() 000000004 ? 000000066 ? 7FFF3F42C5E0 ? 000000030 ? 7FFF3F42BFB8 ? 400000000 ? pevm_EXECC()+314 call psdnal() 7FFF3F42DD40 ? 7FFF3F42DF50 ? 7FFF3F42C5E0 ? 7FF56E5C3C10 ? 671978B98 ? 400000000 ? pfrinstr_EXECC()+80 call pevm_EXECC() 7FF56E9D1AB0 ? 7FF56E6D66B8 ? 000000020 ? 7FF56E5C3C10 ? 671978B98 ? 400000000 ? pfrrun()+4191 call pfrinstr_EXECC() 7FF56E895E80 ? 6493730E2 ? 7FF56E895EF0 ? 7FF56E5C3C10 ? 671978B98 ? 600000020 ? plsql_run()+649 call pfrrun() 7FF56E895E80 ? 6493730E2 ? 7FF56E895EF0 ? 7FFF3F42DD40 ? 671978B98 ? 600000020 ? peicnt()+302 call plsql_run() 7FF56E895E80 ? 6493730E2 ? 7FF56E895EF0 ? 7FFF3F42DD40 ? 671978B98 ? 600000020 ? kkxexe()+525 call peicnt() 7FFF3F42DD40 ? 7FF56E895E80 ? 7FF56E895EF0 ? 7FFF3F42DD40 ? 671978B98 ? 000000000 ? opiexe()+18001 call kkxexe() 7FF56E5B8B90 ? 7FF56E895E80 ? 7FF56E895EF0 ? 7FFF3F42DD40 ? 671978B98 ? 000000000 ? kpoal8()+2118 call opiexe() 000000049 ? 000000003 ? 7FF56E895EF0 ? 7FFF3F42DD40 ? 671978B98 ? 000000000 ? opiodr()+917 call kpoal8() 00000005E ? 000000003 ? 7FF56E895EF0 ? 7FFF3F42DD40 ? 671978B98 ? 000000000 ? ttcpip()+2183 call opiodr() 00000005E ? 00000001C ? 7FFF3F432570 ? 7FFF3F42DD40 ? 671978B98 ? 000000000 ? opitsk()+1710 call ttcpip() 00C0CFF10 ? 009987830 ? 7FFF3F432570 ? 000000000 ? 7FFF3F431FC8 ? 7FFF3F43256C ? opiino()+969 call opitsk() 00C0CFF18 ? 000000000 ? 7FFF3F432570 ? 000000000 ? 7FFF3F431FC8 ? 7FFF3F43256C ? opiodr()+917 call opiino() 00000003C ? 000000004 ? 7FFF3F433D68 ? 000000000 ? 7FFF3F431FC8 ? 7FFF3F43256C ? opidrv()+570 call opiodr() 00000003C ? 000000004 ? 7FFF3F433D68 ? 000000000 ? 7FFF3F431FC8 ? 7FFF3F43256C ? sou2o()+103 call opidrv() 00000003C ? 000000004 ? 7FFF3F433D68 ? 000000000 ? 7FFF3F431FC8 ? 7FFF3F43256C ? opimai_real()+133 call sou2o() 7FFF3F433D40 ? 00000003C ? 000000004 ? 7FFF3F433D68 ? 7FFF3F431FC8 ? 7FFF3F43256C ? ssthrdmain()+265 call opimai_real() 000000002 ? 7FFF3F433F30 ? 000000004 ? 7FFF3F433D68 ? 7FFF3F431FC8 ? 7FFF3F43256C ? main()+201 call ssthrdmain() 000000002 ? 7FFF3F433F30 ? 000000001 ? 000000000 ? 7FFF3F431FC8 ? 7FFF3F43256C ? __libc_start_main() call main() 000000002 ? 7FFF3F4340D8 ? +245 000000001 ? 000000000 ? 7FFF3F431FC8 ? 7FFF3F43256C ? _start()+41 call __libc_start_main() 000A14EC0 ? 000000002 ? 7FFF3F4340C8 ? 000000000 ? 7FFF3F431FC8 ? 7FFF3F43256C ? --------------------- Binary Stack Dump --------------------- |
我们知道ORA-00600[4XXX]都是与undo息息相关的报错,而在call stack里并未找到ktu相关函数调用,所以怀疑是remote端数据库的undo异常导致的。
查看remote端,数据库为单实例10.2.0.4,从报错trace里可以看到同样的sql也是报的ORA-00600[4097]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 |
ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [4097], [], [], [], [], [], [], [] Current SQL statement for this session: INSERT INTO "INTF_CRM_CODE" "A1" ("CODEID","CODE","STARTCODE","ENDCODE","CATEGORY","AREA_CODE","PROCESS_STATE","INS_DATE","EDIT_DATE","STAFF_CODE") VALUES (:B7,:B6,:B5,:B4,:B3,:B2,0,SYSDATE@!,SYSDATE@!,:B1) ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst()+31 call ksedst1() 000000000 ? 000000001 ? 7FFF81D3D320 ? 7FFF81D3D380 ? 7FFF81D3D2C0 ? 000000000 ? ksedmp()+610 call ksedst() 000000000 ? 000000001 ? 7FFF81D3D320 ? 7FFF81D3D380 ? 7FFF81D3D2C0 ? 000000000 ? ksfdmp()+21 call ksedmp() 000000003 ? 000000001 ? 7FFF81D3D320 ? 7FFF81D3D380 ? 7FFF81D3D2C0 ? 000000000 ? kgeriv()+176 call ksfdmp() 000000003 ? 000000001 ? 7FFF81D3D320 ? 7FFF81D3D380 ? 7FFF81D3D2C0 ? 000000000 ? kgesiv()+119 call kgeriv() 0068CE7C0 ? 00A516C20 ? 000000000 ? 000000000 ? 7FFF81D3D2C0 ? 000000000 ? ksesic0()+209 call kgesiv() 0068CE7C0 ? 00A516C20 ? 000001001 ? 000000000 ? 7FFF81D3E0A0 ? 000000000 ? ktugti()+3200 call ksesic0() 000001001 ? 0068CE940 ? 000000000 ? 00000009A ? 000000010 ? 101010101010101 ? ktbgfi()+3117 call ktugti() 1027BA02C ? 7FFF81D3E530 ? 000000000 ? 7FFF81D3E548 ? 000000000 ? 000000000 ? kdisdelete()+12464 call ktbgfi() 000000001 ? 7FFF81D3F8B4 ? 000000103 ? 000000000 ? 7FFF81D3F8B0 ? 1027BA014 ? kdisnew_bseg_srch_c call kdisdelete() 2B65E5197FE8 ? 7FFF81D40AE8 ? bk()+3175 000000003 ? 000000000 ? 203000176 ? 1027BA014 ? ktspfpblk()+395 call kdisnew_bseg_srch_c 7FFF81D40AE8 ? 7FFF81D40AE8 ? bk() 000000003 ? 000000000 ? 203000176 ? 1027BA014 ? ktspfsrch()+585 call ktspfpblk() 7FFF81D3FC90 ? 7FFF81D40AE8 ? 000000003 ? 000000000 ? 203000176 ? 1027BA014 ? ktspscan_bmb()+340 call ktspfsrch() 7FFF81D3FC90 ? 002407B0A ? 000000003 ? 000000000 ? 000000001 ? 000000000 ? ktspgsp_cbk1()+555 call ktspscan_bmb() 7FFF81D3FC90 ? 002407B0A ? 000000003 ? 000000000 ? 000000001 ? 000000000 ? ktspgsp_cbk()+133 call ktspgsp_cbk1() 000000000 ? 000000000 ? 000000000 ? 000000002 ? 7FFF81D40AE8 ? 000000000 ? kdisnew()+280 call ktspgsp_cbk() 000000000 ? 000000000 ? 000000000 ? 000000002 ? 7FFF81D40AE8 ? 000000000 ? kdisnewle()+129 call kdisnew() 2B65E5197FF0 ? 000000000 ? 7FFF81D40AE8 ? 000000002 ? 200000000 ? 000000000 ? kdisle()+4705 call kdisnewle() 2B65E5197FD8 ? 000000000 ? 7FFF81D40AE8 ? 7FFF81D40D40 ? 2B65E5148B40 ? 0000000A4 ? kdiins0()+27793 call kdisle() 1EE8FE6D8 ? 7FFF81D423E0 ? 7FFF81D42A40 ? 7FFF81D40D40 ? 81D51C5000000002 ? 7FFF81D51C50 ? kdiinsp()+106 call kdiins0() 1EE8FE6D8 ? 000000000 ? 000000000 ? 7FFF81D51C50 ? 000000000 ? 000000000 ? kauxsin()+1711 call kdiinsp() 1EE8FE6D8 ? 000000000 ? 000000000 ? 7FFF81D51C50 ? 000000000 ? 000000000 ? insidx()+963 call kauxsin() 1EE8FE740 ? 7FFF81D53470 ? 000000000 ? 2B65E517B2B8 ? 2B65E517B328 ? 2B65E517B348 ? insrowFastPath()+14 call insidx() 2B65E517B190 ? 2B65E517AC78 ? 7 1EE901A90 ? 2B65E517B2B8 ? 1EE8FE920 ? 2B65E517B348 ? insdrvFastPath()+76 call insrowFastPath() 2B65E517B190 ? 7FFF81D538E8 ? 5 1EE901A90 ? 2B65E517B2B8 ? 1EE8FE920 ? 2B65E517B348 ? inscovexe()+434 call insdrvFastPath() 2B65E517B190 ? 7FFF81D538E8 ? 1EE8FF530 ? 1EE901A90 ? 1EE8FE920 ? 2B65E517AC78 ? insExecStmtExecIniE call inscovexe() 1EE901A90 ? 1EE8FF530 ? ngine()+85 7FFF81D54B00 ? 2B65E517AC78 ? 1EE8FE920 ? 2B65E517AC78 ? insexe()+386 call insExecStmtExecIniE 1EE901A90 ? 1EE8FF530 ? ngine() 2B65E517AC78 ? 2B65E517AC78 ? 1EE8FE920 ? 2B65E517AC78 ? opiexe()+9270 call insexe() 1EE901058 ? 7FFF81D54B00 ? 1EE901A90 ? 2B65E517AC78 ? 1EE8FE920 ? 2B65E517AC78 ? kpoal8()+2289 call opiexe() 000000049 ? 000000003 ? 7FFF81D54FC0 ? 000000001 ? 1EE8FE920 ? 2B65E517AC78 ? opiodr()+984 call kpoal8() 00000005E ? 000000017 ? 7FFF81D57EA0 ? 000000001 ? 81D5000000000001 ? 2B65E517AC78 ? ttcpip()+1226 call opiodr() 00000005E ? 000000017 ? 7FFF81D57EA0 ? 000000000 ? 0059F5BF0 ? 2B65E517AC78 ? opitsk()+1322 call ttcpip() 0068D6490 ? 0052EA260 ? 7FFF81D57EA0 ? 000000000 ? 7FFF81D57998 ? 7FFF81D58008 ? opiino()+1026 call opitsk() 000000003 ? 000000000 ? 7FFF81D57EA0 ? 000000001 ? 000000000 ? 4E804ED00000001 ? opiodr()+984 call opiino() 00000003C ? 000000004 ? 7FFF81D59068 ? 000000000 ? 000000000 ? 4E804ED00000001 ? opidrv()+547 call opiodr() 00000003C ? 000000004 ? 7FFF81D59068 ? 000000000 ? 0059F56A0 ? 4E804ED00000001 ? sou2o()+114 call opidrv() 00000003C ? 000000004 ? 7FFF81D59068 ? 000000000 ? 0059F56A0 ? 4E804ED00000001 ? opimai_real()+163 call sou2o() 7FFF81D59040 ? 00000003C ? 000000004 ? 7FFF81D59068 ? 0059F56A0 ? 4E804ED00000001 ? main()+116 call opimai_real() 000000002 ? 7FFF81D590D0 ? 000000004 ? 7FFF81D59068 ? 0059F56A0 ? 4E804ED00000001 ? __libc_start_main() call main() 000000002 ? 7FFF81D590D0 ? +244 000000004 ? 7FFF81D59068 ? 0059F56A0 ? 4E804ED00000001 ? _start()+41 call __libc_start_main() 000723938 ? 000000002 ? 7FFF81D59228 ? 000000000 ? 0059F56A0 ? 000000002 ? --------------------- Binary Stack Dump --------------------- |
通常在trace的最后一个Block header dump极有可能是访问报错的block,所以搜索”Block header dump“先看看
1 2 3 4 5 6 7 8 9 |
Block header dump: 0x030001db Object id on Block? Y seg/obj: 0x5f48f csc: 0xfd4.83315956 itc: 2 flg: E typ: 2 - INDEX brn: 0 bdba: 0x300018a ver: 0x01 opc: 0 inc: 0 exflg: 0 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0008.027.00149736 0x03448015.d5b7.07 --U- 1 fsc 0x0000.83315957 0x02 0x00ec.00e.000916f3 0x0347cae6.cf8e.08 C-U- 0 scn 0x0f1f.1adb7509 |
报错的block是一个索引,dataobj#为 0x5f48f ,这明显是一个业务表上的索引,从ITL上的slot 0x01看出读取该块需要访问8号undo段头事务表的slot 27做块清除找到符合该次逻辑读的commit scn,那么看看8号undo段头事务表slot 27
1 2 3 4 5 |
index state cflags wrap# uel scn dba parent-xid nub stmt_num cmt ------------------------------------------------------------------------------------------------ ... 0x27 9 0x00 0x144b22 0x001e 0x0fe0.c8984e1b 0x03c8dd1f 0x0000.000.00000000 0x00000001 0x00000000 1637578830 ... |
可以看到索引块ITL中的XID的wrap#(0x149736)居然比事务表27 slot的wrap#要大,这明显是不可能的事情,所以报出了ORA-00600[4097]。
解决方法很简单就是重建undo表空间并删除老的undo表空间。 ORA-00600[4097] 一般在不一致的open数据库经常遇到,由于该环境为正常的生产环境,通常不会出现 ORA-00600[4097] ,所以个人怀疑有写丢失的可能性。