案例：troubleshooting ORA-00600: internal error code, arguments: [25015] when drop tablespace

本案例来自北区某客户，数据库版本信息为AIX 7.1 RAC 11.2.0.4，在删除一个2T的空间时，报错ORA-00600。

SQL> DROP TABLESPACE "PROD" INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS;
DROP TABLESPACE "PROD" INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [25015], [13], [9], [30], [], [],
[], [], [], [], [], []

SQL> DROP TABLESPACE "PROD" INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS;

DROP TABLESPACE "PROD" INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS

ERROR at line 1:

ORA-00600: internal error code, arguments: [25015], [13], [9], [30], [], [],

[], [], [], [], [], []

看到ORA-00600，通常先去逛一下mos，发现25015没有查到任何相关文档，google也找了一圈也没有找到，没办法只能靠自己了。

通常ORA-00600的第一个Argument都有特殊的含义，要么是函数，要么是数字。通过第一个参数都可以缩小排查范围，例如本例的25015。

[250XX]    kft.c        Kernel File management Tablespace component.

1	[250XX] kft.c Kernel File management Tablespace component.

这属于一个表空间相关的报错，本身就要drop tablespace，这次这个信息有点鸡肋。

那么后面的Argument是什么意思呢？需要猜测了，第一反应就是肯定有一个是TS#。

SQL> select ts# from ts$ where name='PROD';
TS#
----------
13

SQL> select ts# from ts$ where name='PROD';

TS#

----------

运气不错，第二个参数一查就是要删除的表空间号。9和30暂时猜不出是啥意思，这个时候需要用到10046了，10046是分析此类问题的最大利器。

=====================
PARSING IN CURSOR #4577923312 len=182 dep=1 uid=0 oct=6 lid=0 tim=64067949074704 hv=810151256 ad='700012556b7b4e0' sqlid='7xrhv80s4mvas'
update file$ set blocks=:2,ts#=DECODE(:3,-1,NULL,:3),status$=:4, relfile#=DECODE(:5,0,NULL,:5),maxextend=:6,inc=:7,crscnwrp=:8,crscnbas=:9,spare1=DECODE(:10,0,NULL,:10)where file#=:1
END OF STMT
...
...
...
=====================
PARSING IN CURSOR #4577978904 len=338 dep=1 uid=0 oct=6 lid=0 tim=64067949338948 hv=3318318274 ad='70001259ef40a90' sqlid='cpjw5cr2wm162'
update ts$ set name=:2,online$=:3,contents$=:4,undofile#=:5,undoblock#=:6, blocksize=:7,dflmaxext=:8,dflinit=:9,dflincr=:10,dflextpct=:11,dflminext=:12,dflminlen=:13,owner#=:14,scnwrp=:15,scnbas=:16,pitrscnwrp=:17,pitrscnbas=:18,dflogging=:19,bitmapped=:20,inc#=:21,flags=:22,plugged=:23,spare1=:24,spare2=:25,affstrength=:26 where ts#=:1
END OF STMT
...
...
...
WAIT #4576511768: nam='control file parallel write' ela= 283 files=2 block#=116 requests=2 obj#=-1 tim=64067949675947
WAIT #4576511768: nam='control file sequential read' ela= 271 file#=0 block#=53 blocks=1 obj#=-1 tim=64067949676236
WAIT #4576511768: nam='control file parallel write' ela= 303 files=2 block#=52 requests=2 obj#=-1 tim=64067949676595
WAIT #4576511768: nam='control file sequential read' ela= 175 file#=0 block#=116 blocks=1 obj#=-1 tim=64067949676787
WAIT #4576511768: nam='control file sequential read' ela= 273 file#=0 block#=268 blocks=1 obj#=-1 tim=64067949677095
WAIT #4576511768: nam='control file parallel write' ela= 229 files=2 block#=116 requests=2 obj#=-1 tim=64067949677378
WAIT #4576511768: nam='control file sequential read' ela= 215 file#=0 block#=52 blocks=1 obj#=-1 tim=64067949677609
WAIT #4576511768: nam='control file parallel write' ela= 303 files=2 block#=267 requests=2 obj#=-1 tim=64067949678230
WAIT #4576511768: nam='control file sequential read' ela= 213 file#=0 block#=53 blocks=1 obj#=-1 tim=64067949678461
WAIT #4576511768: nam='control file parallel write' ela= 210 files=2 block#=52 requests=2 obj#=-1 tim=64067949678735
WAIT #4576511768: nam='control file sequential read' ela= 228 file#=0 block#=267 blocks=1 obj#=-1 tim=64067949678980
Incident 200841 created, dump file: /u01/app/oracle/diag/rdbms/hxlis/hxlis2/incident/incdir_200841/hxlis2_ora_9633892_i200841.trc
ORA-00600: internal error code, arguments: [25015], [13], [9], [30], [], [], [], [], [], [], [], []

=====================

PARSING IN CURSOR #4577923312 len=182 dep=1 uid=0 oct=6 lid=0 tim=64067949074704 hv=810151256 ad='700012556b7b4e0' sqlid='7xrhv80s4mvas'

update file$ set blocks=:2,ts#=DECODE(:3,-1,NULL,:3),status$=:4, relfile#=DECODE(:5,0,NULL,:5),maxextend=:6,inc=:7,crscnwrp=:8,crscnbas=:9,spare1=DECODE(:10,0,NULL,:10)where file#=:1

END OF STMT

...

=====================

PARSING IN CURSOR #4577978904 len=338 dep=1 uid=0 oct=6 lid=0 tim=64067949338948 hv=3318318274 ad='70001259ef40a90' sqlid='cpjw5cr2wm162'

update ts$ set name=:2,online$=:3,contents$=:4,undofile#=:5,undoblock#=:6, blocksize=:7,dflmaxext=:8,dflinit=:9,dflincr=:10,dflextpct=:11,dflminext=:12,dflminlen=:13,owner#=:14,scnwrp=:15,scnbas=:16,pitrscnwrp=:17,pitrscnbas=:18,dflogging=:19,bitmapped=:20,inc#=:21,flags=:22,plugged=:23,spare1=:24,spare2=:25,affstrength=:26 where ts#=:1

END OF STMT

...

WAIT #4576511768: nam='control file parallel write' ela= 283 files=2 block#=116 requests=2 obj#=-1 tim=64067949675947

WAIT #4576511768: nam='control file sequential read' ela= 271 file#=0 block#=53 blocks=1 obj#=-1 tim=64067949676236

WAIT #4576511768: nam='control file parallel write' ela= 303 files=2 block#=52 requests=2 obj#=-1 tim=64067949676595

WAIT #4576511768: nam='control file sequential read' ela= 175 file#=0 block#=116 blocks=1 obj#=-1 tim=64067949676787

WAIT #4576511768: nam='control file sequential read' ela= 273 file#=0 block#=268 blocks=1 obj#=-1 tim=64067949677095

WAIT #4576511768: nam='control file parallel write' ela= 229 files=2 block#=116 requests=2 obj#=-1 tim=64067949677378

WAIT #4576511768: nam='control file sequential read' ela= 215 file#=0 block#=52 blocks=1 obj#=-1 tim=64067949677609

WAIT #4576511768: nam='control file parallel write' ela= 303 files=2 block#=267 requests=2 obj#=-1 tim=64067949678230

WAIT #4576511768: nam='control file sequential read' ela= 213 file#=0 block#=53 blocks=1 obj#=-1 tim=64067949678461

WAIT #4576511768: nam='control file parallel write' ela= 210 files=2 block#=52 requests=2 obj#=-1 tim=64067949678735

WAIT #4576511768: nam='control file sequential read' ela= 228 file#=0 block#=267 blocks=1 obj#=-1 tim=64067949678980

Incident 200841 created, dump file: /u01/app/oracle/diag/rdbms/hxlis/hxlis2/incident/incdir_200841/hxlis2_ora_9633892_i200841.trc

ORA-00600: internal error code, arguments: [25015], [13], [9], [30], [], [], [], [], [], [], [], []

可以看到在连续的两个update file$和ts$后，开始读写控制文件之后报出了ORA-00600: internal error code, arguments: [25015], [13], [9], [30], [], [], [], [], [], [], [], []。这里猜测会不会是控制文件记录和数据字典记录不一致了呢？

SQL> select count(*) from file$ where ts#=13;
COUNT(*)
----------
73
SQL> select count(*) from v$datafile where ts#=13;
COUNT(*)
----------
74
SQL> select file# from v$datafile where ts#=13 minus select file# from file$ where ts#=13;
FILE#
----------
30

SQL> select count(*) from file$ where ts#=13;

COUNT(*)

----------

SQL> select count(*) from v$datafile where ts#=13;

COUNT(*)

----------

SQL> select file# from v$datafile where ts#=13 minus select file# from file$ where ts#=13;

FILE#

----------

发现果然不一致，正好file$少了一条记录并且file#是30，那么该600错误的第四个参数30的含义应该就是文件号，应该是有人去手动delete了file$删除了file$的记录。

知道原因的话处理起来就舒服多了，一般思路有四种。

闪回查询：通过闪回查询把删除的记录找回来，很不幸ORA-01555出现了。
logminer：通过logminer找回delete file$的数据，但是并不知道是啥时候删除的。
bbed：通过修改行头flag标识撤销delete操作，dump了block发现delete的行已经被覆盖了，因为客户又新建了100多个数据文件。
构造file#=30的记录插入file$，看来也只能用这种方法了。

create table file$                                             /* file table */
( file#         number not null,                   /* file identifier number */
status$       number not null,                      /* status (see KTS.H): */
/* 1 = INVALID, 2 = AVAILABLE */
blocks        number not null,                   /* size of file in blocks */
/* zero for bitmapped tablespaces */
ts#           number,                         /* tablespace that owns file */
relfile#      number,                              /* relative file number */
maxextend     number,                                 /* maximum file size */
inc           number,                                  /* increment amount */
crscnwrp      number,                                 /* creation SCN wrap */
crscnbas      number,                                 /* creation SCN base */
ownerinstance varchar("M_IDEN"),                    /* Owner instance name */
spare1        number,      /* tablespace-relative DBA of space file header */
/* NULL for dictionary-mapped tablespaces */
spare2        number,
spare3        varchar2(1000),
spare4        date
)

create table file$ /* file table */

( file# number not null, /* file identifier number */

status$ number not null, /* status (see KTS.H): */

/* 1 = INVALID, 2 = AVAILABLE */

blocks number not null, /* size of file in blocks */

/* zero for bitmapped tablespaces */

ts# number, /* tablespace that owns file */

relfile# number, /* relative file number */

maxextend number, /* maximum file size */

inc number, /* increment amount */

crscnwrp number, /* creation SCN wrap */

crscnbas number, /* creation SCN base */

ownerinstance varchar("M_IDEN"), /* Owner instance name */

spare1 number, /* tablespace-relative DBA of space file header */

/* NULL for dictionary-mapped tablespaces */

spare2 number,

spare3 varchar2(1000),

spare4 date

)

看了一下file$的定义，构造一条记录太简单了，找一个同表空间的数据文件记录，修改一下file#、relfile#、crscnwrp、crscnbas即可，create scn在控制文件中也有记录，所以可以轻松的构造一条file#=30的记录插入到file$中。

操作步骤大致如下：

SYS@hxlis1 >create table t as select * from file$ where file#=29;
Table created.
SQL> select file#,creation_change# from v$datafile where file#=30;
FILE#     CREATION_CHANGE#
-------------------- --------------------
30       11816753253752
SYS@hxlis1 >select to_number('abf','xxxxxxxxxxxxxxxxxxxx'),to_number('4d614d78','xxxxxxxxxxxxxxxxxxxxxx') from dual;
TO_NUMBER('ABF','XXXXXXXXXXXXXXXXXXXX')
---------------------------------------
TO_NUMBER('4D614D78','XXXXXXXXXXXXXXXXXXXXXX')
----------------------------------------------
2751
1298222456
SQL> update  t set file#=30,relfile#=30,CRSCNWRP=2751,CRSCNBAS=1298220064;
1 row updated.
SQL> commit;
Commit complete.
SQL> insert into file$ select * from t;
1 row created.
SQL> commit;
Commit complete.

SYS@hxlis1 >create table t as select * from file$ where file#=29;

Table created.

SQL> select file#,creation_change# from v$datafile where file#=30;

FILE# CREATION_CHANGE#

-------------------- --------------------

30 11816753253752

SYS@hxlis1 >select to_number('abf','xxxxxxxxxxxxxxxxxxxx'),to_number('4d614d78','xxxxxxxxxxxxxxxxxxxxxx') from dual;

TO_NUMBER('ABF','XXXXXXXXXXXXXXXXXXXX')

---------------------------------------

TO_NUMBER('4D614D78','XXXXXXXXXXXXXXXXXXXXXX')

----------------------------------------------

2751

1298222456

SQL> update t set file#=30,relfile#=30,CRSCNWRP=2751,CRSCNBAS=1298220064;

1 row updated.

SQL> commit;

Commit complete.

SQL> insert into file$ select * from t;

1 row created.

SQL> commit;

Commit complete.

修改完成之后，为了保险起见，手动刷新两个节点的shared pool和db cache。

SYS@hxlis1 >DROP TABLESPACE "PROD" INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS;
Tablespace dropped.

SYS@hxlis1 >DROP TABLESPACE "PROD" INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS;

Tablespace dropped.

再次尝试删除成功。

强烈建议在不熟悉的情况下，不要轻易修改oracle的字典基表。本案例就是由于人为delete了file$记录引发的。

案例：troubleshooting ORA-00600: internal error code, arguments: [25015] when drop tablespace

发表回复取消回复

近期文章

近期评论

案例：troubleshooting ORA-00600: internal error code, arguments: [25015] when drop tablespace

发表回复 取消回复

近期文章

近期评论

发表回复取消回复