Case study: a highly coincidental ORA-15042 recovery

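Today I'd like to share a case of a damaged ASM disk group that came from an online acquaintance. Because the customer's data is confidential and could not be obtained, I reproduced the symptom in my own environment and present the solution here.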

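Oracle introduced ASM (Automatic Storage Management) in Oracle 10g to replace host-based storage management software, so that Oracle RAC no longer depends on third-party storage management products such as HACMP or SFRAC. In 10g, ASM's functionality and stability were still immature, and it was not widely adopted. By 11g, however, ASM was in widespread use and quickly became the core storage management solution for clusters. At the same time, this "black box" has gradually become better understood. Today's case is an ASM disk group that could not be mounted.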

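ASM disk groups offer three redundancy levels: External Redundancy, Normal Redundancy and High Redundancy. I won't go into the redundancy mechanisms here. But does a redundant disk group let us rest easy? Certainly not. Redundancy only covers some kinds of failure. As today's case shows, even with NORMAL redundancy you can be left helpless. This is especially painful for OLAP systems, where tens of terabytes of data are common: when a failure strikes, even a restore from backup takes an unacceptably long time. Only a thorough understanding of ASM internals lets us repair such failures by unconventional means.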

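Like an ordinary file system, ASM has its own metadata, and that metadata can be viewed and edited directly with kfed. The metadata we work with today is the PST (Partnership and Status Table), briefly described below: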

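The PST is critical to ASM: it is checked before any other ASM metadata is read. When a disk group is mounted, the GMON process reads every disk in the disk group to locate and validate the PST, which records which disks are ONLINE and usable and which are OFFLINE.
The PST lives in AU 1 of a disk (aun=1 in kfed, the allocation unit right after the disk header in AU 0), but not every disk holds a copy. The number of PST copies depends on the disk group's redundancy: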

External Redundancy: usually one PST copy
Normal Redundancy: at most 3 PST copies
High Redundancy: at most 5 PST copies
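
These counts can be expressed as a rough rule. ASM appears to place at most one PST copy per online failgroup, capped by the redundancy level. The dumps in this case fit that reading: three failgroups gave three copies, and with FG2 offline only two remained. It is still an assumption, not a documented formula. The sketch below uses a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: upper bound on the number of PST copies in a disk group.
# Assumption (inferred, not an Oracle-documented formula): at most one copy per
# online failgroup, capped at 1 (EXTERNAL), 3 (NORMAL) or 5 (HIGH).
max_pst_copies() {
  redundancy=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  online_fgs=$2
  case $redundancy in
    EXTERNAL) cap=1 ;;
    NORMAL)   cap=3 ;;
    HIGH)     cap=5 ;;
    *) echo "unknown redundancy: $1" >&2; return 1 ;;
  esac
  if [ "$online_fgs" -lt "$cap" ]; then cap=$online_fgs; fi
  echo "$cap"
}

max_pst_copies NORMAL 3   # TEST disk group with all three failgroups online
max_pst_copies NORMAL 2   # after FG2 was taken offline
```

Both results agree with the pst count values in the GMON dumps later in this case (3, then 2).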

Now for the star of today's show:

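In a NORMAL redundancy disk group, one failgroup goes offline unexpectedly (for example, one storage node of a database appliance reboots unexpectedly). Before that failgroup is successfully brought back online, a disk in another failgroup fails. This is where the trouble starts. Even after the offline failgroup returns, the disk group cannot be mounted, because the PST (the critical ASM metadata introduced above) records these disks as not normally accessible.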

1. Build a NORMAL redundancy disk group with 3 failgroups of 2 disks each:

SQL> select GRPNUM_KFDSK,NUMBER_KFDSK,MODE_KFDSK,FAILNAME_KFDSK,PATH_KFDSK from x$kfdsk where GRPNUM_KFDSK=2;
GRPNUM_KFDSK NUMBER_KFDSK MODE_KFDSK FAILNAME_KFDSK       PATH_KFDSK
------------ ------------ ---------- -------------------- --------------------
2            0        127 FG2                  /dev/asm-test-diske
2            1        127 FG2                  /dev/asm-test-diskf
2            2        127 FG1                  /dev/asm-test-diskc
2            3        127 FG1                  /dev/asm-test-diskd
2            4        127 FG3                  /dev/asm-test-diskg
2            5        127 FG3                  /dev/asm-test-diskh

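To make the effect of the recovery observable, we track one row through the whole procedure:

- take offline the disk that holds the row's primary extent;
- update the row;
- destroy the disk holding its secondary extent;
- finally, check whether the committed transaction was lost.

We create a test table, rescureora, and look up the physical location of one of its rows.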

SQL> select object_id,object_name,
2         dbms_rowid.rowid_block_number(rowid) block#,
3         dbms_rowid.rowid_relative_fno(rowid) file#
4    from sys.rescureora where rownum=1;
OBJECT_ID OBJECT_NAME                                  BLOCK#      FILE#
---------- ---------------------------------------- ---------- ----------
20 ICOL$                                           131          5

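A script maps the data block to ASM disks. Because the disk group uses NORMAL redundancy, two copies appear. The row with LXN_KFFXP = 0 is the primary extent, in AU 30 of disk 1. The row with LXN_KFFXP = 1 is the secondary extent, in AU 24 of disk 4. Note that the script takes the ASM file number (256, from TEST.256.1034246527), not the database file number 5. Next we will offline the failgroup containing disk 1 and destroy disk 4.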

SQL> @asm_block
Enter value for block: 131
Enter value for file_number: 256
Enter value for file_type: DATAFILE
Enter value for filename: TEST.256.1034246527
GROUP_KFFXP  LXN_KFFXP   AU_KFFXP DISK_KFFXP  PXN_KFFXP
----------- ---------- ---------- ---------- ----------
2          1         24          4          3
2          0         30          1          2
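
The asm_block script itself is not shown. The sketch below illustrates the arithmetic such a script typically relies on. It assumes an 8 KB database block size and a 1 MB AU (neither value appears in the post) and one AU per extent. Under those assumptions, block 131 falls in virtual extent 1, at block 3 inside the AU. The two physical extent numbers then come out as 2 and 3, which matches the PXN_KFFXP column above. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: where does a datafile block live inside ASM?
# Assumptions: 8 KB DB block, 1 MB AU, one AU per extent (the first 20000
# extents of a file), NORMAL redundancy with two copies per extent.
block_to_extent() {
  block=$1; blk_size=${2:-8192}; au_size=${3:-1048576}
  byte_off=$(( block * blk_size ))
  xnum=$(( byte_off / au_size ))                    # virtual extent (XNUM_KFFXP)
  blk_in_au=$(( (byte_off % au_size) / blk_size ))  # block offset inside that AU
  # In the output above the two copies have PXN_KFFXP = 2*xnum and 2*xnum+1.
  echo "xnum=$xnum blk_in_au=$blk_in_au pxn_primary=$(( 2 * xnum )) pxn_secondary=$(( 2 * xnum + 1 ))"
}

# Build (but do not run) a dd command that reads one copy of the block,
# given the disk path and AU_KFFXP from the query output.
read_block_cmd() {
  dev=$1; au=$2; blk_in_au=$3; blk_size=${4:-8192}; au_size=${5:-1048576}
  echo "dd if=$dev bs=$blk_size skip=$(( au * au_size / blk_size + blk_in_au )) count=1"
}

block_to_extent 131
read_block_cmd /dev/asm-test-diskf 30 3   # primary copy: disk 1, AU 30
```

The primary copy is at AU 30 of disk 1 (/dev/asm-test-diskf). For it, the second helper prints `dd if=/dev/asm-test-diskf bs=8192 skip=3843 count=1`.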

2. Locate the PST from the GMON trace file
The gmon trace shows that the disk group's PST copies are on disks 0, 2 and 4. kfed confirms this:

=============== PST ====================
grpNum:    2
state:     1
callCnt:   25
(lockvalue) valid=1 ver=0.0 ndisks=3 flags=0x3 from inst=0 (I am 1) last=0
--------------- HDR --------------------
next:    29
last:    29
pst count:       3  -- number of PST copies
pst locations:   4  2  0  -- PST locations
incarn:          25
dta size:        6
version:         1
ASM version:     186646528 = 11.2.0.0.0
contenttype:     1
partnering pattern:      [ ]
--------------- LOC MAP ----------------
0: dirty 0       cur_loc: 0      stable_loc: 0
1: dirty 0       cur_loc: 0      stable_loc: 0
--------------- DTA --------------------
0: sts v v(rw) p(rw) a(x) d(x) fg# = 1 addTs = 2429200834 parts: 5 (amp) 4 (amp) 3 (amp) 2 (amp)
1: sts v v(rw) p(rw) a(x) d(x) fg# = 1 addTs = 2429200834 parts: 4 (amp) 5 (amp) 2 (amp) 3 (amp)
2: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2429386451 parts: 1 (amp) 4 (amp) 5 (amp) 0 (amp)
3: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2429386451 parts: 5 (amp) 0 (amp) 1 (amp) 4 (amp)
4: sts v v(rw) p(rw) a(x) d(x) fg# = 3 addTs = 2429203972 parts: 1 (amp) 0 (amp) 2 (amp) 3 (amp)
5: sts v v(rw) p(rw) a(x) d(x) fg# = 3 addTs = 2429203972 parts: 0 (amp) 1 (amp) 3 (amp) 2 (amp)

[grid@rescureora1 trace]$ kfed read /dev/asm-test-diske aun=1 blkn=1|more
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                           17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt:                          2 ; 0x003: 0x02
kfbh.block.blk:                     257 ; 0x004: blk=257
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                   837788407 ; 0x00c: 0x31efa2f7
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdpHdrPairBv1.first.super.time.hi:33098987 ; 0x000: HOUR=0xb DAYS=0x7 MNTH=0x3 YEAR=0x7e4
kfdpHdrPairBv1.first.super.time.lo:3089312768 ; 0x004: USEC=0x0 MSEC=0xcb SECS=0x2 MINS=0x2e
kfdpHdrPairBv1.first.super.last:     28 ; 0x008: 0x0000001c
kfdpHdrPairBv1.first.super.next:     29 ; 0x00c: 0x0000001d
kfdpHdrPairBv1.first.super.copyCnt:   3 ; 0x010: 0x03   -- 3 PST copies, on disks 0, 2 and 4
kfdpHdrPairBv1.first.super.version:   1 ; 0x011: 0x01
kfdpHdrPairBv1.first.super.ub2spare:  0 ; 0x012: 0x0000
kfdpHdrPairBv1.first.super.incarn:   25 ; 0x014: 0x00000019
kfdpHdrPairBv1.first.super.copy[0]:   4 ; 0x018: 0x0004 -- disk 4
kfdpHdrPairBv1.first.super.copy[1]:   2 ; 0x01a: 0x0002 -- disk 2
kfdpHdrPairBv1.first.super.copy[2]:   0 ; 0x01c: 0x0000 -- disk 0
kfdpHdrPairBv1.first.super.copy[3]:   0 ; 0x01e: 0x0000
kfdpHdrPairBv1.first.super.copy[4]:   0 ; 0x020: 0x0000
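
The time.hi/time.lo pair is a packed timestamp, which kfed already decodes in its annotation. The sketch below reproduces that decoding with shifts and masks. The bit layout is inferred from the values in this dump, not taken from Oracle documentation, so treat it as an assumption. For this PST header it yields 2020-03-07 11:46:02.203000.

```shell
#!/bin/sh
# Decode a kfed timestamp pair (time.hi, time.lo) into a readable date.
# Bit layout inferred from the dump above (an assumption, not documented):
#   hi: year[31:14] month[13:10] day[9:5] hour[4:0]
#   lo: min[31:26] sec[25:20] msec[19:10] usec[9:0]
decode_kfed_time() {
  hi=$1; lo=$2
  printf '%04d-%02d-%02d %02d:%02d:%02d.%03d%03d\n' \
    $(( hi >> 14 )) $(( (hi >> 10) & 15 )) $(( (hi >> 5) & 31 )) $(( hi & 31 )) \
    $(( lo >> 26 )) $(( (lo >> 20) & 63 )) $(( (lo >> 10) & 1023 )) $(( lo & 1023 ))
}

decode_kfed_time 33098987 3089312768
```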

3. Simulate the failure

3.1 Offline FG2, the failgroup that holds the primary extent. We offline it manually to simulate a storage node server shutting down in production.

SQL> ALTER DISKGROUP TEST offline disks in failgroup fg2;
Diskgroup altered.

GMON then writes the updated PST to its trace:

GMON updating disk modes for group 2 at 27 for pid 26, osid 3324
dsk = 0/0xe96887d4, mask = 0x7e, op = clear
dsk = 1/0xe96887d7, mask = 0x7e, op = clear
=============== PST ====================
grpNum:    2
state:     1
callCnt:   27
(lockvalue) valid=1 ver=0.0 ndisks=2 flags=0x3 from inst=0 (I am 1) last=0
--------------- HDR --------------------
next:    31
last:    31
pst count:       2   -- only 2 PST copies now
pst locations:   4  2
incarn:          30
dta size:        6
version:         1
ASM version:     186646528 = 11.2.0.0.0
contenttype:     1
partnering pattern:      [ ]
--------------- LOC MAP ----------------
0: dirty 0       cur_loc: 0      stable_loc: 0
1: dirty 0       cur_loc: 0      stable_loc: 0
--------------- DTA --------------------
0: sts v v(--) p(--) a(-) d(-) fg# = 1 addTs = 2429200834 parts: 5 (amp) 4 (amp) 3 (amp) 2 (amp)  -- disk 0 is now offline
1: sts v v(--) p(--) a(-) d(-) fg# = 1 addTs = 2429200834 parts: 4 (amp) 5 (amp) 2 (amp) 3 (amp)  -- disk 1 is now offline
2: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2429386451 parts: 1 (amp) 4 (amp) 5 (amp) 0 (amp)
3: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2429386451 parts: 5 (amp) 0 (amp) 1 (amp) 4 (amp)
4: sts v v(rw) p(rw) a(x) d(x) fg# = 3 addTs = 2429203972 parts: 1 (amp) 0 (amp) 2 (amp) 3 (amp)
5: sts v v(rw) p(rw) a(x) d(x) fg# = 3 addTs = 2429203972 parts: 0 (amp) 1 (amp) 3 (amp) 2 (amp)
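
To follow how the PST moves between dumps, only the `pst count` and `pst locations` lines of each GMON dump matter. The sketch below uses awk to pull them out. pst_history is a hypothetical helper, and the trace file name in the usage comment is an assumption.

```shell
#!/bin/sh
# Summarise every PST dump in a GMON trace read from stdin, one line per dump:
#   "<pst count> copies on disks <pst locations>"
pst_history() {
  awk '
    /^pst count:/ { count = $3 }
    /^pst locations:/ {
      disks = ""
      # Collect the numeric fields after "pst locations:", stop at any comment.
      for (i = 3; i <= NF && $i ~ /^[0-9]+$/; i++)
        disks = disks (disks == "" ? "" : " ") $i
      print count " copies on disks " disks
    }'
}

# Usage (file name is an assumption; look for *gmon*.trc under
# /u01/app/grid/diag/asm/+asm/+ASM/trace/ on this system):
# pst_history < /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_gmon_<pid>.trc
```

This case quotes three dumps. For those, the helper would print `3 copies on disks 4 2 0`, `2 copies on disks 4 2` and `2 copies on disks 2 5`.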

3.2 Update the row. The update is only there so that we can verify the data at the end.

SQL> update sys.rescureora set object_name='rescureora' where rownum=1;
1 row updated.
SQL> commit;
Commit complete.
SQL> select object_id,object_name,
2         dbms_rowid.rowid_block_number(rowid) block#,
3         dbms_rowid.rowid_relative_fno(rowid) file#
4    from sys.rescureora where rownum=1;
OBJECT_ID OBJECT_NAME                                  BLOCK#      FILE#
---------- ---------------------------------------- ---------- ----------
20 rescureora                                    131          5
SQL> alter system checkpoint;
System altered.

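3.3 Destroy disk 4 manually with dd. This zeroes the first 4 KB of the disk, which is where ASM keeps the disk header (block 0 of AU 0). In 12c with ASM Filter Driver (AFD) enabled, such a dd write would be filtered out automatically.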

[grid@rescureora1 trace]$ dd if=/dev/zero of=/dev/asm-test-diskg bs=4096 count=1 conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000306284 s, 13.4 MB/s
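
You can check what the dd command touches safely on a scratch file rather than a real disk. In this minimal sketch, the scratch file stands in for /dev/asm-test-diskg. `bs=4096 count=1` overwrites exactly the first 4 KB. On a regular file, `conv=notrunc` stops dd from truncating it, so the size and the bytes after the first block are preserved. On a block device there is nothing to truncate.

```shell
#!/bin/sh
# Reproduce the step 3.3 damage on a scratch file instead of a real disk.
# The scratch file is a hypothetical stand-in for /dev/asm-test-diskg.
simulate_header_wipe() {
  img=$(mktemp); rest=$(mktemp); zero=$(mktemp)
  head -c 8192 /dev/zero | tr '\000' '\377' > "$img"   # 8 KB of 0xFF bytes
  tail -c +4097 "$img" > "$rest"                        # bytes after the first 4 KB
  head -c 4096 /dev/zero > "$zero"
  dd if=/dev/zero of="$img" bs=4096 count=1 conv=notrunc 2>/dev/null
  zeroed=no; intact=no
  # The first 4 KB (where ASM keeps the disk header) should now be all zeros ...
  if head -c 4096 "$img" | cmp -s - "$zero"; then zeroed=yes; fi
  # ... while the rest of the file is unchanged.
  if tail -c +4097 "$img" | cmp -s - "$rest"; then intact=yes; fi
  size=$(( $(wc -c < "$img") ))
  rm -f "$img" "$rest" "$zero"
  echo "zeroed=$zeroed intact=$intact size=$size"
}

simulate_header_wipe
```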

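3.4 The failure: the disk group goes down and cannot be mounted, even after the other failgroup comes back (the storage node that shut down abnormally is started again)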

SQL> alter diskgroup test mount;
alter diskgroup test mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "4" is missing from group number "2"
SQL> alter diskgroup test mount force;
alter diskgroup test mount force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "4" in group "TEST" may result in a data loss
ORA-15042: ASM disk "4" is missing from group number "2"

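The gmon trace shows the PST layout at this point. Compared with the previous dump, the copy on the damaged disk 4 has moved to disk 5 (the other disk in FG3). Disk 4 is marked v(-w) p(-w), and disks 0 and 1 are still offline: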

=============== PST ====================
grpNum:    2
state:     2
callCnt:   39
(lockvalue) valid=1 ver=0.0 ndisks=2 flags=0x3 from inst=0 (I am 1) last=0
--------------- HDR --------------------
next:    35
last:    35
pst count:       2
pst locations:   2  5  -- disks 2 and 5
incarn:          34
dta size:        6
version:         1
ASM version:     186646528 = 11.2.0.0.0
contenttype:     1
partnering pattern:      [ ]
--------------- LOC MAP ----------------
0: dirty 0       cur_loc: 0      stable_loc: 0
1: dirty 1       cur_loc: 0      stable_loc: 0
--------------- DTA --------------------
0: sts v v(--) p(--) a(-) d(-) fg# = 1 addTs = 2429200834 parts: 5 (amp) 4 (amp) 3 (amp) 2 (amp)
1: sts v v(--) p(--) a(-) d(-) fg# = 1 addTs = 2429200834 parts: 4 (amp) 5 (amp) 2 (amp) 3 (amp)
2: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2429386451 parts: 1 (amp) 4 (amp) 5 (amp) 0 (amp)
3: sts v v(rw) p(rw) a(x) d(x) fg# = 2 addTs = 2429386451 parts: 5 (amp) 0 (amp) 1 (amp) 4 (amp)
4: sts v v(-w) p(-w) a(-) d(-) fg# = 3 addTs = 2429203972 parts: 1 (amp) 0 (amp) 2 (amp) 3 (amp)
5: sts v v(rw) p(rw) a(x) d(x) fg# = 3 addTs = 2429203972 parts: 0 (amp) 1 (amp) 3 (amp) 2 (amp)
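
You can see why the mount fails from the DTA lines. Disk 4's partner list is 1 0 2 3, and disks 0 and 1 are still offline. With NORMAL redundancy, each extent's mirror copy sits on a partner disk. So any extent mirrored between disk 4 and disk 0 or 1 has no current copy left, and that is the risk ORA-15066 warns about. The hypothetical sketch below lists those partners from the trace text:

```shell
#!/bin/sh
# Hypothetical helper: which partners of a lost disk are also offline?
# stdin: PST DTA lines as printed in the gmon trace.
# Usage: offline_partners <lost_disk> "<offline disks>"
offline_partners() {
  lost=$1; offline=" $2 "
  awk -v lost="$lost" '
    $1 == lost ":" {
      start = 0
      for (i = 1; i <= NF; i++) if ($i == "parts:") { start = i + 1; break }
      # Partner disk numbers are the numeric fields after "parts:".
      if (start) for (i = start; i <= NF; i++) if ($i ~ /^[0-9]+$/) print $i
    }' |
  while read -r p; do
    case $offline in *" $p "*) echo "$p" ;; esac
  done
}

# Disk 4 is lost; disks 0 and 1 (FG2) are still offline in the PST.
offline_partners 4 "0 1" <<'EOF'
4: sts v v(-w) p(-w) a(-) d(-) fg# = 3 addTs = 2429203972 parts: 1 (amp) 0 (amp) 2 (amp) 3 (amp)
EOF
```

For disk 4 this prints 1 and 0. Every extent mirrored between disk 4 and FG2 lost its usable copies at the same time.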

4. Solution

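4.1 Check the disk status recorded in the PST. Read aun=1 blkn=2 on the two disks that now hold the PST copies, disk 2 (diskc) and disk 5 (diskh)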

[grid@rescureora1 ~]$ kfed read /dev/asm-test-diskc aun=1 blkn=2|grep "status"|grep -v "I=0"
kfdpDtaEv1[0].status:               21 ; 0x000: I=1 V=0 V=1 P=0 P=1 A=0 D=0
kfdpDtaEv1[1].status:               21 ; 0x030: I=1 V=0 V=1 P=0 P=1 A=0 D=0
kfdpDtaEv1[2].status:               127 ; 0x060: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[3].status:               127 ; 0x090: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[4].status:               127 ; 0x0c0: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[5].status:               127 ; 0x0f0: I=1 V=1 V=1 P=1 P=1 A=1 D=1
[grid@rescureora1 ~]$ kfed read /dev/asm-test-diskh aun=1 blkn=2|grep "status"|grep -v "I=0"
kfdpDtaEv1[0].status:               21 ; 0x000: I=1 V=0 V=1 P=0 P=1 A=0 D=0
kfdpDtaEv1[1].status:               21 ; 0x030: I=1 V=0 V=1 P=0 P=1 A=0 D=0
kfdpDtaEv1[2].status:               127 ; 0x060: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[3].status:               127 ; 0x090: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[4].status:               127 ; 0x0c0: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[5].status:               127 ; 0x0f0: I=1 V=1 V=1 P=1 P=1 A=1 D=1
[grid@rescureora1 ~]$ kfed read /dev/asm-test-diskc aun=1 blkn=2|grep "status"|grep -v "I=0" > repair.txt
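
The status value is a 7-bit mask, and kfed prints the individual bits after the semicolon. In the output above, 21 decodes to I=1 V=0 V=1 P=0 P=1 A=0 D=0, and 127 sets every bit. So the letters map to bits 0 through 6 in printed order. The small sketch below reproduces kfed's decoding. The bit order is inferred from this output, and the function name is hypothetical.

```shell
#!/bin/sh
# Decode a PST disk status value into the bit letters kfed prints.
# Bit order (bit 0 to bit 6 = I V V P P A D) is inferred from the kfed output above.
decode_pst_status() {
  status=$1; out=""; i=0
  for name in I V V P P A D; do
    out="$out $name=$(( (status >> i) & 1 ))"
    i=$(( i + 1 ))
  done
  echo "${out# }"
}

decode_pst_status 21    # disks 0 and 1 after FG2 went offline
decode_pst_status 127   # a disk with every bit set
```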

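4.2 Modify the disk status. In repair.txt, change the first two entries, kfdpDtaEv1[0] and [1] (disks 0 and 1), from 21 to 127. The edited file reads as follows and is merged back into block 2 on both PST disks. Block 3 is patched the same way from repair_3.txt, presumably dumped from aun=1 blkn=3 and edited identically: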

kfdpDtaEv1[0].status:               127 ; 0x000: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[1].status:               127 ; 0x030: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[2].status:               127 ; 0x060: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[3].status:               127 ; 0x090: I=1 V=1 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[4].status:               127 ; 0x0c0: I=1 V=0 V=1 P=1 P=1 A=1 D=1
kfdpDtaEv1[5].status:               127 ; 0x0f0: I=1 V=1 V=1 P=1 P=1 A=1 D=1
[grid@rescureora1 ~]$ kfed merge /dev/asm-test-diskh aun=1 blkn=2 text=repair.txt
[grid@rescureora1 ~]$ kfed merge /dev/asm-test-diskc aun=1 blkn=2 text=repair.txt
[grid@rescureora1 ~]$ kfed merge /dev/asm-test-diskh aun=1 blkn=3 text=repair_3.txt
[grid@rescureora1 ~]$ kfed merge /dev/asm-test-diskc aun=1 blkn=3 text=repair_3.txt
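
You can also script the edit to repair.txt. The minimal sketch below does it with sed (set_pst_status is a hypothetical helper, not a kfed feature). It rewrites only the number before the semicolon in the selected kfdpDtaEv1[n].status lines. kfed's bit annotation after the semicolon is left unchanged and keeps showing the old bits.

```shell
#!/bin/sh
# Hypothetical helper: set the status of selected kfdpDtaEv1 entries in a kfed
# text dump read from stdin. Only the number before ";" is changed.
# Usage: set_pst_status <new_value> <entry>... < in.txt > out.txt
set_pst_status() {
  new=$1; shift
  entries=$(IFS='|'; echo "$*")   # e.g. "0|1"
  sed -E "s/^(kfdpDtaEv1\[(${entries})\]\.status:[[:space:]]*)[0-9]+/\1${new}/"
}

# Example for this case: set entries 0 and 1 (disks 0 and 1) to 127.
# set_pst_status 127 0 1 < repair.txt > repair.new && mv repair.new repair.txt
```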

4.3 With the PST repaired, try to mount the disk group

SQL> alter diskgroup test mount force;
alter diskgroup test mount force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15096: lost disk write detected
ORA-15042: ASM disk "4" is missing from group number "2"

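This time ASM reports a lost disk write (ORA-15096), a different error from before. The disk group now fails during the recovery it runs at mount time, which starts from checkpoint seq=7, block=1474. The fix is therefore simple: skip that recovery.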

NOTE: starting recovery of thread=1 ckpt=7.1474 group=2 (TEST)
NOTE: BWR validation signaled ORA-15096
Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_ora_4035.trc:
ORA-15096: lost disk write detected
ORA-15042: ASM disk "4" is missing from group number "2"

Inspect the ACD (Active Change Directory) checkpoint block:

kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            7 ; 0x002: KFBTYP_ACDC
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       3 ; 0x008: file=3
kfbh.check:                  1111750266 ; 0x00c: 0x4243f67a
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfracdc.eyec[0]:                     65 ; 0x000: 0x41
kfracdc.eyec[1]:                     67 ; 0x001: 0x43
kfracdc.eyec[2]:                     68 ; 0x002: 0x44
kfracdc.eyec[3]:                     67 ; 0x003: 0x43
kfracdc.thread:                       1 ; 0x004: 0x00000001
kfracdc.lastAba.seq:         4294967295 ; 0x008: 0xffffffff
kfracdc.lastAba.blk:         4294967295 ; 0x00c: 0xffffffff
kfracdc.blk0:                         1 ; 0x010: 0x00000001
kfracdc.blks:                     10751 ; 0x014: 0x000029ff
kfracdc.ckpt.seq:                     7 ; 0x018: 0x00000007      -- note this field
kfracdc.ckpt.blk:                  1474 ; 0x01c: 0x000005c2      -- note this field
kfracdc.fcn.base:                  6657 ; 0x020: 0x00001a01
kfracdc.fcn.wrap:                     0 ; 0x024: 0x00000000
kfracdc.bufBlks:                    256 ; 0x028: 0x00000100
kfracdc.strt112.seq:                  2 ; 0x02c: 0x00000002
kfracdc.strt112.blk:                  0 ; 0x030: 0x00000000
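
The two fields noted above are what the alert log reports as ckpt=7.1474. The small sketch below extracts them from a kfed text dump in the same seq.blk form. That makes it easy to confirm an edited acd.txt before merging it. acd_ckpt is a hypothetical helper.

```shell
#!/bin/sh
# Print the ACD checkpoint from a kfed text dump (stdin) as "seq.blk",
# the same form the alert log uses (e.g. "ckpt=7.1474").
acd_ckpt() {
  awk '
    $1 == "kfracdc.ckpt.seq:" { seq = $2 }
    $1 == "kfracdc.ckpt.blk:" { blk = $2 }
    END { if (seq != "" && blk != "") print seq "." blk }'
}

# Example: acd_ckpt < acd.txt
```

Run against the edited acd.txt used in the next step, it would print 9.1474.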

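4.4 Modify the ACD and merge it back into ASM. The only change in acd.txt is kfracdc.ckpt.seq, raised from 7 to 9. The hex annotation after the semicolon still reads 0x00000007. The intent is to move the checkpoint forward so that the failing mount-time recovery is skipped: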

[grid@rescureora1 trace]$ cat acd.txt
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            7 ; 0x002: KFBTYP_ACDC
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       3 ; 0x008: file=3
kfbh.check:                  1111750266 ; 0x00c: 0x4243f67a
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfracdc.eyec[0]:                     65 ; 0x000: 0x41
kfracdc.eyec[1]:                     67 ; 0x001: 0x43
kfracdc.eyec[2]:                     68 ; 0x002: 0x44
kfracdc.eyec[3]:                     67 ; 0x003: 0x43
kfracdc.thread:                       1 ; 0x004: 0x00000001
kfracdc.lastAba.seq:         4294967295 ; 0x008: 0xffffffff
kfracdc.lastAba.blk:         4294967295 ; 0x00c: 0xffffffff
kfracdc.blk0:                         1 ; 0x010: 0x00000001
kfracdc.blks:                     10751 ; 0x014: 0x000029ff
kfracdc.ckpt.seq:                     9 ; 0x018: 0x00000007
kfracdc.ckpt.blk:                  1474 ; 0x01c: 0x000005c2
kfracdc.fcn.base:                  6657 ; 0x020: 0x00001a01
kfracdc.fcn.wrap:                     0 ; 0x024: 0x00000000
kfracdc.bufBlks:                    256 ; 0x028: 0x00000100
kfracdc.strt112.seq:                  2 ; 0x02c: 0x00000002
kfracdc.strt112.blk:                  0 ; 0x030: 0x00000000
kfed merge /dev/asm-test-diske aun=4 blkn=0 text=acd.txt

4.5 The disk group now mounts normally

SQL> alter diskgroup test mount force;
Diskgroup altered.

4.6 Start the database

SQL> startup
ORACLE instance started.
Total System Global Area  839282688 bytes
Fixed Size                  2257880 bytes
Variable Size             541068328 bytes
Database Buffers          289406976 bytes
Redo Buffers                6549504 bytes
Database mounted.
ORA-01113: file 5 needs media recovery
ORA-01110: data file 5: '+TEST/rescureora/datafile/test.256.1034246527'
In a real environment the data would be inconsistent at this point. If the archived logs are available, as in this case, media recovery resolves it completely.
SQL> recover database;
Media recovery complete.
SQL> alter database open;

5. Verify the data: nothing was lost

SQL> select object_id,object_name,
2 dbms_rowid.rowid_block_number(rowid) block#,
3 dbms_rowid.rowid_relative_fno(rowid) file#
4 from sys.rescureora where rownum=1;
OBJECT_ID OBJECT_NAME BLOCK# FILE#
---------- ------------------------------ ---------- ----------
20 rescureora 131 5
