Pages

Wednesday, March 20, 2013

How to recover contiguous corrupted data blocks using 'recover datafile' RMAN syntax

In this post I want to simulate a corrupt block recovery: honestly it happened to me only two or three times several years ago and it's generally caused by the I/O subsystem or because of software bugs.
Naturally you should consider this kind of recovery when you have few corrupt blocks in a large datafile otherwise you should consider other familiar kind of recovery.
To simulate a recover of corrupt blocks I begin to create first a new tablespace formed by one small datafile, a user, a table and some rows:
SQL> create tablespace marcov datafile '/home/oracle/app/oracle/oradata/orcl/marcov01.dbf' size 1M autoextend on next 5M maxsize 50M;

Tablespace created.

SQL> DROP USER MARCOV CASCADE;
DROP USER MARCOV CASCADE
          *
ERROR at line 1:
ORA-01918: user 'MARCOV' does not exist


SQL> CREATE USER MARCOV
  2  IDENTIFIED BY MARCOV
  3  DEFAULT TABLESPACE MARCOV
  4  TEMPORARY TABLESPACE TEMP
  5  PROFILE DEFAULT
  6  ACCOUNT UNLOCK;

User created.

SQL> GRANT CONNECT, CREATE TABLE, CREATE SESSION TO MARCOV;

Grant succeeded.

SQL> ALTER USER MARCOV QUOTA UNLIMITED ON MARCOV;

User altered.

SQL> CREATE TABLE MARCOV.T1 (a number, b char(1000));

Table created.

SQL> insert into MARCOV.T1 select level, dummy from dual
  2  connect by level <= 1000;

1000 rows created.

SQL> commit;

Commit complete.
Let's update the statistics on this new table...
SQL> BEGIN
  2  SYS.DBMS_STATS.GATHER_TABLE_STATS (
  3  OwnName        => 'MARCOV'
  4  ,TabName        => 'T1'
  5  ,Estimate_Percent  => 1
  6  ,Method_Opt        => 'FOR ALL COLUMNS SIZE 1'
  7  ,Degree            => 4
  8  ,Cascade           => TRUE
  9  ,No_Invalidate     => FALSE);
 10  END;
 11  /

PL/SQL procedure successfully completed.
... and let's see how many blocks this segment has and how many are used:
SQL> SELECT NUM_ROWS, BLOCKS
  2  FROM SYS.DBA_TABLES
  3  WHERE TABLE_NAME = 'T1'
  4  AND OWNER = 'MARCOV';

  NUM_ROWS     BLOCKS
---------- ----------
      1000        244

SQL> select header_block, blocks, extents from dba_segments where segment_name='T1';

HEADER_BLOCK     BLOCKS    EXTENTS
------------ ---------- ----------
          10        256         17
From a sqlplus session I want to dump some blocks of the new datafile using the following command:
SQL> alter system dump datafile '/home/oracle/app/oracle/oradata/orcl/marcov01.dbf' block min 1 block max 10;

System altered.
and few of them are reported here (I have omitted several lines):
vi /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7255.trc
...
buffer tsn: 40 rdba: 0x02c00003 (11/3)
scn: 0x0000.00e2a62a seq: 0x01 flg: 0x04 tail: 0xa62a1e01
frmt: 0x02 chkval: 0x4dcc type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x02c00004 (11/4)
scn: 0x0000.00e2a047 seq: 0x01 flg: 0x04 tail: 0xa0471e01
frmt: 0x02 chkval: 0x823c type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x02c00005 (11/5)
scn: 0x0000.00e2a049 seq: 0x01 flg: 0x04 tail: 0xa0491e01
frmt: 0x02 chkval: 0xc235 type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x02c00006 (11/6)
scn: 0x0000.00e2a04b seq: 0x01 flg: 0x04 tail: 0xa04b1e01
frmt: 0x02 chkval: 0x022e type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
Running DBVERIFY on the datafile no error or corrupt block is detected:
[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf blocksize=8192

DBVERIFY: Release 11.2.0.2.0 - Production on Mon Mar 18 07:29:48 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/marcov01.dbf

DBVERIFY - Verification complete

Total Pages Examined         : 768
Total Pages Processed (Data) : 196
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 19
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 553
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 14853855 (0.14853855)
Using the following command, dd resets two blocks of the datafile starting from the fourth.
[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf bs=8k conv=notrunc seek=4 count=2
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.00047551 seconds, 34.5 MB/s
Let's dump again the same few blocks of the datafile ...
SQL> alter system dump datafile '/home/oracle/app/oracle/oradata/orcl/marcov01.dbf' block min 1 block max 10;

System altered.
... and compare them with the previous blocks. Blocks marked as (11/4) and (11/5) are totally reset and currently identified as (0/0):
vi /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7255.trc
...
buffer tsn: 40 rdba: 0x02c00003 (11/3)
scn: 0x0000.00e2a62a seq: 0x01 flg: 0x04 tail: 0xa62a1e01
frmt: 0x02 chkval: 0x4dcc type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x00000000 (0/0)
scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
...
buffer tsn: 40 rdba: 0x00000000 (0/0)
scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
...
buffer tsn: 40 rdba: 0x02c00006 (11/6)
scn: 0x0000.00e2a04b seq: 0x01 flg: 0x04 tail: 0xa04b1e01
frmt: 0x02 chkval: 0x022e type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
Now DBVERIFY is able to find two corrupt blocks on the datafile:
[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf blocksize=8192

DBVERIFY: Release 11.2.0.2.0 - Production on Mon Mar 18 07:49:40 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/marcov01.dbf
Page 4 is marked corrupt
Corrupt block relative dba: 0x02c00004 (file 11, block 4)
Completely zero block found during dbv: 

Page 5 is marked corrupt
Corrupt block relative dba: 0x02c00005 (file 11, block 5)
Completely zero block found during dbv: 

DBVERIFY - Verification complete

Total Pages Examined         : 768
Total Pages Processed (Data) : 196
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 17
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 553
Total Pages Marked Corrupt   : 2
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 14853855 (0.14853855)
During a backup if RMAN detects corrupt blocks it terminates the command: the default threshold of total number of physical and logical corruptions permitted on a file is zero, so no corrupt blocks are tolerated.  
MAXCORRUPT is the setting you can use to modify this behaviour and tolerate more than one corrupt block. Have a look at the following example, considering that I have two corrupt blocks. When I specify 1 as maxcorrupt setting my backup still fails:
RMAN> run {
2> set maxcorrupt for datafile 11 to 1;
3> backup datafile 11;
4> }

executing command: SET MAX CORRUPT

Starting backup at 18-03-2013 09:01:05
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf
channel ORA_DISK_1: starting piece 1 at 18-03-2013 09:01:05
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/18/2013 09:01:06
ORA-19566: exceeded limit of 1 corrupt blocks for file /home/oracle/app/oracle/oradata/orcl/marcov01.dbf
When I specify to tolerate at most two corrupt blocks the backup command is able to complete successfully (backup piece: o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp). It should be clear anyway that I was able to complete a backup including a datafile containing two corrupt blocks!!!
RMAN> run {
2> set maxcorrupt for datafile 11 to 2;
3> backup datafile 11;
4> }

executing command: SET MAX CORRUPT

Starting backup at 18-03-2013 09:01:32
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf
channel ORA_DISK_1: starting piece 1 at 18-03-2013 09:01:33
channel ORA_DISK_1: finished piece 1 at 18-03-2013 09:01:34
piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp tag=TAG20130318T090132 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 18-03-2013 09:01:34

Starting Control File and SPFILE Autobackup at 18-03-2013 09:01:34
piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/autobackup/2013_03_18/o1_mf_s_810378094_8ngglyk1_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 18-03-2013 09:01:35
After RMAN detects these kind of problems it writes which blocks in a datafile were marked corrupt on the V$DATABASE_BLOCK_CORRUPTION view and, until these corrupt blocks are not repaired, the rows remain in the view. Datafile 11 seems to have 2 corrupt blocks starting from block number 4:
SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        11          4          2                  0 ALL ZERO
On the alert log you can find the following errors:
...
Mon Mar 18 09:01:05 2013
Hex dump of (file 11, block 4) in trace file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7493.trc
Corrupt block relative dba: 0x02c00004 (file 11, block 4)
Completely zero block found during backing up datafile
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Hex dump of (file 11, block 5) in trace file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7493.trc
Corrupt block relative dba: 0x02c00005 (file 11, block 5)
Completely zero block found during backing up datafile
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Deleted Oracle managed file /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090105_8nggl1fw_.bkp
Checker run found 1 new persistent data failures
...
To perform a recovery of these blocks the database has to be in ARCHIVELOG mode, could be either in MOUNT or OPEN mode and it is not necessary to put the affected datafiles offline. Let's try to recover first one block, specifying the following command:
RMAN> recover datafile 11 block 4;

Starting recover at 18-03-2013 09:37:06
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp tag=TAG20130318T090132
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01
failover to previous backup

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp tag=TAG20130318T072836
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 18-03-2013 09:37:11
As you can see RMAN is able to look in a older backup untill it find a useful backup piece. The first backup piece used by RMAN was indeed o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp, containing the same corrupt blocks because I previously used the set maxcorrupt option. Only one block was recovered so the V$DATABASE_BLOCK_CORRUPTION view is still populated, but with a different information. Only one block seems to be corrupt on datafile 11:
SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        11          5          1                  0 ALL ZERO
Let's check again the dump of the datafile.
SQL> alter system dump datafile '/home/oracle/app/oracle/oradata/orcl/marcov01.dbf' block min 1 block max 10; 

System altered.
The block number (11/4) now contains useful and valid information, whereas block number (11/5) is still corrupt and identified by (0/0):
vi /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7255.trc
...
buffer tsn: 40 rdba: 0x02c00004 (11/4)
scn: 0x0000.00e2a047 seq: 0x01 flg: 0x04 tail: 0xa0471e01
frmt: 0x02 chkval: 0x823c type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x00000000 (0/0)
scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
...
buffer tsn: 40 rdba: 0x02c00006 (11/6)
scn: 0x0000.00e2a04b seq: 0x01 flg: 0x04 tail: 0xa04b1e01
frmt: 0x02 chkval: 0x022e type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
Let's try to recover the block number 5:
RMAN> recover datafile 11 block 5;

Starting recover at 18-03-2013 09:42:35
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp tag=TAG20130318T090132
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:00
failover to previous backup

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp tag=TAG20130318T072836
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:00

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 18-03-2013 09:42:40
Again RMAN "failover to previous backup", it is able to recover block number 5 using the next available backup piece (o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp) and the V$DATABASE_BLOCK_CORRUPTION is finally empty.
SQL> select * from  v$database_block_corruption;

no rows selected
Next post will be about how to recover sparse corrupt blocks.

That's all.