Database administrator workshop: March 2013

Friday, March 29, 2013

How to duplicate a production database on a different server mantaining the same SID and directory structures with 'duplicate' RMAN command

In the following scenario I simply want to have a copy of my production database to a different server preserving the same database directory structures and even the same database name.
My production database is running on vsi08 server and its system id (ORACLE_SID) is PROD; the remote server is vsi10 and I want to duplicate here the PROD database currently available on vsi08.

Usually when I install a new database I try to save all the related scripts to recreate it: it's just another checkbox during a typical dbca installation.

On the local vsi08 server I can find those scripts located at $ORACLE_BASE/admin/'db_name'/scripts, that is:

[oracle@vsi08 ~]$ cd /opt/app/oracle/admin/PROD/scripts
[oracle@vsi08 scripts]$ ll
total 52
-rw-r----- 1 oracle oinstall  234 Mar 27 13:41 apex.sql
-rw-r----- 1 oracle oinstall  756 Mar 27 13:41 CreateDBCatalog.sql
-rw-r----- 1 oracle oinstall  396 Mar 27 13:41 CreateDBFiles.sql
-rw-r----- 1 oracle oinstall 1219 Mar 27 13:41 CreateDB.sql
-rw-r----- 1 oracle oinstall 1916 Mar 27 13:41 init.ora
-rw-r----- 1 oracle oinstall  202 Mar 27 13:41 interMedia.sql
-rw-r----- 1 oracle oinstall  443 Mar 27 13:41 JServer.sql
-rw-r----- 1 oracle oinstall  507 Mar 27 13:41 lockAccount.sql
-rw-r----- 1 oracle oinstall  211 Mar 27 13:41 ordinst.sql
-rw-r----- 1 oracle oinstall  786 Mar 27 13:41 postDBCreation.sql
-rwxr-xr-x 1 oracle oinstall  685 Mar 27 13:41 PROD.sh
-rwxr-xr-x 1 oracle oinstall  781 Mar 27 13:41 PROD.sql
-rw-r----- 1 oracle oinstall  395 Mar 27 13:41 xdb_protocol.sql

If you look at the content of the unique sh file you can verify it creates few directories. I want to use the content of that file to create the same database directory structures on the remote server.

[oracle@vsi08 scripts]$ more PROD.sh 
#!/bin/sh

OLD_UMASK=`umask`
umask 0027
mkdir -p /opt/app/oracle/admin/PROD/adump
mkdir -p /opt/app/oracle/admin/PROD/dpdump
mkdir -p /opt/app/oracle/admin/PROD/pfile
mkdir -p /opt/app/oracle/cfgtoollogs/dbca/PROD
mkdir -p /opt/app/oracle/flash_recovery_area
mkdir -p /opt/app/oracle/flash_recovery_area/PROD
mkdir -p /opt/app/oracle/oradata/PROD
mkdir -p /opt/app/oracle/product/11.2.0/db_1/dbs
umask ${OLD_UMASK}
ORACLE_SID=PROD; export ORACLE_SID
PATH=$ORACLE_HOME/bin:$PATH; export PATH
echo You should Add this entry in the /etc/oratab: PROD:/opt/app/oracle/product/11.2.0/db_1:Y
/opt/app/oracle/product/11.2.0/db_1/bin/sqlplus /nolog @/opt/app/oracle/admin/PROD/scripts/PROD.sql

So on remote vsi10 server I recreate the same directory structures of vsi08:

[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/admin/PROD/adump
[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/admin/PROD/dpdump
[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/admin/PROD/pfile
[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/cfgtoollogs/dbca/PROD
[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/flash_recovery_area
[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/flash_recovery_area/PROD
[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/oradata/PROD
[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/product/11.2.0/db_1/dbs

Next step is to create a pfile from the current PROD database located on vsi08 server and transfer that copy to vsi10 server:

[oracle@vsi08 pfile]$ sqlplus / as sysdba
SQL> create pfile='/tmp/initPROD_20130328.txt' from spfile;

File created.

[oracle@vsi08 pfile]$ scp /tmp/initPROD_20130328.txt vsi10.mydomain.it:/opt/app/oracle/admin/PROD/pfile
oracle@vsi10.mydomain.it's password: 
initPROD_20130328.txt                                 100%  944     0.9KB/s   00:00

Now it's time to create a tns entry for vsi10 server referencing PROD database at vsi08 server. I first have a look at the current tns entry on the local vsi08 server...

[oracle@vsi08 admin]$ pwd
/opt/app/oracle/product/11.2.0/db_1/network/admin
[oracle@vsi08 admin]$ cat tnsnames.ora 
# tnsnames.ora Network Configuration File: /opt/app/oracle/product/11.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

PROD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vsi08.mydomain.it)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = PROD)
    )
  )

... and then, using a simple copy-and-paste operation, I can create the new one on tnsnames.ora file of vsi10 server.
I modify only the reference of the tns entry from PROD to PROD_AT_VSI08 because I really want to remember I'm connecting to PROD database located at vsi08 server:

[oracle@vsi10 ~]$ cd $ORACLE_HOME 
[oracle@vsi10 db_1]$ cd network/admin/
[oracle@vsi10 admin]$ vi tnsnames.ora 
[oracle@vsi10 admin]$ cat tnsnames.ora 
PROD_AT_VSI08 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vsi08.mydomain.it)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = PROD)
    )
  )

After the creation of PROD_AT_VSI08 tns entry I want to test it and try to connect from vsi10 server to PROD database on vsi08 server.
Before proceeding I have to successfully establish a connection otherwise my attempt to duplicate PROD database on vsi10 server won't work.

[oracle@vsi10 admin]$ sqlplus sys/oracle@PROD_AT_VSI08 as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Thu Mar 28 12:37:34 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> exit

Ok... I'm able to connect to PROD database at vsi08 server and now it's time to have a complete backup of the database I want to duplicate. So from vsi08 server I set the following environment settings, backing up the database and its archivelogs. RMAN is configured using autobackup on so it will perform also a backup of the current control file and spfile:

[oracle@vsi08 ~]$ export NLS_DATE_FORMAT='DD-MM-RRRR HH24:MI:SS'
[oracle@vsi08 ~]$ export ORACLE_SID=PROD
[oracle@vsi08 ~]$ rman target /

Recovery Manager: Release 11.2.0.1.0 - Production on Thu Mar 28 10:49:42 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (DBID=223010867)

RMAN> backup database plus archivelog;

Starting backup at 28-03-2013 10:52:07
current log archived
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=19 device type=DISK
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=50 RECID=3 STAMP=811248727
channel ORA_DISK_1: starting piece 1 at 28-03-2013 10:52:07
channel ORA_DISK_1: finished piece 1 at 28-03-2013 10:52:08
piece handle=/opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_annnn_TAG20130328T105207_8o84p7o6_.bkp tag=TAG20130328T105207 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 28-03-2013 10:52:08

Starting backup at 28-03-2013 10:52:08
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/opt/app/oracle/oradata/PROD/system01.dbf
input datafile file number=00002 name=/opt/app/oracle/oradata/PROD/sysaux01.dbf
input datafile file number=00003 name=/opt/app/oracle/oradata/PROD/undotbs01.dbf
input datafile file number=00004 name=/opt/app/oracle/oradata/PROD/users01.dbf
channel ORA_DISK_1: starting piece 1 at 28-03-2013 10:52:08
channel ORA_DISK_1: finished piece 1 at 28-03-2013 10:52:33
piece handle=/opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_nnndf_TAG20130328T105208_8o84p8vq_.bkp tag=TAG20130328T105208 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25
Finished backup at 28-03-2013 10:52:33

Starting backup at 28-03-2013 10:52:33
current log archived
using channel ORA_DISK_1
channel ORA_DISK_1: starting archived log backup set
channel ORA_DISK_1: specifying archived log(s) in backup set
input archived log thread=1 sequence=51 RECID=4 STAMP=811248753
channel ORA_DISK_1: starting piece 1 at 28-03-2013 10:52:34
channel ORA_DISK_1: finished piece 1 at 28-03-2013 10:52:35
piece handle=/opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_annnn_TAG20130328T105233_8o84q231_.bkp tag=TAG20130328T105233 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 28-03-2013 10:52:35

Starting Control File and SPFILE Autobackup at 28-03-2013 10:52:35
piece handle=/opt/app/oracle/flash_recovery_area/PROD/autobackup/2013_03_28/o1_mf_s_811248755_8o84q3cz_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 28-03-2013 10:52:36

The current backup pieces are located into two directories:

RMAN> list backup;

using target database control file instead of recovery catalog

List of Backup Sets
===================


BS Key  Size       Device Type Elapsed Time Completion Time    
------- ---------- ----------- ------------ -------------------
5       3.00K      DISK        00:00:00     28-03-2013 10:52:07
        BP Key: 5   Status: AVAILABLE  Compressed: NO  Tag: TAG20130328T105207
        Piece Name: /opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_annnn_TAG20130328T105207_8o84p7o6_.bkp

  List of Archived Logs in backup set 5
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    50      730126     28-03-2013 10:48:33 730283     28-03-2013 10:52:07

BS Key  Type LV Size       Device Type Elapsed Time Completion Time    
------- ---- -- ---------- ----------- ------------ -------------------
6       Full    725.99M    DISK        00:00:17     28-03-2013 10:52:25
        BP Key: 6   Status: AVAILABLE  Compressed: NO  Tag: TAG20130328T105208
        Piece Name: /opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_nnndf_TAG20130328T105208_8o84p8vq_.bkp
  List of Datafiles in backup set 6

  File LV Type Ckp SCN    Ckp Time            Name
  ---- -- ---- ---------- ------------------- ----
  1       Full 730292     28-03-2013 10:52:08 /opt/app/oracle/oradata/PROD/system01.dbf
  2       Full 730292     28-03-2013 10:52:08 /opt/app/oracle/oradata/PROD/sysaux01.dbf
  3       Full 730292     28-03-2013 10:52:08 /opt/app/oracle/oradata/PROD/undotbs01.dbf
  4       Full 730292     28-03-2013 10:52:08 /opt/app/oracle/oradata/PROD/users01.dbf

BS Key  Size       Device Type Elapsed Time Completion Time    
------- ---------- ----------- ------------ -------------------
7       2.00K      DISK        00:00:00     28-03-2013 10:52:34
        BP Key: 7   Status: AVAILABLE  Compressed: NO  Tag: TAG20130328T105233
        Piece Name: /opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_annnn_TAG20130328T105233_8o84q231_.bkp

  List of Archived Logs in backup set 7
  Thrd Seq     Low SCN    Low Time            Next SCN   Next Time
  ---- ------- ---------- ------------------- ---------- ---------
  1    51      730283     28-03-2013 10:52:07 730305     28-03-2013 10:52:33

BS Key  Type LV Size       Device Type Elapsed Time Completion Time    
------- ---- -- ---------- ----------- ------------ -------------------
8       Full    9.36M      DISK        00:00:00     28-03-2013 10:52:35
        BP Key: 8   Status: AVAILABLE  Compressed: NO  Tag: TAG20130328T105235
        Piece Name: /opt/app/oracle/flash_recovery_area/PROD/autobackup/2013_03_28/o1_mf_s_811248755_8o84q3cz_.bkp
  SPFILE Included: Modification time: 28-03-2013 10:45:50
  SPFILE db_unique_name: PROD
  Control File Included: Ckp SCN: 730314       Ckp time: 28-03-2013 10:52:35

I need to create the same backup directories into vsi10 server:

[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/
[oracle@vsi10 ~]$ mkdir -p /opt/app/oracle/flash_recovery_area/PROD/autobackup/2013_03_28/

From vsi08 server I copy the backup pieces to vsi10 server into the appropriate directories:

[oracle@vsi08 ~]$ ll /opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/
total 744164
-rw-r----- 1 oracle oinstall      3584 Mar 28 10:52 o1_mf_annnn_TAG20130328T105207_8o84p7o6_.bkp
-rw-r----- 1 oracle oinstall      2560 Mar 28 10:52 o1_mf_annnn_TAG20130328T105233_8o84q231_.bkp
-rw-r----- 1 oracle oinstall 761266176 Mar 28 10:52 o1_mf_nnndf_TAG20130328T105208_8o84p8vq_.bkp
[oracle@vsi08 ~]$ scp /opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/* vsi10.mydomain.it:/opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/
oracle@vsi10.mydomain.it's password: 
o1_mf_annnn_TAG20130328T105207_8o84p7o6_.bkp          100% 3584     3.5KB/s   00:00    
o1_mf_annnn_TAG20130328T105233_8o84q231_.bkp          100% 2560     2.5KB/s   00:00    
o1_mf_nnndf_TAG20130328T105208_8o84p8vq_.bkp          100%  726MB  40.3MB/s   00:18    
[oracle@vsi08 ~]$ ll /opt/app/oracle/flash_recovery_area/PROD/autobackup/2013_03_28/
total 9616
-rw-r----- 1 oracle oinstall 9830400 Mar 28 10:52 o1_mf_s_811248755_8o84q3cz_.bkp
[oracle@vsi08 ~]$ scp /opt/app/oracle/flash_recovery_area/PROD/autobackup/2013_03_28/* vsi10.mydomain.it:/opt/app/oracle/flash_recovery_area/PROD/autobackup/2013_03_28/
oracle@vsi10.mydomain.it's password: 
o1_mf_s_811248755_8o84q3cz_.bkp                       100% 9600KB   9.4MB/s   00:00

From the vsi10 machine I need to start the "next" duplicated PROD instance in NOMOUNT mode and with the previously copied pfile.

[oracle@vsi10 admin]$ export ORACLE_SID=PROD
[oracle@vsi10 admin]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Thu Mar 28 12:38:49 2013

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Connected to an idle instance.
SQL> startup nomount pfile='/opt/app/oracle/admin/PROD/pfile/initPROD_20130328.txt'
ORACLE instance started.

Total System Global Area 1686925312 bytes
Fixed Size      2213976 bytes
Variable Size    989857704 bytes
Database Buffers   687865856 bytes
Redo Buffers      6987776 bytes

When the instance is started in NOMOUNT mode I can remotely connect RMAN to PROD database at vsi08 as target connection and to the "next" PROD database at the local server (vsi10) as auxiliary connection:

[oracle@vsi10 admin]$ rman target sys/oracle@PROD_AT_VSI08 auxiliary /

Recovery Manager: Release 11.2.0.1.0 - Production on Thu Mar 28 14:34:25 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PROD (DBID=223010867)
connected to auxiliary database: PROD (not mounted)

With the following duplicate command I want to restore all the datafiles located into the backup pieces and automatically start the new database on vsi10 server. At the end of the duplicate process I will have a complete copy of my production database with the same directory structures and even the same SID:

RMAN> duplicate target database to PROD nofilenamecheck;

Starting Duplicate Db at 28-MAR-13
using target database control file instead of recovery catalog
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=134 device type=DISK

contents of Memory Script:
{
   sql clone "create spfile from memory";
}
executing Memory Script

sql statement: create spfile from memory

contents of Memory Script:
{
   shutdown clone immediate;
   startup clone nomount;
}
executing Memory Script

Oracle instance shut down

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    1686925312 bytes

Fixed Size                     2213976 bytes
Variable Size               1006634920 bytes
Database Buffers             671088640 bytes
Redo Buffers                   6987776 bytes

contents of Memory Script:
{
   sql clone "alter system set  db_name = 
 ''PROD'' comment=
 ''Modified by RMAN duplicate'' scope=spfile";
   sql clone "alter system set  db_unique_name = 
 ''PROD'' comment=
 ''Modified by RMAN duplicate'' scope=spfile";
   shutdown clone immediate;
   startup clone force nomount
   restore clone primary controlfile;
   alter clone database mount;
}
executing Memory Script

sql statement: alter system set  db_name =  ''PROD'' comment= ''Modified by RMAN duplicate'' scope=spfile

sql statement: alter system set  db_unique_name =  ''PROD'' comment= ''Modified by RMAN duplicate'' scope=spfile

Oracle instance shut down

Oracle instance started

Total System Global Area    1686925312 bytes

Fixed Size                     2213976 bytes
Variable Size               1006634920 bytes
Database Buffers             671088640 bytes
Redo Buffers                   6987776 bytes

Starting restore at 28-MAR-13
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=134 device type=DISK

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/flash_recovery_area/PROD/autobackup/2013_03_28/o1_mf_s_811248755_8o84q3cz_.bkp
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/flash_recovery_area/PROD/autobackup/2013_03_28/o1_mf_s_811248755_8o84q3cz_.bkp tag=TAG20130328T105235
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
output file name=/opt/app/oracle/oradata/PROD/control01.ctl
output file name=/opt/app/oracle/flash_recovery_area/PROD/control02.ctl
Finished restore at 28-MAR-13

database mounted

contents of Memory Script:
{
   set until scn  730305;
   set newname for datafile  1 to 
 "/opt/app/oracle/oradata/PROD/system01.dbf";
   set newname for datafile  2 to 
 "/opt/app/oracle/oradata/PROD/sysaux01.dbf";
   set newname for datafile  3 to 
 "/opt/app/oracle/oradata/PROD/undotbs01.dbf";
   set newname for datafile  4 to 
 "/opt/app/oracle/oradata/PROD/users01.dbf";
   restore
   clone database
   ;
}
executing Memory Script

executing command: SET until clause

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

Starting restore at 28-MAR-13
using channel ORA_AUX_DISK_1

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00001 to /opt/app/oracle/oradata/PROD/system01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00002 to /opt/app/oracle/oradata/PROD/sysaux01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00003 to /opt/app/oracle/oradata/PROD/undotbs01.dbf
channel ORA_AUX_DISK_1: restoring datafile 00004 to /opt/app/oracle/oradata/PROD/users01.dbf
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_nnndf_TAG20130328T105208_8o84p8vq_.bkp
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_nnndf_TAG20130328T105208_8o84p8vq_.bkp tag=TAG20130328T105208
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:01:55
Finished restore at 28-MAR-13

contents of Memory Script:
{
   switch clone datafile all;
}
executing Memory Script

datafile 1 switched to datafile copy
input datafile copy RECID=1 STAMP=811262217 file name=/opt/app/oracle/oradata/PROD/system01.dbf
datafile 2 switched to datafile copy
input datafile copy RECID=2 STAMP=811262217 file name=/opt/app/oracle/oradata/PROD/sysaux01.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=3 STAMP=811262217 file name=/opt/app/oracle/oradata/PROD/undotbs01.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=4 STAMP=811262217 file name=/opt/app/oracle/oradata/PROD/users01.dbf

contents of Memory Script:
{
   set until scn  730305;
   recover
   clone database
    delete archivelog
   ;
}
executing Memory Script

executing command: SET until clause

Starting recover at 28-MAR-13
using channel ORA_AUX_DISK_1

starting media recovery

channel ORA_AUX_DISK_1: starting archived log restore to default destination
channel ORA_AUX_DISK_1: restoring archived log
archived log thread=1 sequence=51
channel ORA_AUX_DISK_1: reading from backup piece /opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_annnn_TAG20130328T105233_8o84q231_.bkp
channel ORA_AUX_DISK_1: piece handle=/opt/app/oracle/flash_recovery_area/PROD/backupset/2013_03_28/o1_mf_annnn_TAG20130328T105233_8o84q231_.bkp tag=TAG20130328T105233
channel ORA_AUX_DISK_1: restored backup piece 1
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:01
archived log file name=/opt/app/oracle/flash_recovery_area/PROD/archivelog/2013_03_28/o1_mf_1_51_8o8kvv94_.arc thread=1 sequence=51
channel clone_default: deleting archived log(s)
archived log file name=/opt/app/oracle/flash_recovery_area/PROD/archivelog/2013_03_28/o1_mf_1_51_8o8kvv94_.arc RECID=5 STAMP=811262219
media recovery complete, elapsed time: 00:00:00
Finished recover at 28-MAR-13

contents of Memory Script:
{
   shutdown clone immediate;
   startup clone nomount;
   sql clone "alter system set  db_name = 
 ''PROD'' comment=
 ''Reset to original value by RMAN'' scope=spfile";
   sql clone "alter system reset  db_unique_name scope=spfile";
   shutdown clone immediate;
   startup clone nomount;
}
executing Memory Script

database dismounted
Oracle instance shut down

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    1686925312 bytes

Fixed Size                     2213976 bytes
Variable Size               1006634920 bytes
Database Buffers             671088640 bytes
Redo Buffers                   6987776 bytes

sql statement: alter system set  db_name =  ''PROD'' comment= ''Reset to original value by RMAN'' scope=spfile

sql statement: alter system reset  db_unique_name scope=spfile

Oracle instance shut down

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area    1686925312 bytes

Fixed Size                     2213976 bytes
Variable Size               1006634920 bytes
Database Buffers             671088640 bytes
Redo Buffers                   6987776 bytes
sql statement: CREATE CONTROLFILE REUSE SET DATABASE "PROD" RESETLOGS ARCHIVELOG 
  MAXLOGFILES     16
  MAXLOGMEMBERS      3
  MAXDATAFILES      100
  MAXINSTANCES     8
  MAXLOGHISTORY      292
 LOGFILE
  GROUP  1  SIZE 50 M ,
  GROUP  2  SIZE 50 M ,
  GROUP  3  SIZE 50 M 
 DATAFILE
  '/opt/app/oracle/oradata/PROD/system01.dbf'
 CHARACTER SET WE8MSWIN1252


contents of Memory Script:
{
   set newname for tempfile  1 to 
 "/opt/app/oracle/oradata/PROD/temp01.dbf";
   switch clone tempfile all;
   catalog clone datafilecopy  "/opt/app/oracle/oradata/PROD/sysaux01.dbf", 
 "/opt/app/oracle/oradata/PROD/undotbs01.dbf", 
 "/opt/app/oracle/oradata/PROD/users01.dbf";
   switch clone datafile all;
}
executing Memory Script

executing command: SET NEWNAME

renamed tempfile 1 to /opt/app/oracle/oradata/PROD/temp01.dbf in control file

cataloged datafile copy
datafile copy file name=/opt/app/oracle/oradata/PROD/sysaux01.dbf RECID=1 STAMP=811262238
cataloged datafile copy
datafile copy file name=/opt/app/oracle/oradata/PROD/undotbs01.dbf RECID=2 STAMP=811262238
cataloged datafile copy
datafile copy file name=/opt/app/oracle/oradata/PROD/users01.dbf RECID=3 STAMP=811262238

datafile 2 switched to datafile copy
input datafile copy RECID=1 STAMP=811262238 file name=/opt/app/oracle/oradata/PROD/sysaux01.dbf
datafile 3 switched to datafile copy
input datafile copy RECID=2 STAMP=811262238 file name=/opt/app/oracle/oradata/PROD/undotbs01.dbf
datafile 4 switched to datafile copy
input datafile copy RECID=3 STAMP=811262238 file name=/opt/app/oracle/oradata/PROD/users01.dbf

contents of Memory Script:
{
   Alter clone database open resetlogs;
}
executing Memory Script

database opened
Finished Duplicate Db at 28-MAR-13

That' all.

Wednesday, March 27, 2013

How to recover corrupted data blocks using 'recover datafile' RMAN syntax on a system critical datafile

This post continues a series of previous posts related to the recovery process of corrupt data blocks:
1) How to recover contiguous corrupted data blocks using 'recover datafile' RMAN syntax;
2) How to recover sparse corrupted data blocks using 'recover datafile' RMAN syntax;

What does it happen when a corrupt block is related to SYSTEM datafile ?
Let's simulate this scenario:

[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/system01.dbf bs=8k conv=notrunc seek=1000 count=5

DBVERIFY detects your SYSTEM datafile is corrupt. Five data blocks are no more available.

[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/system01.dbf blocksize=8192

DBVERIFY: Release 11.2.0.2.0 - Production on Tue Mar 19 09:01:17 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/system01.dbf
Page 1000 is marked corrupt
Corrupt block relative dba: 0x004003e8 (file 1, block 1000)
Completely zero block found during dbv: 

Page 1001 is marked corrupt
Corrupt block relative dba: 0x004003e9 (file 1, block 1001)
Completely zero block found during dbv: 

Page 1002 is marked corrupt
Corrupt block relative dba: 0x004003ea (file 1, block 1002)
Completely zero block found during dbv: 

Page 1003 is marked corrupt
Corrupt block relative dba: 0x004003eb (file 1, block 1003)
Completely zero block found during dbv: 

Page 1004 is marked corrupt
Corrupt block relative dba: 0x004003ec (file 1, block 1004)
Completely zero block found during dbv: 

DBVERIFY - Verification complete

Total Pages Examined         : 116608
Total Pages Processed (Data) : 73223
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 18058
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 11301
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 14021
Total Pages Marked Corrupt   : 5
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 14866073 (0.14866073)

Let's update the V$DATABASE_BLOCK_CORRUPTION view issuing a backup validate command.

RMAN> backup validate tablespace system;

Starting backup at 19-03-2013 09:02:43
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/app/oracle/oradata/orcl/system01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:35
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    FAILED 0              14021        116611          14866295  
  File Name: /home/oracle/app/oracle/oradata/orcl/system01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              73223           
  Index      0              18058           
  Other      5              11306           

validate found one or more corrupt blocks
See trace file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7493.trc for details
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2               
Control File OK     0              604 
Finished backup at 19-03-2013 09:03:21

The V$DATABASE_BLOCK_CORRUPTION view contains the list of corrupt blocks, all related to datafile number 1.

SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         1       1000          5                  0 ALL ZERO

Let's try to recover those blocks.

RMAN> recover corruption list;

Starting recover at 19-03-2013 09:04:22
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00001
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_17/o1_mf_nnndf_TAG20130317T224547_8nfbjnn5_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_17/o1_mf_nnndf_TAG20130317T224547_8nfbjnn5_.bkp tag=TAG20130317T224547
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:02:26

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 19-03-2013 09:06:51

It worked. And the view doesn't contain any data.

SQL> select * from v$database_block_corruption;

no rows selected

So the recovery process of a corrupt block of the SYSTEM tablespace is the same and there is no need to bounce the instance.
Next step will be about how to perform a recovery when the first block of a datafile, containing the datafile header, becomes corrupt.
That's all and happy birthday to me :)

Monday, March 25, 2013

How to recover sparse corrupted data blocks using 'recover datafile' RMAN syntax

In this previous post we were able to recover contiguous corrupt blocks using the 'recover datafile' RMAN syntax.
But what about if we need to recover several sparse corrupt blocks ? Should we need to issue several different recover commands ? How can I recover a list of sparse corrupt blocks ?
Just have a look at this other example. First of all let's wipe out some blocks of datafile 11 (and in particular blocks number 3,4,5,7,8 and 10)

[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf bs=8k conv=notrunc seek=3 count=3
3+0 records in
3+0 records out
24576 bytes (25 kB) copied, 0.000780782 seconds, 31.5 MB/s
[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf bs=8k conv=notrunc seek=7 count=2
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.00026644 seconds, 61.5 MB/s
[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf bs=8k conv=notrunc seek=10 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000561284 seconds, 14.6 MB/s

The database currently doesn't know there are few corrupt blocks on datafile 11:

SQL> select * from  v$database_block_corruption;

no rows selected

DBVERIFY is able to find them easily.

[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf blocksize=8192

DBVERIFY: Release 11.2.0.2.0 - Production on Mon Mar 18 22:46:36 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/marcov01.dbf
Page 3 is marked corrupt
Corrupt block relative dba: 0x02c00003 (file 11, block 3)
Completely zero block found during dbv: 

Page 4 is marked corrupt
Corrupt block relative dba: 0x02c00004 (file 11, block 4)
Completely zero block found during dbv: 

Page 5 is marked corrupt
Corrupt block relative dba: 0x02c00005 (file 11, block 5)
Completely zero block found during dbv: 

Page 7 is marked corrupt
Corrupt block relative dba: 0x02c00007 (file 11, block 7)
Completely zero block found during dbv: 

Page 8 is marked corrupt
Corrupt block relative dba: 0x02c00008 (file 11, block 8)
Completely zero block found during dbv: 

Page 10 is marked corrupt
Corrupt block relative dba: 0x02c0000a (file 11, block 10)
Completely zero block found during dbv: 

DBVERIFY - Verification complete

Total Pages Examined         : 768
Total Pages Processed (Data) : 196
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 13
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 553
Total Pages Marked Corrupt   : 6
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 14853679 (0.14853679)

Just let's try to have a backup of tablespace MARCOV.

RMAN> backup tablespace marcov;

Starting backup at 18-03-2013 22:41:59
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf
channel ORA_DISK_1: starting piece 1 at 18-03-2013 22:41:59
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/18/2013 22:42:07
ORA-19566: exceeded limit of 0 corrupt blocks for file /home/oracle/app/oracle/oradata/orcl/marcov01.dbf

It fails, but as you can see it stops at the first corrupt block, logging just that single one.

SQL> select * from  v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        11          3          1                  0 ALL ZERO

To check and have a completed list of corrupt blocks you should issue a backup validate command:

RMAN> backup validate tablespace marcov;

Starting backup at 18-03-2013 22:57:52
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:04
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
11   FAILED 0              553          768             14853679  
  File Name: /home/oracle/app/oracle/oradata/orcl/marcov01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              196             
  Index      0              0               
  Other      6              19              

validate found one or more corrupt blocks
See trace file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7493.trc for details
Finished backup at 18-03-2013 22:57:56

Now the V$DATABASE_BLOCK_CORRUPTION view is appropriately filled with all corrupt blocks. There are three contiguous currupt blocks starting from the third, two contiguous corrupt blocks starting from the seventh and the corrupt block number 10.

SQL> select * from  v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        11         10          1                  0 ALL ZERO
        11          7          2                  0 ALL ZERO
        11          3          3                  0 ALL ZERO

Here is the syntax if you want to recover all of them using a single command:

RMAN> recover datafile 11 block 3 to 5,7 to 8,10; 

Starting recover at 18-03-2013 23:00:54
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp tag=TAG20130318T072836
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:02

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 18-03-2013 23:01:02

When the recover finishes the V$DATABASE_BLOCK_CORRUPTION view doesn't contain any rows:

SQL> select * from  v$database_block_corruption;

no rows selected

And what happens if we have few sparse corrupt blocks on several datafiles ?
Is there a way to use a single command to recover all of them ?
Let's simulate this latest scenario, resetting the same blocks on datafile number 11 and five blocks of datafile number 12 starting from block number 3:

[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf bs=8k conv=notrunc seek=3 count=3
3+0 records in
3+0 records out
24576 bytes (25 kB) copied, 0.000856502 seconds, 28.7 MB/s
[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf bs=8k conv=notrunc seek=7 count=2
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.000239112 seconds, 68.5 MB/s
[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf bs=8k conv=notrunc seek=10 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.00676381 seconds, 1.2 MB/s
[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/test01.dbf bs=8k conv=notrunc seek=3 count=5
5+0 records in
5+0 records out
40960 bytes (41 kB) copied, 0.000839468 seconds, 48.8 MB/s

Let's fill the V$DATABASE_BLOCK_CORRUPTION view using the backup validate command on the two tablespaces:

RMAN> backup validate tablespace marcov,"TEST";

Starting backup at 18-03-2013 23:08:37
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf
input datafile file number=00012 name=/home/oracle/app/oracle/oradata/orcl/test01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
11   FAILED 0              553          768             14853679  
  File Name: /home/oracle/app/oracle/oradata/orcl/marcov01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              196             
  Index      0              0               
  Other      6              19              

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
12   FAILED 0              113          128             14856079  
  File Name: /home/oracle/app/oracle/oradata/orcl/test01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              5               
  Index      0              0               
  Other      5              10              

validate found one or more corrupt blocks
See trace file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7493.trc for details
Finished backup at 18-03-2013 23:08:40

Here is the list of corrupt blocks on different datafiles:

SQL> select * from  v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        11         10          1                  0 ALL ZERO
        11          7          2                  0 ALL ZERO
        11          3          3                  0 ALL ZERO
        12          3          5                  0 ALL ZERO

You can use the following command to let RMAN try to recover all the blocks listed into the V$DATABASE_BLOCK_CORRUPTION view.

RMAN> recover corruption list;

Starting recover at 18-03-2013 23:09:53
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp tag=TAG20130318T072836
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 18-03-2013 23:10:03

The V$DATABASE_BLOCK_CORRUPTION is empty again.

SQL> select * from  v$database_block_corruption;

no rows selected

Next post will be about how to recover corrupt blocks on SYSTEM tablespace. That's all.

Wednesday, March 20, 2013

How to recover contiguous corrupted data blocks using 'recover datafile' RMAN syntax

In this post I want to simulate a corrupt block recovery: honestly it happened to me only two or three times several years ago and it's generally caused by the I/O subsystem or because of software bugs.
Naturally you should consider this kind of recovery when you have few corrupt blocks in a large datafile otherwise you should consider other familiar kind of recovery.
To simulate a recover of corrupt blocks I begin to create first a new tablespace formed by one small datafile, a user, a table and some rows:

SQL> create tablespace marcov datafile '/home/oracle/app/oracle/oradata/orcl/marcov01.dbf' size 1M autoextend on next 5M maxsize 50M;

Tablespace created.

SQL> DROP USER MARCOV CASCADE;
DROP USER MARCOV CASCADE
          *
ERROR at line 1:
ORA-01918: user 'MARCOV' does not exist


SQL> CREATE USER MARCOV
  2  IDENTIFIED BY MARCOV
  3  DEFAULT TABLESPACE MARCOV
  4  TEMPORARY TABLESPACE TEMP
  5  PROFILE DEFAULT
  6  ACCOUNT UNLOCK;

User created.

SQL> GRANT CONNECT, CREATE TABLE, CREATE SESSION TO MARCOV;

Grant succeeded.

SQL> ALTER USER MARCOV QUOTA UNLIMITED ON MARCOV;

User altered.

SQL> CREATE TABLE MARCOV.T1 (a number, b char(1000));

Table created.

SQL> insert into MARCOV.T1 select level, dummy from dual
  2  connect by level <= 1000;

1000 rows created.

SQL> commit;

Commit complete.

Let's update the statistics on this new table...

SQL> BEGIN
  2  SYS.DBMS_STATS.GATHER_TABLE_STATS (
  3  OwnName        => 'MARCOV'
  4  ,TabName        => 'T1'
  5  ,Estimate_Percent  => 1
  6  ,Method_Opt        => 'FOR ALL COLUMNS SIZE 1'
  7  ,Degree            => 4
  8  ,Cascade           => TRUE
  9  ,No_Invalidate     => FALSE);
 10  END;
 11  /

PL/SQL procedure successfully completed.

... and let's see how many blocks this segment has and how many are used:

SQL> SELECT NUM_ROWS, BLOCKS
  2  FROM SYS.DBA_TABLES
  3  WHERE TABLE_NAME = 'T1'
  4  AND OWNER = 'MARCOV';

  NUM_ROWS     BLOCKS
---------- ----------
      1000        244

SQL> select header_block, blocks, extents from dba_segments where segment_name='T1';

HEADER_BLOCK     BLOCKS    EXTENTS
------------ ---------- ----------
          10        256         17

From a sqlplus session I want to dump some blocks of the new datafile using the following command:

SQL> alter system dump datafile '/home/oracle/app/oracle/oradata/orcl/marcov01.dbf' block min 1 block max 10;

System altered.

and few of them are reported here (I have omitted several lines):

vi /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7255.trc
...
buffer tsn: 40 rdba: 0x02c00003 (11/3)
scn: 0x0000.00e2a62a seq: 0x01 flg: 0x04 tail: 0xa62a1e01
frmt: 0x02 chkval: 0x4dcc type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x02c00004 (11/4)
scn: 0x0000.00e2a047 seq: 0x01 flg: 0x04 tail: 0xa0471e01
frmt: 0x02 chkval: 0x823c type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x02c00005 (11/5)
scn: 0x0000.00e2a049 seq: 0x01 flg: 0x04 tail: 0xa0491e01
frmt: 0x02 chkval: 0xc235 type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x02c00006 (11/6)
scn: 0x0000.00e2a04b seq: 0x01 flg: 0x04 tail: 0xa04b1e01
frmt: 0x02 chkval: 0x022e type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...

Running DBVERIFY on the datafile no error or corrupt block is detected:

[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf blocksize=8192

DBVERIFY: Release 11.2.0.2.0 - Production on Mon Mar 18 07:29:48 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/marcov01.dbf

DBVERIFY - Verification complete

Total Pages Examined         : 768
Total Pages Processed (Data) : 196
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 19
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 553
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 14853855 (0.14853855)

Using the following command, dd resets two blocks of the datafile starting from the fourth.

[oracle@localhost trace]$ dd if=/dev/zero of=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf bs=8k conv=notrunc seek=4 count=2
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.00047551 seconds, 34.5 MB/s

Let's dump again the same few blocks of the datafile ...

SQL> alter system dump datafile '/home/oracle/app/oracle/oradata/orcl/marcov01.dbf' block min 1 block max 10;

System altered.

... and compare them with the previous blocks. Blocks marked as (11/4) and (11/5) are totally reset and currently identified as (0/0):

vi /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7255.trc
...
buffer tsn: 40 rdba: 0x02c00003 (11/3)
scn: 0x0000.00e2a62a seq: 0x01 flg: 0x04 tail: 0xa62a1e01
frmt: 0x02 chkval: 0x4dcc type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x00000000 (0/0)
scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
...
buffer tsn: 40 rdba: 0x00000000 (0/0)
scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
...
buffer tsn: 40 rdba: 0x02c00006 (11/6)
scn: 0x0000.00e2a04b seq: 0x01 flg: 0x04 tail: 0xa04b1e01
frmt: 0x02 chkval: 0x022e type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...

Now DBVERIFY is able to find two corrupt blocks on the datafile:

[oracle@localhost trace]$ dbv file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf blocksize=8192

DBVERIFY: Release 11.2.0.2.0 - Production on Mon Mar 18 07:49:40 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/orcl/marcov01.dbf
Page 4 is marked corrupt
Corrupt block relative dba: 0x02c00004 (file 11, block 4)
Completely zero block found during dbv: 

Page 5 is marked corrupt
Corrupt block relative dba: 0x02c00005 (file 11, block 5)
Completely zero block found during dbv: 

DBVERIFY - Verification complete

Total Pages Examined         : 768
Total Pages Processed (Data) : 196
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 17
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 553
Total Pages Marked Corrupt   : 2
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 14853855 (0.14853855)

During a backup if RMAN detects corrupt blocks it terminates the command: the default threshold of total number of physical and logical corruptions permitted on a file is zero, so no corrupt blocks are tolerated.
MAXCORRUPT is the setting you can use to modify this behaviour and tolerate more than one corrupt block. Have a look at the following example, considering that I have two corrupt blocks. When I specify 1 as maxcorrupt setting my backup still fails:

RMAN> run {
2> set maxcorrupt for datafile 11 to 1;
3> backup datafile 11;
4> }

executing command: SET MAX CORRUPT

Starting backup at 18-03-2013 09:01:05
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf
channel ORA_DISK_1: starting piece 1 at 18-03-2013 09:01:05
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/18/2013 09:01:06
ORA-19566: exceeded limit of 1 corrupt blocks for file /home/oracle/app/oracle/oradata/orcl/marcov01.dbf

When I specify to tolerate at most two corrupt blocks the backup command is able to complete successfully (backup piece: o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp). It should be clear anyway that I was able to complete a backup including a datafile containing two corrupt blocks!!!

RMAN> run {
2> set maxcorrupt for datafile 11 to 2;
3> backup datafile 11;
4> }

executing command: SET MAX CORRUPT

Starting backup at 18-03-2013 09:01:32
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00011 name=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf
channel ORA_DISK_1: starting piece 1 at 18-03-2013 09:01:33
channel ORA_DISK_1: finished piece 1 at 18-03-2013 09:01:34
piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp tag=TAG20130318T090132 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 18-03-2013 09:01:34

Starting Control File and SPFILE Autobackup at 18-03-2013 09:01:34
piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/autobackup/2013_03_18/o1_mf_s_810378094_8ngglyk1_.bkp comment=NONE
Finished Control File and SPFILE Autobackup at 18-03-2013 09:01:35

After RMAN detects these kind of problems it writes which blocks in a datafile were marked corrupt on the V$DATABASE_BLOCK_CORRUPTION view and, until these corrupt blocks are not repaired, the rows remain in the view. Datafile 11 seems to have 2 corrupt blocks starting from block number 4:

SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        11          4          2                  0 ALL ZERO

On the alert log you can find the following errors:

...
Mon Mar 18 09:01:05 2013
Hex dump of (file 11, block 4) in trace file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7493.trc
Corrupt block relative dba: 0x02c00004 (file 11, block 4)
Completely zero block found during backing up datafile
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=4, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Hex dump of (file 11, block 5) in trace file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7493.trc
Corrupt block relative dba: 0x02c00005 (file 11, block 5)
Completely zero block found during backing up datafile
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Reread of blocknum=5, file=/home/oracle/app/oracle/oradata/orcl/marcov01.dbf. found same corrupt data
Deleted Oracle managed file /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090105_8nggl1fw_.bkp
Checker run found 1 new persistent data failures
...

To perform a recovery of these blocks the database has to be in ARCHIVELOG mode, could be either in MOUNT or OPEN mode and it is not necessary to put the affected datafiles offline. Let's try to recover first one block, specifying the following command:

RMAN> recover datafile 11 block 4;

Starting recover at 18-03-2013 09:37:06
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp tag=TAG20130318T090132
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01
failover to previous backup

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp tag=TAG20130318T072836
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 18-03-2013 09:37:11

As you can see RMAN is able to look in a older backup untill it find a useful backup piece. The first backup piece used by RMAN was indeed o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp, containing the same corrupt blocks because I previously used the set maxcorrupt option. Only one block was recovered so the V$DATABASE_BLOCK_CORRUPTION view is still populated, but with a different information. Only one block seems to be corrupt on datafile 11:

SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        11          5          1                  0 ALL ZERO

Let's check again the dump of the datafile.

SQL> alter system dump datafile '/home/oracle/app/oracle/oradata/orcl/marcov01.dbf' block min 1 block max 10; 

System altered.

The block number (11/4) now contains useful and valid information, whereas block number (11/5) is still corrupt and identified by (0/0):

vi /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7255.trc
...
buffer tsn: 40 rdba: 0x02c00004 (11/4)
scn: 0x0000.00e2a047 seq: 0x01 flg: 0x04 tail: 0xa0471e01
frmt: 0x02 chkval: 0x823c type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...
buffer tsn: 40 rdba: 0x00000000 (0/0)
scn: 0x0000.00000000 seq: 0x01 flg: 0x01 tail: 0x00000001
frmt: 0x02 chkval: 0x0000 type: 0x00=unknown
Hex dump of block: st=0, typ_found=0
...
buffer tsn: 40 rdba: 0x02c00006 (11/6)
scn: 0x0000.00e2a04b seq: 0x01 flg: 0x04 tail: 0xa04b1e01
frmt: 0x02 chkval: 0x022e type: 0x1e=KTFB Bitmapped File Space Bitmap
Hex dump of block: st=0, typ_found=1
...

Let's try to recover the block number 5:

RMAN> recover datafile 11 block 5;

Starting recover at 18-03-2013 09:42:35
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T090132_8ngglx76_.bkp tag=TAG20130318T090132
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:00
failover to previous backup

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00011
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/flash_recovery_area/ORCL/backupset/2013_03_18/o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp tag=TAG20130318T072836
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:00

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 18-03-2013 09:42:40

Again RMAN "failover to previous backup", it is able to recover block number 5 using the next available backup piece (o1_mf_nnndf_TAG20130318T072836_8ng94o0k_.bkp) and the V$DATABASE_BLOCK_CORRUPTION is finally empty.

SQL> select * from  v$database_block_corruption;

no rows selected

Next post will be about how to recover sparse corrupt blocks.

That's all.

Friday, March 15, 2013

On PRKP-1033, CRS-0217, PRKP-1030 and CRS-0215 errors while using 'srvctl relocate service' command

Today I had to relocate a service named MYPRD02 from one node to another, but it didn't work as usual and something new happened.
This production installation is an Oracle RAC environment formed by three nodes: it is based on Oracle Database Server 10gR2 patched to 10.2.0.4 for x86_64 Linux architecture.

All my clustered resources were online like the service I wanted to relocate.

[oracle@myrac01 ~]$ crsstat
HA Resource                                   Target     State             
-----------                                   ------     -----             
...
ora.MYRAC.MYPRD02.MYRAC1.srv                  ONLINE     ONLINE           
ora.MYRAC.MYPRD02.cs                          ONLINE     ONLINE      
...

I executed the following relocate service command using srvctl cli but I received the error I couldn't relocate that resource.

oracle@myrac01 ~]$ srvctl relocate service -d MYRAC -s MYPRD02 -i MYRAC1 -t MYRAC2
PRKP-1033 : Cannot relocate service MYPRD02 from instance MYRAC1 to instance MYRAC2.
CRS-0217: Could not relocate resource ''.

After the command was unsuccessfully completed my crsstat command showed the following new states for the same service:

[oracle@myrac01 ~]$ crsstat
HA Resource                                   Target     State             
-----------                                   ------     -----             
...
ora.MYRAC.MYPRD02.MYRAC1.srv                  ONLINE     OFFLINE           
ora.MYRAC.MYPRD02.cs                          OFFLINE     OFFLINE      
...

From srvctl cli interface my service was no more available...

[oracle@myrac01 ~]$ srvctl status database -d MYRAC -v
Instance MYRAC1 is running on node myrac01 with online services MYBI 
Instance MYRAC2 is running on node myrac02 with online services MYPRD01 
Instance MYRAC3 is running on node myrac03

...but querying my database I obtained:

SQL> select inst_id, name from gv$active_services  order by service_id ;

   INST_ID NAME
---------- ----------------------------------------------------------------
  1 SYS$BACKGROUND
  3 SYS$BACKGROUND
  2 SYS$BACKGROUND
  2 SYS$USERS
  1 SYS$USERS
  3 SYS$USERS
  3 MYRACXDB
  1 MYRACXDB
  2 MYRACXDB
  1 MYRAC
  3 MYRAC
  2 MYRAC
  2 MYPRD01
  1 MYPRD02
  1 MYBI

15 rows selected.

So from srvctl cli a new session trying to connect to MYPRD02 service shouldn't be able to estabilish a connection with the database, whereas looking at the GV$ACTIVE_SERVICES view the same service was up and active to process new incoming requests. Some errors were logged on crsd log located at $CRS_HOME/log//crsd/:

[oracle@myrac01 ~]$ cd $CRS_HOME/log/myrac01/crsd/
[oracle@myrac01 crsd]$ pwd
/opt/crs/oracle/product/10.2.0/crs/log/myrac01/crsd/
[oracle@myrac01 crsd]$ ls
crsd.log

[oracle@myrac01 crsd]$ vi crsd.log
...
2013-03-11 09:31:14.249: [  CRSRES][1494907200]0StopResource: setting CLI values
2013-03-11 09:31:14.321: [  CRSRES][1494907200]0Attempting to stop `ora.MYRAC.MYPRD02.MYRAC1.srv` on member `myrac01`
2013-03-11 09:31:15.350: [  CRSRES][1494907200]0Stop of `ora.MYRAC.MYPRD02.MYRAC1.srv` on member `myrac01` succeeded.
2013-03-11 09:31:15.396: [  CRSRES][1494907200]0Attempting to start `ora.MYRAC.MYPRD02.MYRAC1.srv` on member `myrac02`
2013-03-11 09:31:15.945: [  CRSRES][1494907200]0Start of `ora.MYRAC.MYPRD02.MYRAC1.srv` on member `myrac02` failed.
...

From that log my crs was able to stop the resource but then failed to relocate and start it on another node, in this case the second.
I tried to start it using the following command, trying first on second node and then on third node, but always obtaining the same errors:

[oracle@myrac01 ~]$ srvctl start service -d MYRAC -s MYPRD02 -i MYRAC2
PRKP-1030 : Failed to start the service MYPRD02.
CRS-0215: Could not start resource 'ora.MYRAC.MYPRD02.MYRAC1.srv'.

[oracle@myrac01 ~]$ srvctl status database -d MYRAC -v
Instance MYRAC1 is running on node myrac01 with online services MYBI 
Instance MYRAC2 is running on node myrac02 with online services MYPRD01 
Instance MYRAC3 is running on node myrac03

[oracle@myrac01 ~]$ srvctl start service -d MYRAC -s MYPRD02 -i MYRAC3
PRKP-1030 : Failed to start the service MYPRD02.
CRS-0215: Could not start resource 'ora.MYRAC.MYPRD02.MYRAC1.srv'.

Again on crsd log the following unsuccessfully attempts to start MYPRD02 service were logged:

2013-03-11 09:56:28.043: [  CRSRES][1494907200]0Attempting to start `ora.MYRAC.MYPRD02.MYRAC1.srv` on member `myrac02`
2013-03-11 09:56:28.319: [  CRSRES][1494907200]0Start of `ora.MYRAC.MYPRD02.MYRAC1.srv` on member `myrac02` failed.
2013-03-11 10:01:31.198: [  CRSRES][1494907200]0Attempting to start `ora.MYRAC.MYPRD02.MYRAC1.srv` on member `myrac03`
2013-03-11 10:01:31.454: [  CRSRES][1494907200]0Start of `ora.MYRAC.MYPRD02.MYRAC1.srv` on member `myrac03` failed.

I found an interesting log looking into the database log directory of the second node.
Into imon_MYRAC.log file located at /opt/app/oracle/product/10.2.0/db_1/log/myrac02/racg directory of the second node the following errors were logged, stating that the service MYPRD02 was already running. Those errors were logged at 09:56:28.173 that is few centiseconds later the srvtcl start service command failed to start from node one the service MYPRD02 on node two executed on 2013-03-11 at time 09:56:28.043.

2013-03-11 09:56:28.173: [    RACG][1246857536] [7383][1246857536][ora.MYRAC.MYPRD02.MYRAC1.srv]: CLSR-0002: Oracle error encountered while executing clsrcsnstartsrv : execute2

2013-03-11 09:56:28.173: [    RACG][1246857536] [7383][1246857536][ora.MYRAC.MYPRD02.MYRAC1.srv]: ORA-44305: service MYPRD02 is running
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.DBMS_SERVICE", line 444
ORA-06512: at "SYS.DBMS_SERVICE", line 365
ORA-06512: at line 1

The same errors were logged on the third node looking into imon_MYRAC.log file located at /opt/app/oracle/product/10.2.0/db_1/log/myrac03/racg directory. The srvctl start service command was executed at 10:01:31.198 from the first node and the errors were logged on the third node at 10:01:31.325:

2013-03-11 10:01:31.325: [    RACG][1281046848] [7439][1281046848][ora.MYRAC.MYPRD02.MYRAC1.srv]: CLSR-0002: Oracle error encountered while executing clsrcsnstartsrv : execute2
2013-03-11 10:01:31.325: [    RACG][1281046848] [7439][1281046848][ora.MYRAC.MYPRD02.MYRAC1.srv]: ORA-44305: service MYPRD02 is running
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.DBMS_SERVICE", line 444
ORA-06512: at "SYS.DBMS_SERVICE", line 365
ORA-06512: at line 1

I was able anyway to manually estabilish a connection to that service so the GV$ACTIVE_SERVICES view was reliable, while srvctl cli was misleading and, most important of all, no one was experiencing any kind of problems because it was performed during a planned operation.
On My Oracle Support I found two interesting notes that could fit on my specific errors:
10.2 Database Service Not Starting With Errors: CRS-1006, CRS-215, ORA-44305 [ID 752366.1] suggesting to apply a patch and 'Srvctl Start Service' Fails With PRKP-1030, CRS-0215 And ORA-1003, CLSR-0002 In Database Imon Logs [ID 818997.1] suggesting to kill racgimon process or reboot the instance.
So I first tried to kill racgimon process on first node:

[oracle@myrac01 crsd]$ ps -ef|grep racgimon
oracle    7405     1  0 Feb26 ?        00:03:17 /opt/app/oracle/product/10.2.0/db_1/bin/racgimon startd MYRAC
[oracle@myrac01 crsd]$ kill -9 7405

Then I executed again the srvctl start service command and finally I was able to start the service:

[oracle@myrac01 crsd]$ srvctl start service -d MYRAC -s MYPRD02 -i MYRAC2
[oracle@myrac01 ~]$ srvctl status database -d MYRAC -v
Instance MYRAC1 is running on node myrac01 with online services MYBI 
Instance MYRAC2 is running on node myrac02 with online services MYPRD01,MYPRD02 
Instance MYRAC3 is running on node myrac03

That's all.

Monday, March 11, 2013

On maximizing data loading speeds

Few days ago my datawarehouse division asked to check and eventually speed up their nightly loading jobs:
they said already in the past different solutions were implemented by other collegues and would like to know if it was possible to implement or adopt new feature to solve their problems.
I asked which were the most important jobs to speed up and which Oracle schemas they used.
It was clear and evident those hundred of jobs were created years ago and no one wanted to manage them. Several shell scripts scheduled by Sun Grid Engine (now Oracle Grid Engine) using different sql scripts, several schemas and logging in many different ways:
of course any kind of documentation was available or ever written.
To be concise using find and grep utilities I was able to find those jobs and begin my analisys:
first of all I would like to know which tables are used and started to analyse them.
In the analysed schema no index was present and I also verified all tables used by those jobs were created using the NOLOGGING storage option and that was good.
At the same time I asked if they knew what NOLOGGING meant from a backup point of view:
those tables were dropped and recreated every night, but just because they were based on the NOLOGGING option, those operations were not saved in any redo log, any archived redo log, any backup and so there was no way to recover them in case of a media failure.
It was time to view inside the shell script... which called a sql script in which there were the following lines code:

DROP TABLE table_name PURGE;
CREATE TABLE table_name (...) ... NOLOGGING;
exec schema_name.INSERT_INTO_TABLE_GENERIC_PROCEDURE ('TABLE_NAME');

The extents allocated for that table were:

SQL> select count(*) from dba_extents
  2  where segment_name = 'TABLE_NAME'
  3  and owner = 'SCHEMA_NAME';

  COUNT(*)
----------
       256

SQL> select segment_name, sum(bytes)/(1024*1024) MB
  2  from dba_extents
  3  where segment_name = 'TABLE_NAME'
  4  and owner = 'SCHEMA_NAME'
  5  group by segment_name;

SEGMENT_NAME      MB
----------------------------------------- ----------
TABLE_NAME           2312.5

The above procedure created a dynamic insert getting all the necessary columns from a table properly configured. For my table the dynamic sql executed was something like:

INSERT INTO TABLE_NAME (col1, col2, ... , coln)
SELECT ... FROM TABLE_NAME@REMOTE_MACHINE;
COMMIT;

It was evident that:
1) all inserts were still logged:
When you set the NOLOGGING storage option during the creation of your table minimal redo information are generated during this creation whereas you can suppress some kinds of redo operations performed on it. If the database or tablespace were created using FORCE LOGGING mode, then any kind of direct path is logged, regardless of the logging settings of your tables. Every DML statement is not affected by your NOLOGGING setting: it will be always logged, generating it's usual redo information. It is possible to minimize redo information for direct path operations performed on the tables with NOLOGGING option: so the old script issued a common DML statement and it was naturally logged;
2) all the extents allocated for the tables needed to be recreated as well:
Every time a table is dropped and recreated Oracle has to de-allocate every allocated extents and then reallocate them one by one during the next massive INSERT statement;
3) I had to ask again whether or not datawarehouse division had completely understood the meaning and all the considerations of using NOLOGGING option in their backup strategies:
In NOLOGGING mode, data is modified with minimal logging (to mark new extents INVALID and to record dictionary changes). When applied during media recovery, the extent invalidation records mark a range of blocks as logically corrupt, because the redo data is not fully logged. Therefore, if you cannot afford to lose the database object and their new data, then you should take a backup after the NOLOGGING operation and avoid errors such as:

ORA-01578: ORACLE data block corrupted (file # %s, block # %s)
ORA-26040: Data block was loaded using the NOLOGGING option

To test all these considerations I added few lines to the old and new scripts. Let's consider just one of them:

-- OLD SCRIPT
set timi on
set serveroutput on
spool MV_ALL_020.log
declare
 l_redo01 number;
 l_redo02 number;
 begin

select b.value
 into l_redo01
 from v$statname a, v$mystat b
 where a.statistic# = b.statistic#
 and a.name = 'redo size';

execute immediate 'DROP TABLE SCHEMA_NAME.TABLE_NAME PURGE';
execute immediate 'CREATE TABLE SCHEMA_NAME.TABLE_NAME (...)';
INSERT INTO SCHEMA_NAME.TABLE_NAME
(COL1, COL2, ... , COLn) 
SELECT COL1, COL2, ... , COLn 
FROM REMOTE_SCHEMA_NAME.REMOTE_TABLE_NAME@REMOTE_DATABASE;

select b.value
 into l_redo02
 from v$statname a, v$mystat b
 where a.statistic# = b.statistic#
 and a.name = 'redo size';

dbms_output.put_line
 ( 'Redo generated: ' || to_char( (l_redo02-l_redo01), '999,999,999,999' ) );
end;
/
spool off
quit;

-- NEW SCRIPT
set timi on
set serveroutput on
spool MV_ALL_020_APPEND.log
declare
 l_redo01 number;
 l_redo02 number;
 begin

select b.value
 into l_redo01
 from v$statname a, v$mystat b
 where a.statistic# = b.statistic#
 and a.name = 'redo size';

execute immediate 'TRUNCATE TABLE SCHEMA_NAME.TABLE_NAME REUSE STORAGE';
INSERT /*+ APPEND */ INTO SCHEMA_NAME.TABLE_NAME
(COL1, COL2, ... , COLn) 
SELECT COL1, COL2, ... , COLn 
FROM REMOTE_SCHEMA_NAME.REMOTE_TABLE_NAME@REMOTE_DATABASE;

select b.value
 into l_redo02
 from v$statname a, v$mystat b
 where a.statistic# = b.statistic#
 and a.name = 'redo size';

dbms_output.put_line
 ( 'Redo generated: ' || to_char( (l_redo02-l_redo01), '999,999,999,999' ) );
end;
/
spool off
quit;

I had several scripts to test so I reported the spool log of just few of them here. The following is the command line executed to run one of them:

[sgeuser@sge02prdpom TEST]$ jobs
[1]+  Running                 sqlplus MARCOV/MARCOV@MYDB @MV_ALL_021.sql &

The following were the results of elapsed time of just 8 OLD scripts:

SPOOL_MV_ALL_020.log:Elapsed: 00:08:36.56
SPOOL_MV_ALL_021.log:Elapsed: 00:06:15.52
SPOOL_MV_ALL_022.log:Elapsed: 00:12:39.15
SPOOL_MV_ALL_023.log:Elapsed: 00:10:57.40
SPOOL_MV_ALL_024.log:Elapsed: 00:14:39.42
SPOOL_MV_ALL_025.log:Elapsed: 00:08:17.77
SPOOL_MV_ALL_026.log:Elapsed: 00:02:36.08
SPOOL_MV_ALL_027.log:Elapsed: 00:06:04.93

While the following are the results of elapsed time of NEW scripts:

SPOOL_MV_ALL_020_APPEND.log:Elapsed: 00:03:31.45
SPOOL_MV_ALL_021_APPEND.log:Elapsed: 00:02:49.26
SPOOL_MV_ALL_022_APPEND.log:Elapsed: 00:05:53.73
SPOOL_MV_ALL_023_APPEND.log:Elapsed: 00:05:47.56
SPOOL_MV_ALL_024_APPEND.log:Elapsed: 00:07:10.47
SPOOL_MV_ALL_025_APPEND.log:Elapsed: 00:04:38.26
SPOOL_MV_ALL_026_APPEND.log:Elapsed: 00:01:20.38
SPOOL_MV_ALL_027_APPEND.log:Elapsed: 00:02:45.18

The following were the results of redo size generated of the 8 OLD scripts:

SPOOL_MV_ALL_020.log:Redo generated:    2,388,254,036
SPOOL_MV_ALL_021.log:Redo generated:    2,024,760,520
SPOOL_MV_ALL_022.log:Redo generated:    4,192,424,864
SPOOL_MV_ALL_023.log:Redo generated:    3,504,667,421
SPOOL_MV_ALL_024.log:Redo generated:    4,825,084,628
SPOOL_MV_ALL_025.log:Redo generated:    2,784,744,144
SPOOL_MV_ALL_026.log:Redo generated:      727,602,864
SPOOL_MV_ALL_027.log:Redo generated:    1,881,049,084

While the following are the results of redo size generated of NEW scripts:

SPOOL_MV_ALL_020_APPEND.log:Redo generated:        1,356,192
SPOOL_MV_ALL_021_APPEND.log:Redo generated:        1,040,544
SPOOL_MV_ALL_022_APPEND.log:Redo generated:        1,805,952
SPOOL_MV_ALL_023_APPEND.log:Redo generated:        1,712,002
SPOOL_MV_ALL_024_APPEND.log:Redo generated:        1,996,384
SPOOL_MV_ALL_025_APPEND.log:Redo generated:        1,295,352
SPOOL_MV_ALL_026_APPEND.log:Redo generated:          599,672
SPOOL_MV_ALL_027_APPEND.log:Redo generated:          986,140

And finally the percentage comparison between OLD and NEW scripts in terms of elapsed time and redo size:

job 020's elapsed time was reduced by 59.1 %
job 021's elapsed time was reduced by 54.9 %
job 022's elapsed time was reduced by 53.5 %
job 023's elapsed time was reduced by 47.2 %
job 024's elapsed time was reduced by 51.1 %
job 025's elapsed time was reduced by 44.1 %
job 026's elapsed time was reduced by 48.7 %
job 027's elapsed time was reduced by 54.7 %

job 020's redo size was reduced by 99.943 %
job 021's redo size was reduced by 99.949 %
job 022's redo size was reduced by 99.957 %
job 023's redo size was reduced by 99.951 %
job 024's redo size was reduced by 99.959 %
job 025's redo size was reduced by 99.953 %
job 026's redo size was reduced by 99.918 %
job 027's redo size was reduced by 99.948 %

With these few expedients I was able to successfully run the nightly jobs inside a suitable window and adequately maximize the speed of insert statements.
That's all.