Database administrator workshop: October 2012

Wednesday, October 31, 2012

How to restore from a loss of all current control files to the default location

The following scenario simulate a loss of all the control files and the restore process using a backup control file with any Recovery Catalog.

In this situation you are only able to open your database in NOMOUNT mode. Also remember that when you lose all (or one) control files and restore them (or one of them) from a backup control file, you have to perform a recovery of your database and open it with the RESETLOGS option, even if any datafile is restored (like in this scenario).

That's not always true when you're dealing with "created" control file (I hope to simulate that scenario one day), as long as you must specify RESETLOGS if the online logs are lost or NORESETLOGS if the online logs are available.

Anyway a control file restored from a backup has an SCN taken at that "remote" time, different compared with those currently available in the datafiles and redo logs and so they have to be resynchronized.
Generally speaking, having the instance in NOMOUNT mode means your control files are still not read (if available), so RMAN is not able to know how to find information about an "unidentified" database: DBID indeed is contained into the control file.
If you are using a flash recovery area or a recovery catalog (best practice's solution) then you don't have to set the DBID before executing the RESTORE command of your NOMOUNTED instance, saving time and avoiding extra manual steps always prone to error.

Let's start. My instance is running

[oracle@localhost orcl]$ ps -ef|grep smon
oracle   11655     1  0 08:13 ?        00:00:00 ora_smon_orcl
oracle   11811  2820  0 08:20 pts/1    00:00:00 grep smon

Suddenly all my control file are lost.

[oracle@localhost orcl]$ rm /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl /home/oracle/app/oracle/oradata/orcl/control01.ctl

When trying to create a tablespace some errors are thrown:

SQL> create tablespace t1 datafile
'/home/oracle/app/oracle/oradata/orcl/t101.dbf' size 1M;
create tablespace t1 datafile '/home/oracle/app/oracle/oradata/orcl/t101.dbf'
size 1M
*
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/home/oracle/app/oracle/oradata/orcl/control01.ctl'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

The instance is crashed

[oracle@localhost orcl]$ ps -ef|grep smon
oracle   11655     1  0 08:13 ?        00:00:00 ora_smon_orcl

As you can verify the mentioned (/home/oracle/app/oracle/oradata/orcl/control01.ctl) file doesn't exist.

[oracle@localhost orcl]$ ll
total 2502160
-rw-rw---- 1 oracle oracle    7348224 Jul 21 08:14 APEX_1930613455248703.dbf
-rw-rw---- 1 oracle oracle   85991424 Jul 21 08:14 example01.dbf
drwxrwxr-x 2 oracle oracle       4096 Jul 21 08:11 non_default_location
-rw-rw---- 1 oracle oracle   52429312 Jul 21 08:23 redo01.log
-rw-rw---- 1 oracle oracle   52429312 Jul 21 08:14 redo02.log
-rw-rw---- 1 oracle oracle   52429312 Jul 21 08:14 redo03.log
-rw-rw---- 1 oracle oracle 1158684672 Jul 21 08:23 sysaux01.dbf
-rw-rw---- 1 oracle oracle  871374848 Jul 21 08:20 system01.dbf
-rw-rw---- 1 oracle oracle   20979712 Jul 21 07:14 temp01.dbf
-rw-rw---- 1 oracle oracle   41951232 Jul 21 08:19 undotbs01.dbf
-rw-rw---- 1 oracle oracle  235937792 Jul 21 08:14 users01.dbf

Let's try to restore the missing control files, starting the instance in NOMOUNT mode:

SQL> startup nomount;
ORACLE instance started.

Total System Global Area  456146944 bytes
Fixed Size                  1344840 bytes
Variable Size             360712888 bytes
Database Buffers           88080384 bytes
Redo Buffers                6008832 bytes

Connect using RMAN and issue the RESTORE CONTROLFILE FROM AUTOBACKUP command. DBIS is not set, but because I'm using the flash recovery area, RMAN is able to find a backup control file.

[oracle@localhost orcl]$ rman target /

RMAN> restore controlfile from autobackup;

Starting restore at 21-07-2012 08:36:22
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 device type=DISK

recovery area destination: /home/oracle/app/oracle/flash_recovery_area
database name (or database unique name) used for search: ORCL
channel ORA_DISK_1: AUTOBACKUP
/home/oracle/app/oracle/flash_recovery_area/ORCL/autobackup/2012_07_21/o1_mf_s_789203952_80ogm1c3_.bkp
found in the recovery area
AUTOBACKUP search with format "%F" not attempted because DBID was not set
channel ORA_DISK_1: restoring control file from AUTOBACKUP
/home/oracle/app/oracle/flash_recovery_area/ORCL/autobackup/2012_07_21/o1_mf_s_789203952_80ogm1c3_.bkp
channel ORA_DISK_1: control file restore from AUTOBACKUP complete
output file name=/home/oracle/app/oracle/oradata/orcl/control01.ctl
output file
name=/home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl
Finished restore at 21-07-2012 08:36:25

Let's see if the instance is able to read our restored control files, bringing the database in MOUNT state:

RMAN> alter database mount;

database mounted
released channel: ORA_DISK_1

What does it happen if I try to simply open the database ? It fails with a clear error.

RMAN> alter database open;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of alter db command at 07/21/2012 08:42:50
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

As said at the beginning of this post when you restore a control file from a backup you have first to recover the database...

RMAN> recover database;

Starting recover at 21-07-2012 08:43:26
Starting implicit crosscheck backup at 21-07-2012 08:43:26
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=20 device type=DISK
Crosschecked 7 objects
Finished implicit crosscheck backup at 21-07-2012 08:43:28

Starting implicit crosscheck copy at 21-07-2012 08:43:28
using channel ORA_DISK_1
Crosschecked 6 objects
Finished implicit crosscheck copy at 21-07-2012 08:43:28

searching for all files in the recovery area
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name:
/home/oracle/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_21/o1_mf_1_3_80ojktc5_.arc
File Name:
/home/oracle/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_21/o1_mf_1_2_80oj4ppv_.arc
File Name:
/home/oracle/app/oracle/flash_recovery_area/ORCL/autobackup/2012_07_21/o1_mf_s_789203952_80ogm1c3_.bkp
File Name:
/home/oracle/app/oracle/flash_recovery_area/ORCL/autobackup/2012_07_17/o1_mf_s_788864449_80c39jlo_.bkp

using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 2 is already on disk as file
/home/oracle/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_21/o1_mf_1_2_80oj4ppv_.arc
archived log for thread 1 with sequence 3 is already on disk as file
/home/oracle/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_21/o1_mf_1_3_80ojktc5_.arc
archived log for thread 1 with sequence 4 is already on disk as file
/home/oracle/app/oracle/oradata/orcl/redo01.log
archived log file
name=/home/oracle/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_21/o1_mf_1_2_80oj4ppv_.arc
thread=1 sequence=2
archived log file
name=/home/oracle/app/oracle/flash_recovery_area/ORCL/archivelog/2012_07_21/o1_mf_1_3_80ojktc5_.arc
thread=1 sequence=3
archived log file name=/home/oracle/app/oracle/oradata/orcl/redo01.log
thread=1 sequence=4
media recovery complete, elapsed time: 00:00:01
Finished recover at 21-07-2012 08:43:31

... and then open it with the RESETLOGS option.

RMAN> alter database open resetlogs;

database opened

The database is now open and control files are available again.

[oracle@localhost orcl]$ ll
total 2511696
-rw-rw---- 1 oracle oracle    7348224 Jul 21 08:44 APEX_1930613455248703.dbf
-rw-rw---- 1 oracle oracle    9748480 Jul 21 08:45 control01.ctl
-rw-rw---- 1 oracle oracle   85991424 Jul 21 08:44 example01.dbf
drwxrwxr-x 2 oracle oracle       4096 Jul 21 08:11 non_default_location
-rw-rw---- 1 oracle oracle   52429312 Jul 21 08:45 redo01.log
-rw-rw---- 1 oracle oracle   52429312 Jul 21 08:44 redo02.log
-rw-rw---- 1 oracle oracle   52429312 Jul 21 08:44 redo03.log
-rw-rw---- 1 oracle oracle 1158684672 Jul 21 08:44 sysaux01.dbf
-rw-rw---- 1 oracle oracle  871374848 Jul 21 08:44 system01.dbf
-rw-rw---- 1 oracle oracle   20979712 Jul 21 07:14 temp01.dbf
-rw-rw---- 1 oracle oracle   41951232 Jul 21 08:44 undotbs01.dbf
-rw-rw---- 1 oracle oracle  235937792 Jul 21 08:44 users01.dbf

That's all

Wednesday, October 17, 2012

How to retrieve the DATABASE IDENTIFIER dumping it from datafiles or online and archived redo logs

I think it's very difficult today to be in a situation where it's required to know your database identifier to recover the database.
But according to an OP (original poster) it seems it could happen: indeed on that Oracle forum's thread it was asked how it could be possible to determine a database identifier (DBID) when:
1) an instance is already crashed and no control files are available
2) no one saved the RMAN output of daily backup
3) you are using a flash recovery area and your autobackup are saved using OMF syntax.

A typical scenario where it is asked to know the database identifier is indeed when:
- you are restoring a control file (and I can image you lose all control files) AND
- you don't have a recovery catalog AND
- you don't have even a flash recovery area configured OR you have it but you didn't specify the %F in the RMAN autobackup option;

So if your instance is down and control files are unavailable you cannot open the database and query the V$DATABASE to know the DBID.
I mean you can't execute query like this:

SQL> select DBID from V$DATABASE;

      DBID
----------
1229390655

Then when you connect to your instance using RMAN it will show you the database identifier...

[oracle@localhost ~]$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Tue Oct 16 06:49:13 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1229390655)

...but as long as you didn't redirect RMAN output to some logs and preserved them in a safe directory or simply email it to you during daily backup, when instance is down and without control files RMAN could not help you anymore:

[oracle@localhost ~]$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Tue Oct 16 06:57:44 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database (not started)

Moreover if you didn't specify the %F format to your autobackup control file option, I mean something like:

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/home/oracle/app/oracle/autobackup_controlfile/%F';

...your database will save it using the OMF format. As you can read on the Oracle documentation at this link
"All files in the fast recovery area are maintained by Oracle Database and associated file names are maintained in the Oracle Managed Files (OMF) format" indeed your autobackup control file won't be useful to deduct your DBID using the desired format c-IIIIIIIIII-YYYYMMDD-QQ (where IIIIIIIIII would be your database dentifier).

So then how can you proceed ? Is it no more possible to know the database identifier ?

My suggestion was to simply use the "ALTER SYSTEM DUMP" command.
As long as you can dump any datafiles, redo logs and even archived redo logs the instance could be in NOMOUNT mode: to obtain the desired DBID you have only to know the exact path of your file.

Have a look at the following samples:

[oracle@localhost ~]$ sqlplus / as sysdba
SQL> startup nomount;

Command to dump the SYSTEM datafile:

SQL> alter system dump datafile '/home/oracle/app/oracle/oradata/orcl/system01.dbf' block min 1 block max 2;
 
System altered.

Under the trace directory I found a new trace file which content is not so clear, but at least our database identifier (Db ID=1229390655) is showed.

[oracle@localhost ~]$ tail -f /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10680.trc
...
Start dump data block from file /home/oracle/app/oracle/oradata/orcl/system01.dbf minblk 1 maxblk 2
 V10 STYLE FILE HEADER:
        Compatibility Vsn = 186646528=0xb200000
        Db ID=1229390655=0x4947033f, Db Name='ORCL'

...

Command to dump the UNDO datafile:

SQL>  alter system dump datafile '/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf'  block min 1 block max 2;
 
System altered.

Again on the same trace file Oracle appends the dump of the UNDO datafile requested and the same DBID.

[oracle@localhost ~]$ tail -f /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10680.trc
...
Start dump data block from file /home/oracle/app/oracle/oradata/orcl/undotbs01.dbf minblk 1 maxblk 2
 V10 STYLE FILE HEADER:
        Compatibility Vsn = 186646528=0xb200000
        Db ID=1229390655=0x4947033f, Db Name='ORCL'
...

What about to dump an online redo log ?

SQL> alter system dump logfile '/home/oracle/app/oracle/oradata/orcl/redo01.log' ;
 
System altered.

Always on the same trace file you can find the DBID.

[oracle@localhost ~]$ tail -f /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10680.trc
...
DUMP OF REDO FROM FILE '/home/oracle/app/oracle/oradata/orcl/redo01.log'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
        Compatibility Vsn = 186646528=0xb200000
        Db ID=1229390655=0x4947033f, Db Name='ORCL'
        Activation ID=1323612153=0x4ee4b7f9
...

Finally even dumping an ARCHIVED redo log...

SQL> alter system dump logfile '/home/oracle/app/oracle/flash_recovery_area/ORCL/archivelog/2012_10_05/o1_mf_1_20_86xpzzvr_.arc';
 
System altered.

... and looking at the trace file, the DBID is showed again.

[oracle@localhost ~]$ tail -f /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_10680.trc
...
DUMP OF REDO FROM FILE '/home/oracle/app/oracle/flash_recovery_area/ORCL/archivelog/2012_10_05/o1_mf_1_20_86xpzzvr_.arc'
 Opcodes *.*
 RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: creation thru eternity
 FILE HEADER:
        Compatibility Vsn = 186646528=0xb200000
        Db ID=1229390655=0x4947033f, Db Name='ORCL'
...

You have no excuse to be not able retrieving a specific database identifier.

That's all.

Thursday, October 11, 2012

How to restore from a loss of a subset of the current control files

The following scenario simulates a loss of a critical component of the database: the control file.
Among other things, it keeps track of names and locations of the datafiles and redo logs, the name of the database, its default block size, the characters set and RMAN recovery information.

Control files should always be multiplexed to different locations and until your database lose just one of the multiplexed control files the recovery process is straightforward.

As you can see in the next lines you can simply:
A) copy one of the valid multiplexed control file to the same location of the lost control file;
B) temporarily update your init.ora file removing the reference of the missing control file (you are experiencing a permanent media failure and need to open your database, but still don't have a valid disk alternative);
C) copy one of the valid multiplexed control file to a different location (you have a valid disk alternative), adding the reference of the copied control file to the init.ora file and removing the inaccessible one;

Let's begin looking at where my control files are located:

[oracle@localhost orcl]$ sqlplus / as sysdba
SQL> set linesize 180
SQL> set pagesize 999
SQL> col name format a70;
SQL> select status, name from V$CONTROLFILE;

STATUS  NAME
------- ---------------------------------------------------
        /home/oracle/app/oracle/oradata/orcl/control01.ctl
        /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl

In my case the database has only two multiplexed control files.
One of them is located in the flash recovery area and I'm going to remove that one under the /home/oracle/app/oracle/oradata/orcl/ directory

[oracle@localhost orcl]$ rm /home/oracle/app/oracle/oradata/orcl/control01.ctl

Then I try to query the V$DATAFILE_HEADER view to display datafile information from the datafile headers: document "Oracle Database Reference 11g Release 2 (11.2)" states two columns are directly related to control file and they are:

Column  Datatype  Description
FILE#  NUMBER          Datafile number (from control file)
STATUS  VARCHAR2(7)  ONLINE | OFFLINE (from control file)
...

You could even query the V$DATAFILE view to obtain the same... "infamous" ORA-27041 error.
In this example my instance didn't crash after the loss of the control file and I was just able to run the query:

SQL> select file#, status, error, name from V$DATAFILE_HEADER;
select file#, status, error, name from V$DATAFILE_HEADER
          *
ERROR at line 1:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/home/oracle/app/oracle/oradata/orcl/control01.ctl'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

Let's have a look at the CONTROL_FILES parameter.
This initialization parameter specifies one or more names of control files (up to 8) separated by commas. The database knows its control files are located there: one control file (control01.ctl) is inaccessible, but you still have a valid copy (control02.ctl) into the flash recovery area.

SQL> show parameter control_files;

NAME             TYPE     VALUE
---------------- -------- ------------------------------
control_files    string   /home/oracle/app/oracle/oradata/orcl/control01.ctl,                          /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl

To let your database be available and open again, following the option A) you can copy the good multiplexed control file (control02.ctl) to the location of the missing one (control01.ctl) while your instance is down.

[oracle@localhost orcl]$ cp /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl /home/oracle/app/oracle/oradata/orcl/control01.ctl

At this time you can open again your database, be able to use the instance and successfully execute your queries such as:

SQL> select file#, status, error, name from V$DATAFILE_HEADER;

FILE# STATUS  ERROR NAME
---------- ------- -----------------------------------------------------------
 1 ONLINE /home/oracle/app/oracle/oradata/orcl/system01.dbf
 2 ONLINE /home/oracle/app/oracle/oradata/orcl/sysaux01.dbf
 3 ONLINE /home/oracle/app/oracle/oradata/orcl/undotbs01.dbf
 4 ONLINE /home/oracle/app/oracle/oradata/orcl/users01.dbf
 5 ONLINE /home/oracle/app/oracle/oradata/orcl/example01.dbf
 6 ONLINE /home/oracle/app/oracle/oradata/orcl/APEX_1930613455248703.dbf

When you are experiencing a permanent media failure you can adopt option B) described below.
Remove again one of your control file.

[oracle@localhost orcl]$ rm /home/oracle/app/oracle/oradata/orcl/control01.ctl

In my case attempting to create a new tablespace caused to know my istance is down and crashed.

SQL> create tablespace t1 datafile
'/home/oracle/app/oracle/oradata/orcl/t101.dbf' size 1M;
create tablespace t1 datafile '/home/oracle/app/oracle/oradata/orcl/t101.dbf'
size 1M
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 9996
Session ID: 34 Serial number: 23

No Oracle instance is running.

[oracle@localhost orcl]$ ps -ef|grep smon
oracle   10229  4972  0 07:47 pts/4    00:00:00 grep smon

The instance doesn't start if a control file is inaccessible.

SQL> startup
ORACLE instance started.

Total System Global Area  456146944 bytes
Fixed Size                  1344840 bytes
Variable Size             356518584 bytes
Database Buffers           92274688 bytes
Redo Buffers                6008832 bytes
ORA-00205: error in identifying control file, check alert log for more info

G‌oing through the steps covered by a STARTUP command, after every processes successfully attach to the shared memory, the instance try to read the control file to know the structure of the database: if it doesn't happen due to some media failure the instance won't open and will remain in NOMOUNT mode.

SQL> select OPEN_MODE from V$DATABASE;
select OPEN_MODE from V$DATABASE
       *
ERROR at line 1:
ORA-01507: database not mounted

Use the CONTROL_FILES initialization parameter to know where your control files are located.

SQL> show parameter control_files;

NAME           TYPE     VALUE
-------------- -------- ------------------------------
control_files  string   /home/oracle/app/oracle/oradat a/orcl/control01.ctl,/home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl

Modify the CONTROL_FILES parameter removing the entry of the unavailable control file.

SQL> alter system set control_files='/home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl' scope=spfile;

System altered.

Bounce the instance.

SQL> shutdown immediate;
ORA-01507: database not mounted

ORACLE instance shut down.
SQL> startup;
ORACLE instance started.

Total System Global Area  456146944 bytes
Fixed Size                  1344840 bytes
Variable Size             356518584 bytes
Database Buffers           92274688 bytes
Redo Buffers                6008832 bytes
Database mounted.
Database opened.

The instance is now open to the users and the CONTROL_FILES parameter shows where your multiplexed control files are located.

SQL> show parameter control_files;

NAME           TYPE        VALUE
-------------- ----------- ------------------------------
control_files  string      /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl

I want to underline the above steps were made on a test environment.
I originally had only two multiplexed control files, before removing one: so, for my specific test case, implementing option B) meant to have a database working with only a control file. Don't even think to use only a control file on a production database: Oracle recommends that you multiplex multiple control files (using DBCA three control files are created) on different devices or mirror the file at the operating system level.

Next option to restore from a loss of one control file is C), that is copy one of the valid multiplexed control file to a new and accessible location, replacing the reference of the inaccessible file with the new one in the CONTROL_FILES initialization parameter.

[oracle@localhost orcl]$ cp /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl /home/oracle/app/oracle/oradata/orcl/non_default_location/control01.ctl

Start your instance in NOMOUNT mode.

SQL> startup nomount;
ORACLE instance started.

Total System Global Area  456146944 bytes
Fixed Size                  1344840 bytes
Variable Size             360712888 bytes
Database Buffers           88080384 bytes
Redo Buffers                6008832 bytes

Replace the old reference to the inaccessible file with the new one copied to a different location.

SQL> alter system set control_files='/home/oracle/app/oracle/oradata/orcl/non_default_location/control01.ctl','/home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl' scope=spfile;

System altered.

Bounce the instance.

SQL> shutdown immediate;
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> startup;
ORACLE instance started.

Total System Global Area  456146944 bytes
Fixed Size                  1344840 bytes
Variable Size             360712888 bytes
Database Buffers           88080384 bytes
Redo Buffers                6008832 bytes
Database mounted.
Database opened.

Your database is available and open to the users.

That's all.

Monday, October 8, 2012

How to recover from a loss of the system critical UNDO tablespace on the original location

Here it is another scenario dealing with the loss of a system critical tablespace: today it's UNDO tablespace's turn.

UNDO tablespace stores undo segments, generally used to explicitly (ROLLBACK command) or implicitly (a failed transaction) rollback a transaction, to recreate a read-consistent image and for all kind of recovery purposes.

As well as the SYSTEM tablespace the database must be in the MOUNT state to recover UNDO tablespace.

Let's begin simulating the loss of the UNDO tablespace: in my case the instance is running and...

[oracle@localhost orcl]$ ps -ef|grep smon
oracle    9247     1  0 07:48 ?        00:00:01 ora_smon_orcl

... I simply remove its UNDO datafile.

[oracle@localhost orcl]$ rm /home/oracle/app/oracle/oradata/orcl/undotbs01.dbf

As you can see when I try to insert a row in one table Oracle throws an error stating it's not able to open undotbs01.dbf (data)file.

SQL> insert into  hr.employees (employee_id, last_name, email, hire_date,
job_id)
  2  values (99999, 'pippo', 'pippo@waltdisney.com', sysdate, 'AC_ACCOUNT');
insert into  hr.employees (employee_id, last_name, email, hire_date, job_id)
                *
ERROR at line 1:
ORA-01116: error in opening database file 3
ORA-01110: data file 3: '/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

The same information is written in the alert log as well as in a trace file

...
Fri Jul 27 08:09:04 2012
Errors in file
/home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_9533.trc:
ORA-01116: error in opening database file 3
ORA-01110: data file 3: '/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
Fri Jul 27 08:09:04 2012
Checker run found 1 new persistent data failures
...

If you try to recover your UNDO tablespace taking it offline you receive some errors: you cannot take offline that tablespace.

RMAN> sql 'alter tablespace UNDOTBS1 offline immediate';

sql statement: alter tablespace UNDOTBS1 offline immediate
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of sql command on default channel at 07/27/2012 08:36:19
RMAN-11003: failure during parse/execution of SQL statement: alter tablespace
UNDOTBS1 offline immediate
ORA-30042: Cannot offline the undo tablespace

Then it's time to shutdown the instance, but it doesn't work.

RMAN> shutdown immediate;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of shutdown command at 07/27/2012 08:37:01
ORA-01116: error in opening database file 3
ORA-01110: data file 3: '/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

You can at this time kill your instance or issue a SHUTDOWN ABORT command.

RMAN> shutdown abort;

Oracle instance shut down

Your recovery process needs to be started while the database is in MOUNT mode.

RMAN> startup mount;

connected to target database (not started)
Oracle instance started
database mounted

Total System Global Area     456146944 bytes

Fixed Size                     1344840 bytes
Variable Size                390073016 bytes
Database Buffers              58720256 bytes
Redo Buffers                   6008832 bytes

Once the database is mounted you can restore your UNDO tablespace

RMAN> restore tablespace UNDOTBS1;

Starting restore at 27-07-2012 08:38:54
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=18 device type=DISK

channel ORA_DISK_1: restoring datafile 00003
input datafile copy RECID=20 STAMP=789551613 file
name=/home/oracle/app/oracle/flash_recovery_area/ORCL/datafile/o1_mf_undotbs1_81023bm5_.dbf
destination for restore of datafile 00003:
/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf
channel ORA_DISK_1: copied datafile copy of datafile 00003
output file name=/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf RECID=0
STAMP=0
Finished restore at 27-07-2012 08:38:58

... recover it ...

RMAN> recover tablespace UNDOTBS1;

Starting recover at 27-07-2012 08:39:14
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 27-07-2012 08:39:16

.. and after the recovery process finishes you can finally open your database.

RMAN> alter database open;

database opened

That's all.

Friday, October 5, 2012

How to recover from a loss of the SYSTEM tablespace on different location

This is another post on a scenario describing how to proceed when you lose the SYSTEM tablespace: as already said this tablespace always contains the data dictionary tables for the entire database.
In particular this example will restore the lost tablespace to another location, just as you have to restore it because a disk controller is no more working and you have to recreate it (recovering) to a different location.

Next it's a short summary about the loss of SYSTEM tablespace, copied from a previous post.
When you lose the SYSTEM tablespace the instance could crash or hang: anyway the instance will alert you as soon as possible.
It's not rare the case when you cannot even shutdown the instance and proceed with a kill or a shutdown abort command.
If you have a good backup you can of course restore it, but the database could not be open untill the recover process finishes.
No problem for your committed transactions because your production database is always running in ARCHIVELOG mode and they will be available again as soon as the database opens.
Unlike recovery of non–system tablespaces that can be recovered with the database in the OPEN state, the database must be in the MOUNT state to recover either the SYSTEM or UNDO tablespace.

Let's begin simulating the loss of the SYSTEM tablespace. In my case the instance was running, so I shutted it down and...

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

... then removed the system datafile.

[oracle@localhost ~]$ rm /home/oracle/app/oracle/oradata/orcl/system01.dbf

Let's connect using RMAN

[oracle@localhost ~]$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Fri Jul 27 07:36:55 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database (not started)

The instance was not started and I started it in MOUNT mode to begin the restore/recover process.

RMAN> startup mount;

Oracle instance started
database mounted

Total System Global Area     456146944 bytes

Fixed Size                     1344840 bytes
Variable Size                385878712 bytes
Database Buffers              62914560 bytes
Redo Buffers                   6008832 bytes

Using the report schema command, you can see RMAN is not able to correctly know the size of SYSTEM datafile.

RMAN> report schema;

using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name ORCL

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    0        SYSTEM               *** /home/oracle/app/oracle/oradata/orcl/system01.dbf
2    1105     SYSAUX               *** /home/oracle/app/oracle/oradata/orcl/sysaux01.dbf
3    40       UNDOTBS1             *** /home/oracle/app/oracle/oradata/orcl/undotbs01.dbf
4    225      USERS                *** /home/oracle/app/oracle/oradata/orcl/users01.dbf
5    82       EXAMPLE              *** /home/oracle/app/oracle/oradata/orcl/example01.dbf
6    7        APEX_1930613455248703 *** /home/oracle/app/oracle/oradata/orcl/APEX_1930613455248703.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    20       TEMP                 32767 /home/oracle/app/oracle/oradata/orcl/temp01.dbf

Now it's time to restore the datafile to a different location (/home/oracle/app/oracle/oradata/orcl/non_default_location/) compared with the original one (looking at the above report schema output it was: /home/oracle/app/oracle/oradata/orcl/).
To switch a datafile to another location you have to use SET NEWNAME command and include it in a run {...} block.
Moreover it's important to include also before issuing the recover command the SWITCH DATAFILE ALL command.
What does it mean and why we have to execute that command ? From Oracle documentation it "specifies that all data files for which a SET NEWNAME FOR DATAFILE command has been issued in this job are switched to their new name": control file will be so updated with the new location of SYSTEM datafile.

RMAN> run {                
2> set newname for datafile 1 to '/home/oracle/app/oracle/oradata/orcl/non_default_location/system01.dbf';
3> restore tablespace system;
4> switch datafile all;
5> recover tablespace system;
6> alter database open;
7> }

executing command: SET NEWNAME

Starting restore at 27-07-2012 07:40:21
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=21 device type=DISK

channel ORA_DISK_1: restoring datafile 00001
input datafile copy RECID=17 STAMP=789551585 file name=/home/oracle/app/oracle/flash_recovery_area/ORCL/datafile/o1_mf_system_81020mvz_.dbf
destination for restore of datafile 00001: /home/oracle/app/oracle/oradata/orcl/non_default_location/system01.dbf
channel ORA_DISK_1: copied datafile copy of datafile 00001
output file name=/home/oracle/app/oracle/oradata/orcl/non_default_location/system01.dbf RECID=23 STAMP=789723694
Finished restore at 27-07-2012 07:41:37

datafile 1 switched to datafile copy
input datafile copy RECID=24 STAMP=789723698 file name=/home/oracle/app/oracle/oradata/orcl/non_default_location/system01.dbf

Starting recover at 27-07-2012 07:41:39
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:04

Finished recover at 27-07-2012 07:41:48

database opened

After the end of recovery process, you can report the schema info again. The size of SYSTEM tablespace is again well known and the report schema command shows also a new location of the datafile number 1.

RMAN> report schema;

Report of database schema for database with db_unique_name ORCL

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    831      SYSTEM               *** /home/oracle/app/oracle/oradata/orcl/non_default_location/system01.dbf
2    1105     SYSAUX               *** /home/oracle/app/oracle/oradata/orcl/sysaux01.dbf
3    40       UNDOTBS1             *** /home/oracle/app/oracle/oradata/orcl/undotbs01.dbf
4    225      USERS                *** /home/oracle/app/oracle/oradata/orcl/users01.dbf
5    82       EXAMPLE              *** /home/oracle/app/oracle/oradata/orcl/example01.dbf
6    7        APEX_1930613455248703 *** /home/oracle/app/oracle/oradata/orcl/APEX_1930613455248703.dbf

List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    20       TEMP                 32767 /home/oracle/app/oracle/oradata/orcl/temp01.dbf

Looking into the /home/oracle/app/oracle/oradata/orcl/non_default_location directory we can find a new file. It's the datafile of the SYSTEM tablespace restored, recovered and finally moved to this different location.

[oracle@localhost non_default_location]$ ll -h
total 832M
-rw-rw---- 1 oracle oracle 832M Jul 27 07:44 system01.dbf