Pages

Thursday, September 27, 2012

How to recover from a loss of the SYSTEM tablespace on the original location

The following scenario will describe how to proceed when you lose a system critical tablespace.
I'll simulate the loss of SYSTEM tablespace, I mean datafile 1... which always contains the data dictionary tables for the entire database.
In particular this example will restore the lost tablespace to its original location.
When you lose the SYSTEM tablespace the instance could crash or hang: anyway the instance will alert you as soon as possible.

It's not rare the case when you cannot even shutdown the instance and proceed with a kill or a shutdown abort command.

If you have a good backup you can of course restore it, but the database could not be open untill the recovery process finish.
No problem for your committed transactions because your production database is always running in ARCHIVELOG mode and they will be available as soon as the database open.

Let's begin simulating the loss of SYSTEM tablespace.
In my case the instance was not running as you can see when I deleted the file.
[oracle@localhost ~]$ ps -ef|grep smon
oracle    7867     1  0 06:17 ?        00:00:01 ora_smon_orcl
Here are my sys* datafiles
[oracle@localhost ~]$ ll -h /home/oracle/app/oracle/oradata/orcl/sys*
-rw-rw---- 1 oracle oracle 1.1G Jul 27 07:11 /home/oracle/app/oracle/oradata/orcl/sysaux01.dbf
-rw-rw---- 1 oracle oracle 832M Jul 27 07:11 /home/oracle/app/oracle/oradata/orcl/system01.dbf
I remove just the datafile of the SYSTEM tablespace.
[oracle@localhost ~]$ rm /home/oracle/app/oracle/oradata/orcl/system01.dbf
Let's try to start up the instance.
[oracle@localhost ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Fri Jul 27 07:12:37 2012

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

ERROR:
ORA-01075: you are currently logged on

Enter user-name: sys
Enter password: 
ERROR:
ORA-00604: error occurred at recursive SQL level 2
ORA-01116: error in opening database file 1
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
ORA-00604: error occurred at recursive SQL level 1
ORA-01116: error in opening database file 1
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
ORA-00604: error occurred at recursive SQL level 2
ORA-01116: error in opening database file 1
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
I'm not able to login and a clear error message is showed on screen. Let's use then RMAN...
[oracle@localhost ~]$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Fri Jul 27 07:14:41 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04005: error from target database: 
ORA-01116: error in opening database file 1
ORA-01110: data file 1: '/home/oracle/app/oracle/oradata/orcl/system01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
Of course even RMAN shows you the same error.
The instance tried to start, shared memory were successfully attached by the Oracle processes, but the instance doesn't work as expected: the SYSTEM tablespace is missing for RMAN too... I need to kill the instance.
[oracle@localhost ~]$ ps -ef|grep smon
oracle    7867     1  0 06:17 ?        00:00:01 ora_smon_orcl
oracle    8446  2875  0 07:16 pts/1    00:00:00 grep smon
[oracle@localhost ~]$ kill -9 7867
[oracle@localhost ~]$ ps -ef|grep orcl
Let's use RMAN again...
[oracle@localhost ~]$ rman target /

Recovery Manager: Release 11.2.0.2.0 - Production on Fri Jul 27 07:17:33 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

connected to target database (not started)
... so we can start up the database in MOUNT mode
RMAN> startup mount;

Oracle instance started
database mounted

Total System Global Area     456146944 bytes

Fixed Size                     1344840 bytes
Variable Size                381684408 bytes
Database Buffers              67108864 bytes
Redo Buffers                   6008832 bytes
Now we can issue the restore command for the SYSTEM tablespace
RMAN> restore tablespace system;

Starting restore at 27-07-2012 07:18:28
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=19 device type=DISK

channel ORA_DISK_1: restoring datafile 00001
input datafile copy RECID=17 STAMP=789551585 file name=/home/oracle/app/oracle/flash_recovery_area/ORCL/datafile/o1_mf_system_81020mvz_.dbf
destination for restore of datafile 00001: /home/oracle/app/oracle/oradata/orcl/system01.dbf
channel ORA_DISK_1: copied datafile copy of datafile 00001
output file name=/home/oracle/app/oracle/oradata/orcl/system01.dbf RECID=0 STAMP=0
Finished restore at 27-07-2012 07:20:15
The previous restore command try to restore the datafile of SYSTEM tablespace to its original location. Then it's time to issue the recover command ...
RMAN> recover tablespace system;

Starting recover at 27-07-2012 07:20:41
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:04

Finished recover at 27-07-2012 07:20:46
... and finally open the database.
RMAN> alter database open;

database opened
Now the database is available again to all the users and the SYSTEM tablespace is fully recovered.

That's all.