Pages

Saturday, September 8, 2012

How to recover from a loss of all redo log members of an INACTIVE group

Let's simulate a loss of all redo log members of an inactive group.
If you have completely lost one redo log group your database won't be able to open.
Next thing is to know the STATUS of the lost redo log group (we know it because we are simulating the loss, but...) and, just because this information is on V$LOG view, try to get also the ARCHIVED column.
Now if the redo log group lost was INACTIVE and archived (YES value) we can simply recreate all members with the "alter database clear logfile" command;
if it was INACTIVE and not archived (NO value) we'll execute the "alter database clear unarchived logfile" command.

Let's look at the example. Here is my redo log group state
SQL> select member, a.group#, a.status, b.status, a.archived  from v$log a, v$logfile b where  a.group# = b.group# order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS  ARC
------------------------------------------------------------ ---------- ---------------- ------- ---
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 INACTIVE                 YES
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 INACTIVE                 YES
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 CURRENT                  NO
/home/oracle/app/oracle/oradata/orcl/redo02b.log                      2 CURRENT                  NO
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 INACTIVE                 YES
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 INACTIVE                 YES
Delete all members of redo log group 1 and kill your instance.
[oracle@localhost ~]$ rm /home/oracle/app/oracle/oradata/orcl/redo01*
[oracle@localhost ~]$ ps -ef |grep smon
oracle    8176     1  0 06:44 ?        00:00:03 ora_smon_orcl
oracle    9057  3164  0 08:10 pts/5    00:00:00 grep smon
[oracle@localhost ~]$ kill -9 8176
Try to open your database ...
SQL> startup
ORACLE instance started.

Total System Global Area  456146944 bytes
Fixed Size                  1344840 bytes
Variable Size             385878712 bytes
Database Buffers           62914560 bytes
Redo Buffers                6008832 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 9184
Session ID: 17 Serial number: 7
... and look at your alert log
...
ORA-00313: open failed for members of log group 1 of thread 
ORA-00312: online log 1 thread 1: '/home/oracle/app/oracle/oradata/orcl/redo01.log'
ORA-00312: online log 1 thread 1: '/home/oracle/app/oracle/oradata/orcl/redo01b.log'
USER (ospid: 9184): terminating the instance due to error 313
...
Start your database in MOUNT mode
SQL> startup mount
ORACLE instance started.

Total System Global Area  456146944 bytes
Fixed Size                  1344840 bytes
Variable Size             385878712 bytes
Database Buffers           62914560 bytes
Redo Buffers                6008832 bytes
Database mounted.
Query the V$LOG view
SQL> select member, a.group#, a.status, b.status, a.archived  from v$log a, v$logfile b where  a.group# = b.group# order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS  ARC
------------------------------------------------------------ ---------- ---------------- ------- ---
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 INACTIVE                 YES
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 INACTIVE                 YES
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 CURRENT                  NO
/home/oracle/app/oracle/oradata/orcl/redo02b.log                      2 CURRENT                  NO
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 INACTIVE                 YES
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 INACTIVE                 YES
The redo log group 1 is INACTIVE and ARCHIVED so simply issue the following command:
SQL> alter database clear logfile group 1;

Database altered.
Open your database
SQL> alter database open;

Database altered.
If you are in the situation to have lost an INACTIVE and NOT already ARCHIVED redo log group then your command should be the following:
SQL> alter database clear unarchived logfile group 1;

Database altered.
Next step is to perform a complete backup
RMAN> backup database;
That's all.

How to recover from a loss of one redo log member of a multiplexed group

This scenario describes what happens when you lose one redo log member of a multiplexed group.
It could happen for a simply media failure or because some one "needed" to free some space and saw that .log file!!!
It happens to me years ago and that was told to me by a system administrator.

Anyway when you have multiplexed your redo log group using several members as Oracle best practices suggest and your disks begin to experience problems your database will continue to function since it's still able to write to one redo log member.
If your disk is completely out you can opt to some options whether or not you have a new disk!
With a new disk you can drop and create again the "lost" redo log member to the original location, otherwise you have to perform the same two steps (drop and create) choosing a different location.
Even if you have already lost your redo log member Oracle will arise the ORA-01609 error when you try to drop a member of a CURRENT group. Querying the status column of V$LOG view you can have several results and generally they are:
CURRENT when that log file is "currently" written by the log writer process (ps -ef|grep lgwr),
ACTIVE when that log file is required in case of a crash recovery (even if it's already archived)
INACTIVE when that log file is not required in case of a crash recovery.

So when you try to drop your "failed" redo log member and you receive the ORA-01609 error just force the database to switch to the next redo log group. 

Let's look it with an example. Here is my redo log configuration:
SQL> select member, a.group#, a.status, b.status from v$log a, v$logfile b where  a.group# = b.group#
  2  order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS
------------------------------------------------------------ ---------- ---------------- -------
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo02b.log                      2 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 INACTIVE
As you can see I have three redo log groups formed by two multiplexed members and currently the redo log group number 2 is the CURRENT one. Now I simulate the lost of the redo log member named 'redo02b.log'.
[oracle@localhost ~]$ cd /home/oracle/app/oracle/oradata/orcl/
[oracle@localhost orcl]$ rm redo02b.log
[oracle@localhost orcl]$ ll redo*
-rw-rw---- 1 oracle oracle 52429312 Sep  5 12:58 redo01b.log
-rw-rw---- 1 oracle oracle 52429312 Sep  5 12:58 redo01.log
-rw-rw---- 1 oracle oracle 52429312 Sep  5 22:30 redo02.log
-rw-rw---- 1 oracle oracle 52429312 Sep  5 12:55 redo03b.log
-rw-rw---- 1 oracle oracle 52429312 Sep  5 12:55 redo03.log
As you cann see from the alert log appears the error
...
Errors in file /home/oracle/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_lgwr_8172.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/home/oracle/app/oracle/oradata/orcl/redo02b.log'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
...
Now have a look at how the status column of V$LOGFILE view change (from NULL to INVALID) for the lost member
SQL> select member, a.group#, a.status, b.status from v$log a, v$logfile b where  a.group# = b.group#
  2  order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS
------------------------------------------------------------ ---------- ---------------- -------
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo02b.log                      2 CURRENT          INVALID
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 INACTIVE
I've lost a member of the CURRENT redo log group so I cannot drop it
SQL> alter database drop logfile member '/home/oracle/app/oracle/oradata/orcl/redo02b.log';
alter database drop logfile member '/home/oracle/app/oracle/oradata/orcl/redo02b.log'
*
ERROR at line 1:
ORA-01609: log 2 is the current log for thread 1 - cannot drop members
ORA-00312: online log 2 thread 1: '/home/oracle/app/oracle/oradata/orcl/redo02.log'
ORA-00312: online log 2 thread 1: '/home/oracle/app/oracle/oradata/orcl/redo02b.log'
I have to force a redo log switch:
SQL> alter system switch logfile;

System altered.
Now my redo log group state is as the following:
SQL> select member, a.group#, a.status, b.status from v$log a, v$logfile b where  a.group# = b.group#
  2  order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS
------------------------------------------------------------ ---------- ---------------- -------
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 ACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02b.log                      2 ACTIVE           INVALID
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 CURRENT
And I can successfully drop and recreate my lost redo log member:
SQL> alter database drop logfile member '/home/oracle/app/oracle/oradata/orcl/redo02b.log';

Database altered.

SQL> select member, a.group#, a.status, b.status from v$log a, v$logfile b where  a.group# = b.group#
  2  order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS
------------------------------------------------------------ ---------- ---------------- -------
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 ACTIVE
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 CURRENT

SQL> alter database add logfile member  '/home/oracle/app/oracle/oradata/orcl/redo02b.log' to group 2;

Database altered.

SQL> select member, a.group#, a.status, b.status from v$log a, v$logfile b where  a.group# = b.group#
  2  order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS
------------------------------------------------------------ ---------- ---------------- -------
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 ACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02b.log                      2 ACTIVE           INVALID
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 CURRENT

What's happening ? My new redo log member is still INVALID ? What does it mean ?
If you look at the Oracle documentation INVALID could mean: inaccessible (as it was before) or just created (as I did few lines above).
Let's look what it happens when I force logfiles to switch:

SQL> alter system switch logfile;

System altered.

SQL> select member, a.group#, a.status, b.status from v$log a, v$logfile b where  a.group# = b.group#
  2  order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS
------------------------------------------------------------ ---------- ---------------- -------
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 INACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02b.log                      2 INACTIVE         INVALID
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 ACTIVE
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 ACTIVE

6 rows selected.

SQL> alter system switch logfile;

System altered.

SQL> select member, a.group#, a.status, b.status from v$log a, v$logfile b where  a.group# = b.group#
  2  order by a.group#, member;

MEMBER                                                           GROUP# STATUS           STATUS
------------------------------------------------------------ ---------- ---------------- -------
/home/oracle/app/oracle/oradata/orcl/redo01.log                       1 ACTIVE
/home/oracle/app/oracle/oradata/orcl/redo01b.log                      1 ACTIVE
/home/oracle/app/oracle/oradata/orcl/redo02.log                       2 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo02b.log                      2 CURRENT
/home/oracle/app/oracle/oradata/orcl/redo03.log                       3 ACTIVE
/home/oracle/app/oracle/oradata/orcl/redo03b.log                      3 ACTIVE

6 rows selected.
As you can see the database marked the 'redo02b.log' as NULL and this means available to be used. That's all.