Incremental Recovery of Standby (ASM and RMAN)

I had another interesting recovery operation recently. A company I was working with had a very large (~4TB) database generating anywhere from several hundred MB to over a TB of archive logs each day. They had RMAN backups configured, but due to a slight misconfiguration of their backups, they accidentally deleted an archive log that they needed. As a consequence, they had a gap in their standby, and they did not have a backup of the required archive log. They had also created additional data files in the time that the standby was out of sync.

In the past, I had always recovered from this situation with a complete re-instantiation of the standby, which we really did not want to do because of the very large size of the source database. Therefore, I was happy to have someone mention this process of recovery using an incremental backup from SCN. This is the process I used. After recovering the customer's standby, I went back, refined the process, and identified some unneeded steps.

First, I created a primary and a standby database in VMware Workstation. The primary was called dgsrc and the standby dgsrcsb. I then created a table called mytest_table and inserted some rows in dgsrc. I then opened the standby read-only and verified that the table was present in the standby. At that point I shut down the standby and its listener.

Then I inserted a large number of rows in the primary, forcing a log switch after each large insert.  I then added a datafile to the primary.

Next, I started rman and deleted several archive logs so they could not be transported.  I then started up the standby and verified that there was a gap:

FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 85-86
 DBID 586313788 branch 769276415
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that's sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.

After verifying the gap, I added a datafile to the primary, and inserted more rows into mytest_table.  I then took a count of the rows in the table:

SQL> select count(1) from akerber.mytest_table;

  COUNT(1)
----------
   3809968

I then proceeded to attempt the recovery process as documented below. The basic reference document for this process is Metalink note 836986.1. However, that document does not cover the steps to add missing datafiles, and it also contains unnecessary steps. The steps below were validated both on a database using a file system and on a database using ASM. Also, in order for the process below to work properly, the parameters DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT must be set correctly on the standby.

The basic steps are as follows:

  1. Shut down the standby database.
  2. Identify the last applied SCN on the standby.
  3. Make a list of the files currently on the primary and the standby.
  4. Run an incremental backup from SCN, and create a standby controlfile, both on the primary database.
  5. Move the backup files and the backup controlfile to the standby server.
  6. Restore the standby controlfile to the standby.
  7. Create missing datafiles.
  8. Switch datafiles to copy (required whether or not new files are created).
  9. Recover using the incremental SCN backup.
  10. Restart the standby.

The steps in detail are below.  This is a much quicker method to re-instantiate the standby database than I have used in the past.  I have tried it out on databases running on both file systems and ASM.

Step 1: Once you have determined that you have a log gap and do not have a copy of the archivelog to register, cancel the recovery process on the standby:

SQL> alter database recover managed standby database cancel;

 

Step 2: On the STANDBY, run these commands (NOTE: the CURRENT_SCN can be a very large number, and if you do not use the COLUMN format command, the output may be displayed in unreadable exponential form):

SQL> column current_scn format 999999999999999
SQL> select current_scn from v$database;
SQL> select min(fhscn) current_scn from x$kcvfh;

 

Use the lesser of the two numbers above, and subtract 1000 to get the SCN for the incremental backup operation. Note that the lesser of the two numbers should be sufficient, but choosing an earlier SCN cannot hurt and adds a safety margin.
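The arithmetic is trivial, but for clarity, here is a small Python sketch of how the backup SCN is chosen from the two query results (the SCN values shown are hypothetical, not from the customer system):

```python
# Pick the SCN for the incremental backup: the lesser of the two
# standby query results, minus a 1000-SCN safety margin.
def backup_scn(current_scn, min_fhscn, margin=1000):
    """current_scn: v$database.current_scn on the standby;
    min_fhscn: min(fhscn) from x$kcvfh on the standby."""
    return min(current_scn, min_fhscn) - margin

# Hypothetical values from the two queries:
print(backup_scn(604901565, 604901565))  # 604900565
```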

 

Step 3:  Check the number and creation date of datafiles on the STANDBY:

 

SQL> select count(1), max(file#), to_char(max(creation_time),'dd-Mon-yyyy hh24mi') from v$datafile;

 

Run the same query on the primary database, and compare the two.

 

If the number of files is different in the two queries, or the creation dates are substantially different, save a listing of the datafiles from both the standby and the primary. Use the query below to get the listing; run it on both the primary and the standby:

 

SQL> select file#, name from v$datafile;
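Once both listings are saved, the comparison is just a set difference on file#. A minimal Python sketch (the file names and numbers below are made up for illustration):

```python
# Each listing is a dict of file# -> file name from v$datafile.
# Hypothetical listings from the primary and the standby:
primary = {1: '+DATA1/dgsrc/datafile/system.256',
           2: '+DATA1/dgsrc/datafile/sysaux.257',
           12: '+DATA1/dgsrc/datafile/users02.270'}
standby = {1: '+DATA2/dgsrcsb/datafile/system.256',
           2: '+DATA2/dgsrcsb/datafile/sysaux.257'}

# File numbers present on the primary but missing on the standby
# are the ones that must be created in step 7.
missing = sorted(set(primary) - set(standby))
print(missing)  # [12]
```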

 

Step 4: On the PRIMARY start RMAN, and run the backup command:

run {
allocate channel t1 device type disk;
allocate channel t2 device type disk;
allocate channel t3 device type disk;
allocate channel t4 device type disk;
BACKUP as compressed backupset INCREMENTAL FROM SCN <SCN from step 2 goes here>
DATABASE FORMAT '<path to backup destination>/filename_%U';
}

E.g.:

RMAN> run {
allocate channel t1 device type disk;
allocate channel t2 device type disk;
allocate channel t3 device type disk;
allocate channel t4 device type disk;
BACKUP as compressed backupset INCREMENTAL FROM SCN 604900565
DATABASE FORMAT '/home/oracle/bkup/orc_%U.bak';
}

 

Back up the current control file on the primary for the standby. Note that this command can also be included as part of the backup script above; if you choose to do this, make sure it runs after the incremental backup:

RMAN> backup current controlfile for standby format '/home/oracle/bkup/control01.ctl';

 

Step 5: Copy the standby controlfile backup and the incremental backup files to the standby server, using SCP or FTP. It saves time to store the incremental backup and controlfile backup in a directory with the same path on the standby server.

 

Step 6: Shut down the STANDBY, start it up in nomount mode, and restore the standby controlfile. Remember to use the STANDBY keyword so that Oracle understands that a standby controlfile is being restored:

RMAN> shutdown immediate;
RMAN> startup nomount;
RMAN> restore standby controlfile from '/home/oracle/bkup/control01.ctl';
RMAN> alter database mount;

 

Catalog the backup files on the standby. This is not necessary if the files are in the same location as on the primary:

RMAN> catalog start with '<directory containing files>';

E.g.: RMAN> catalog start with '/home/oracle/bkup';

RMAN will ask you whether you really want to catalog the files; enter YES to catalog them.

 

Now switch to SQL*Plus and compare the list of files from v$datafile with the list you saved earlier from the standby:

SQL> select file#, name from v$datafile;

 

Step 7:  If any datafiles are missing, create them using these commands:

SQL> alter system set standby_file_management=manual scope=memory;

SQL> alter database create datafile <file#>;

E.g.: SQL> alter database create datafile 12;
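If several files are missing, the CREATE DATAFILE statements can be generated from the missing-file list rather than typed by hand. A short Python sketch (the file numbers are illustrative):

```python
# Generate one CREATE DATAFILE statement per missing file#.
def create_datafile_sql(missing):
    return [f'alter database create datafile {n};' for n in missing]

# Hypothetical missing file numbers from the step 3 comparison:
for stmt in create_datafile_sql([12, 13, 14]):
    print(stmt)
```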

 

After all datafiles are created, run this command:

SQL> alter system set standby_file_management=auto scope=memory;

 

Step 8: Switch back to RMAN and switch to the new datafile copies (note that this step is required whether or not new files were created; it directs the control file to the correct set of datafiles):

 

RMAN> switch database to copy;

 

The 'switch database to copy' command will force a clear of the redo logs. If you are monitoring the alert log, you will see a series of errors about failing to open the online logs. These can be ignored.

 

NOTE: I did not test this aspect; however, I am fairly confident that if the DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT parameters are not set, the CATALOG command must be run for each diskgroup (or directory) on the STANDBY containing datafiles prior to the 'switch database to copy' command.

E.g.:

RMAN> catalog start with '+DATA1/orcl/datafile';
RMAN> catalog start with '+DATA2/orcl/datafile';

 

Once again, after each catalog command you will be asked whether you want to catalog the files; enter YES. After all catalog commands are complete, run the command:

RMAN> switch database to copy;

 

Step 9:  Recover the standby database.  This is done with the command below:

RMAN> recover database noredo;

The NOREDO option forces RMAN to use a backup rather than the archive logs, thus bypassing the gaps.  This apply process will often take substantial time. 

 

Step 10: After the recovery is complete, resume managed recovery:

SQL> alter database recover managed standby database disconnect from session;

 

Monitor the alert log on the standby to verify that logs are being applied. Entries in the alert log beginning with 'Media Recovery Log <file name>' indicate that logs are being applied. 'Archived Log entry <number> added for thread <#> sequence <#>' indicates that logs are being received. A sample from a running standby is below:

 

RFS[1]: Selected log 7 for thread 1 sequence 102 dbid 586313788 branch 769276415
Sun Dec 18 13:25:31 2011
Archived Log entry 35 added for thread 1 sequence 101 ID 0x22f2f73c dest 1:
Sun Dec 18 13:25:33 2011
Media Recovery Log +DATA1X/dgsrc1/archivelog/2011_12_18/thread_1_seq_101.474.770217929
Media Recovery Waiting for thread 1 sequence 102 (in transit)

The last line indicates that the standby has applied the previous log and is waiting for the next log (sequence 102) to arrive so that it can be applied.
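Scanning the alert log for these two markers can also be scripted. A rough Python sketch, using line patterns taken from the sample above (the helper name and return shape are my own, not an Oracle utility):

```python
import re

# 'Media Recovery Log <name>' => a log was applied;
# 'Archived Log entry N added for thread T sequence S' => a log was received.
APPLIED = re.compile(r'^Media Recovery Log\s+(\S+)')
RECEIVED = re.compile(r'^Archived Log entry \d+ added for thread \d+ sequence (\d+)')

def scan_alert_log(lines):
    """Return (applied log names, received sequence numbers)."""
    applied, received = [], []
    for line in lines:
        if m := APPLIED.match(line):
            applied.append(m.group(1))
        elif m := RECEIVED.match(line):
            received.append(int(m.group(1)))
    return applied, received

sample = [
    'Archived Log entry 35 added for thread 1 sequence 101 ID 0x22f2f73c dest 1:',
    'Media Recovery Log +DATA1X/dgsrc1/archivelog/2011_12_18/thread_1_seq_101.474.770217929',
]
print(scan_alert_log(sample))
```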

 

To verify that the process was successful, I opened the standby in read-only mode and counted the rows in mytest_table.

 

SQL> select count(1) from akerber.mytest_table;

  COUNT(1)
----------
   3809968

The recovery process was successful.

3 Responses to “Incremental Recovery of Standby (ASM and RMAN)”

  1. Sasi Says:

    Great Post and thanks for sharing. I’m going to practice this in my test environment.

    Thanks,
    Sasi

  2. adhikarexuss Says:

    Hi Andrew,

    I was just wondering, if you made a mistake while taking the incremental backup, for example it is supposed to be from SCN 1000 but you took it from SCN 999, can the backup still be usable?

    Thank you in advance,
    Adhika

    • dbakerber Says:

      Yes, if you follow these instructions, and the required SCN is within the range of the backup, then rman will be able to find and apply the appropriate changes.
