Enabling Dataguard Broker Observer

I recently had occasion to research the use of the Dataguard Broker observer (Observer). The concept behind the Observer is to have a process running on a separate server that observes the Dataguard Broker on the primary and the standby. When there is a problem with the primary, the broker will initiate the failover. Many of the conditions for failover by the Observer are configurable by the user. In this Blog entry, I describe how to enable the Observer, and what happens with the failover process is activated.

The Dataguard Broker Observer (Observer) is used when fast start failover is required. Fast start failover is simply an automatic failover to the standby database. As noted above, this is not often desired as most often even the most urgent failover/switchover scenarios require configuration of the Application once the database has been activated at the DR site. The observer will initiate the failover under the following conditions automatically:
By default, the observer will initiate failover to the target standby if and only if ALL of the following are true:
• observer is running
• observer and the standby both lose contact with the primary

o Note: if the observer loses contact with the primary, but the standby does not, the observer can determine that the primary is still up via the standby.
• observer is still in contact with the standby
• durability constraints are met
• failover threshold timeout has elapsed
In Oracle 11g, the user can configure failover conditions:
User configurable failover conditions (11g and later)
Oracle Database 11g Rel 1 introduced user configurable failover conditions that can trigger the observer to initiate failover immediately.

Health conditions
Broker can be configured to initiate failover on any of the following conditions. Conditions shown in blue are enabled by default.
• Datafile Offline (due to IO errors)
• Corrupted Controlfile
• Corrupted Dictionary
• Inaccessible Logfile (due to IO errors)
• Stuck Archiver

Oracle errors (ORA-NNNNN)
You can also specify a list of ORA- errors that will initiate FSFO failover. The list is empty by default.

Application initiated
Applications can initiate FSFO failover directly using the DBMS_DG.INITIATE_FS_FAILOVER procedure with an optional message text that will be displayed in the observer log and the primary’s alert log.

Below are the steps to enable the Observer and Fast Start failover.
1. Enable flashback database on both the standby and the primary.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2137886720 bytes
Fixed Size 2254952 bytes
Variable Size 889194392 bytes
Database Buffers 1241513984 bytes
Redo Buffers 4923392 bytes
Database mounted.
SQL> alter database flashback on;

Database altered.

SQL> alter database open;

Database altered.

SQL> exit

2. Ensure that standby redo logs are created on both the primary and the standby. There must be one more standby redo log group than regular redo log group. For RAC databases, there must be one more redo log group per thread, ie if there are three redo log groups for thread one, there must be four standby redo log groups for thread one. The command to create a standby log are ‘alter database add standby logfile thread x group y (‘+DBDATA’,’+FLASHBACKDATA’) size <bytes>;’ Note that the standby logs must be exactly the same size as the normal redo logs. One trick here is to run a switchover and switch back, on the switchover the Dataguard broker will send a message to the alert log if standby redo logs are needed, it will list the exact commands needed to create the appropriate number and size of standby redo logs.

3. Set the logxptmode to SYNC on both primary and standby. Note that this cannot be done without standby redo logs configured.
DGMGRL> edit database ‘primsdr’ set property ‘LogXptMode’=’SYNC’;
Property “LogXptMode” updated
DGMGRL> edit database ‘prims’ set property ‘LogXptMode’=’SYNC’;
Property “LogXptMode” updated
DGMGRL>

4. If there are two or more standby instances configured, set the property FastStartFailoverTarget to the appropriate standby database. This is allowed if there is a single standby also, so it is a good practice.
DGMGRL> edit database ‘prims’ set property FastStartFailoverTarget=’primsdr’;
Property “faststartfailovertarget” updated
DGMGRL>

5. Set the availability mode to maximum availability:
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
Succeeded.
DGMGRL>

6. Start the observer in detached session. This can be done through cron or other method. Typically the observer is on a separate server, and is started on bootup:
DGMGRL> CONNECT sys@prims;
Password: password
Connected.
DGMGRL> START OBSERVER;
Observer started

7. Enable FAST_START failover and verify the status:
DGMGRL> enable FAST_START failover;
Enabled.
DGMGRL> show FAST_START failover;

Fast-Start Failover: ENABLED

Threshold: 30 seconds
Target: primsdr
Observer: gridserver
Lag Limit: 30 seconds (not in use)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE

Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES

Oracle Error Conditions:
(none)

DGMGRL>

8. Test the failover scenarios. When the primary aborts due to a failover condition, the observer will attempt to reinstate the old primary as the standby when the standby is mounted and the observer senses the mount. At that point, the observer will use flashback database to flash the database back to the failure point, and instantiate it as the new standby database in preparation for a probably switchover operation.
a. For testing. Failover to standby. This forces the database open resetlogs, disabling standby:
[oracle@gridserver admin]$ dgmgrl sys/password
DGMGRL> failover to primsdr
Performing failover NOW, please wait…
Failover succeeded, new primary is “primsdr”
DGMGRL> exit
[oracle@gridserver admin]$ . oraenv
ORACLE_SID = [primsdr] ?
-bash: [: too many arguments
The Oracle base has been set to
XPointer evaluation failed: no locset
[oracle@gridserver admin]$ dgmgrl sys/password
DGMGRL for Linux: Version 11.2.0.4.0 – 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type “help” for information.
Connected.
DGMGRL> show configuration

Configuration – prims

Protection Mode: MaxAvailability
Databases:
primsdr – Primary database
Warning: ORA-16817: unsynchronized fast-start failover configuration

prims – (*) Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated

Fast-Start Failover: ENABLED

Configuration Status:
WARNING

DGMGRL> exit

b. Startup and mount the standby, this will initiate the reinstatement:
[oracle@gridserver ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 24 16:29:28 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 2137886720 bytes
Fixed Size 2254952 bytes
Variable Size 889194392 bytes
Database Buffers 1241513984 bytes
Redo Buffers 4923392 bytes
Database mounted.
SQL> show parameter background

c. Monitor the observer log to verify the reinstantiation:
[oracle@gridserver ~]$ dgmgrl sys/password
DGMGRL for Linux: Version 11.2.0.4.0 – 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type “help” for information.
Connected.
DGMGRL> start observer;
Observer started

16:30:15.97 Thursday, April 24, 2014
Initiating reinstatement for database “prims”…
Reinstating database “prims”, please wait…
Operation requires shutdown of instance “prims” on database “prims”
Shutting down instance “prims”…
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance “prims” on database “prims”
Starting instance “prims”…
ORACLE instance started.
Database mounted.
Continuing to reinstate database “prims” …
Reinstatement of database “prims” succeeded
16:31:44.75 Thursday, April 24, 2014

d. Verify the reinstantiation and switchover:
[oracle@gridserver ~]$ dgmgrl sys/password
DGMGRL for Linux: Version 11.2.0.4.0 – 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type “help” for information.
Connected.
DGMGRL> show configuration verbose

Configuration – prims

Protection Mode: MaxAvailability
Databases:
primsdr – Primary database
prims – (*) Physical standby database

(*) Fast-Start Failover target

Properties:
FastStartFailoverThreshold = ’30’
OperationTimeout = ’30’
FastStartFailoverLagLimit = ’30’
CommunicationTimeout = ‘180’
ObserverReconnect = ‘0’
FastStartFailoverAutoReinstate = ‘TRUE’
FastStartFailoverPmyShutdown = ‘TRUE’
BystandersFollowRoleChange = ‘ALL’
ObserverOverride = ‘FALSE’
ExternalDestination1 = ”
ExternalDestination2 = ”
PrimaryLostWriteAction = ‘CONTINUE’

Fast-Start Failover: ENABLED

Threshold: 30 seconds
Target: prims
Observer: gridserver
Lag Limit: 30 seconds (not in use)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE

Configuration Status:
SUCCESS

DGMGRL> switchover to prims
Performing switchover NOW, please wait…
Operation requires a connection to instance “prims” on database “prims”
Connecting to instance “prims”…
Connected.
New primary database “prims” is opening…
Operation requires startup of instance “primsdr” on database “primsdr”
Starting instance “primsdr”…
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is “prims”
DGMGRL>

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: