Trace File Cleanup – A Common Problem for DBAs

March 24, 2014

Recently I have had to do work at a couple of client sites to set up scripts to clean up the many trace and log files that the Oracle database leaves all over the file system. Somewhere in the middle of that work I realized that the whole process would be much easier if I read the Oracle database itself to determine the locations of the audit, trace, and log files.

So after scanning my memory, I realized that there are probably only two or three parameters I need to look at to clean up the vast majority of trace files: audit_file_dest, background_dump_dest, and, from 11g onward, diagnostic_dest. In addition, you will need the value of ORACLE_HOME.

Also, since you may have multiple Oracle Homes on a server, you want to make sure that you clean up files for all running instances on the server. So, the first thing I did was set up a loop to identify all the running instances on the server. In the bash shell, a loop to do this can be set up like this:

WORKDIR="$1"
export WORKDIR
ORAENV_ASK='NO'
for ORACLE_SID in `ps -ef | grep smon | grep -v grep | awk '{print $8}' | awk -F "_" '{print $3}'`
do
. oraenv
sqlplus / as sysdba <<EOF
select instance_name from v\$instance;
exit;
EOF
<Rest of commands here>
done

As you can see, we now have a loop set up that will roll us through each running Oracle instance on the server. If you want to exclude ASM, add a `grep -v ASM` to the pipeline, as shown below.
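
For example, a minimal variation of that loop line which skips ASM instances (assuming the ASM SIDs contain the string ASM, e.g. +ASM1) might look like this:

# Same SID loop, but skip ASM instances such as +ASM1 and +ASM2
for ORACLE_SID in `ps -ef | grep smon | grep -v grep | grep -v ASM | awk '{print $8}' | awk -F "_" '{print $3}'`
do
  echo "will process instance $ORACLE_SID"
done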

From there, we need to extract the locations of the trace files and audit files. To do this, you can write some very short shell scripts, or you can embed the code in your main script. The short shell scripts are below:

#!/bin/sh
# get_audit.sh
sqlplus -s "/ as sysdba" <<EOF
set feedback off heading off verify off
select value from v\$parameter where name='audit_file_dest';
exit
EOF

#!/bin/sh
# get_diag_base.sh
$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set feedback off heading off verify off
select value from v\$parameter where name='diagnostic_dest';
exit
EOF

#!/bin/sh
# get_dump.sh
sqlplus -s "/ as sysdba" <<EOF
set feedback off heading off verify off
select value from v\$parameter where name='background_dump_dest';
exit
EOF

The output of these shell scripts can be captured by the calling script like this:

AUDIT_DEST=`$WORKDIR/get_audit.sh`

In Oracle 9i, the audit file destination usually has a question mark in it (representing the Oracle home). This can be replaced using sed, like this:

finaud=`echo $AUDIT_DEST | sed -e 's|\?|'"$ORACLE_HOME"'|g'`
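
For illustration, assuming an ORACLE_HOME of /u01/app/oracle/product/9.2.0 (a made-up path) and a 9i-style audit_file_dest of ?/rdbms/audit, the substitution behaves like this:

# Hypothetical values for illustration only
ORACLE_HOME=/u01/app/oracle/product/9.2.0
AUDIT_DEST='?/rdbms/audit'
finaud=`echo $AUDIT_DEST | sed -e 's|\?|'"$ORACLE_HOME"'|g'`
echo $finaud     # /u01/app/oracle/product/9.2.0/rdbms/audit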

Next, we need to figure out whether we have an 11g-style structure (which uses diagnostic_dest) or a pre-11g structure (which uses background_dump_dest). To determine this, we call get_diag_base.sh and check whether the directory it returns actually exists; we then use sed once again to clean up the values so that we can use them in the script:

if [[ -d "$DIAG" ]]; then
DIAG_BASE=$DIAG/rdbms/$lsid/$ORACLE_SID
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/trace||g'`
version=11
else
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/bdump||g'`
version=10
fi

Next, we need to look at background_dump_dest; this value is easier to use to find the location of the database trace and log files:

DUMPNAME=`$WORKDIR/get_dump.sh`

So now our script looks like this:

WORKDIR="$1"
export WORKDIR
ORAENV_ASK='NO'
for ORACLE_SID in `ps -ef | grep smon | grep -v grep | awk '{print $8}' | awk -F "_" '{print $3}'`
do
. oraenv
sqlplus / as sysdba <<EOF
select instance_name from v\$instance;
exit;
EOF
AUDIT_DEST=`$WORKDIR/get_audit.sh`
DIAG=`$WORKDIR/get_diag_base.sh`
DUMPNAME=`$WORKDIR/get_dump.sh`
if [[ -d "$DIAG" ]]; then
DIAG_BASE=$DIAG/rdbms/$lsid/$ORACLE_SID
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/trace||g'`
version=11
else
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/bdump||g'`
version=10
fi
done

Next, make the diag_base the working directory.

From there, we use the find command to find and delete the files.

Note that at this point you are deleting files. I highly recommend you run the segment below initially without the 'rm -rf' section to ensure that you do not delete files you want to keep:

find $DIAG_BASE -name "cdmp*" -mtime +$days | xargs -i sh -c "echo deleting {}; rm -rf {}"
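
For example, a dry run that only lists the candidate files looks like this; once you are satisfied with the output, add the xargs/rm portion back in:

# Dry run: list the files that would be removed, without deleting anything
find $DIAG_BASE -name "cdmp*" -mtime +$days
find $DIAG_BASE -name "*.trc" -mtime +$days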

Now, if you followed this carefully, you have everything you need to write a script that will clean up Oracle log and trace files. Below is a segment of the script that will remove the .trc files:

#!/bin/sh
#set -x
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/oracle/bin
export PATH
if [[ -n "$1" ]]; then
if [ $1 -ne 0 -o $1 -eq 0 2>/dev/null ]
then
if [[ $1 -lt 0 ]]; then
echo invalid input
exit 1
else
days=$1
echo days=$days
fi
fi
else
echo days=7
days=7
echo days=$days
fi
SERVER=`hostname -s`
FDATE=`date +%d_%m_%y`
for ORACLE_SID in `ps -ef | grep smon | grep -v grep | awk '{print $8}' | awk -F "_" '{print $3}'`
do
# uncomment 2 lines below if RAC environment and individual sids are not in oratab
# SID=`echo $ORACLE_SID | sed -e 's/1//g'`
# ORACLE_HOME=`cat /etc/oratab|grep ^$SID:| head -n 1 | cut -f2 -d':'`
ORAENV_ASK=NO
export ORAENV_ASK
export ORACLE_SID
echo $ORACLE_SID
. /usr/local/bin/oraenv
echo SID=$ORACLE_SID
AUDIT_DEST=`$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<!
set feedback off heading off verify off
select value from v\\$parameter where name='audit_file_dest';
!`
DIAGX=`$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<!
set feedback off heading off verify off
select value from v\\$parameter where name='diagnostic_dest';
!`
DIAG=`echo $DIAGX | sed -e 's| *||g'`
DUMPNAME=`$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<!
set feedback off heading off verify off
select value from v\\$parameter where name='background_dump_dest';
!`
export DIAG_BASE
finaud=`echo $AUDIT_DEST | sed -e 's|\?|'"$ORACLE_HOME"'|'`
# used to clean up the Oracle 9 '?' in the path.
if [[ -d "$DIAG" ]]; then
# echo base=$DIAG
DIAG_BASE=$DIAG/rdbms/$lsid/$ORACLE_SID
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/trace||g'`
version=11
else
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/bdump||g'`
version=10
fi
echo base=$DIAG_BASE
echo audit=$finaud
/usr/bin/find $finaud -name "*.aud" -mtime +$days
/usr/bin/find $finaud -name "*.aud" -mtime +$days | xargs -i ksh -c "echo deleting {}; rm {}"
cd $DIAG_BASE
cd ..
export DIAG_BASE
if [[ -d "$DIAG_BASE" ]]; then
# echo diag_base=$DIAG_BASE
cd $DIAG_BASE
/usr/bin/find $DIAG_BASE -name "*.trc" -mtime +$days
/usr/bin/find $DIAG_BASE -name "*.trc" -mtime +$days | xargs -i ksh -c "echo deleting {}; rm {}"
fi
done

Some people will gzip the trace and log files prior to deleting or clearing them.
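
If you prefer that approach, a minimal sketch (assuming gzip is available and that $DIAG_BASE and $days are set as in the script above) is below; it compresses old trace files instead of removing them:

# Compress trace files older than $days days instead of deleting them
find $DIAG_BASE -name "*.trc" -mtime +$days | xargs -i sh -c "echo compressing {}; gzip {}"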

Setting up Data Guard through OEM

November 7, 2013

One of the common tasks for an Oracle DBA is to set up an existing Oracle instance in Data Guard. Many DBAs have struggled through the task of setting it up using the many different sets of step-by-step instructions on the internet. However, perhaps the simplest method of setting up Data Guard is to set it up through OEM Grid Control or OEM Cloud Control.

This blog entry contains the step by step instructions for setting it up through OEM Cloud Control.

  1. Create an Oracle Home on the standby server. If the primary database uses ASM, create an ASM instance on the standby server. The Oracle version on the standby must be the same as on the primary.
  2. Make sure that the primary instance and the standby Oracle Home and ASM instance (if required) are registered in OEM Grid Control/OEM Cloud Control.
  3. In OEM Cloud Control, go to the source database page.
  4. Under the Availability menu, choose 'Add Standby Database', then 'Create new physical standby database', then Continue.
  5. Choose online backup, use Recovery Manager (the defaults), then Continue.
  6. Choose the default under backup options, and enter the primary host credentials; typically these will already be set up.
  7. Enter the standby location information.
  8. Enter the standby database file location information. If you use a file system, you can customize the file locations. OEM Cloud Control requires you to use ASM when the source is ASM.
  9. Choose the appropriate options for Oracle restart configuration. It is fairly rare that you want automatic Oracle restart; typically you do not.
  10. I prefer to use the Data Guard broker. OEM Cloud Control also expects to use the Data Guard broker, so it is probably best to use it.
  11. Review the information provided, and if all looks right, choose Finish.
  12. At that point, OEM will submit a job to complete the setup of the standby database.

 

You can monitor the job through Grid Control. Once the job is complete, you will see a Data Guard administration page under the 'Availability' tab, and there you can check on the status of your standby.

 

 

Remember to go back and set up the standby target in OEM Cloud Control/OEM Grid Control once complete.
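
If you also want a quick check from the command line, and assuming you let the wizard enable the Data Guard broker as suggested above, a short DGMGRL session on the primary host will confirm the configuration (the database name below is just a placeholder):

dgmgrl / <<EOF
show configuration;
show database '<standby_db_unique_name>';
EOF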

Large Pages Implementation in Windows

July 21, 2013

It was pointed out to me that I have not yet covered the steps for configuring hugepages on Windows. Hugepages (large pages) are rarely configured on Windows, but the feature is available, and Oracle does recommend configuring it. The large page size for Windows is 16M. Nick Walter notes in the comments below that in at least some Windows versions, the registry key ORA_LPSIZE can be used to set the size of the large page granule.

There is a large caveat for large pages on Windows. When starting the Oracle instance using large pages, all of the large page memory must be contiguous. This means, for example, that if the database has an sga_target of 20g, there must not just be 20g of RAM free, there must be 20g of contiguous RAM free on the server. This may mean that you have to reboot the server every time you want to restart the Oracle database.

So, the steps for largepage/hugepage implementation are as follows:

  1. Give the oracle user the lock pages in memory privilege. It does not have this privilege by default.
    1. Go to Start->Settings->Control Panel, and double click on Administrative Tools.
    2. Double click Local Security Policy
    3. In the left pane, expand Local Policies and choose User Rights Assignment.
    4. In the right pane of the local security policy settings, select Lock Pages In Memory, choose Action->Security
    5. Click add
    6. Select the Oracle user from the name list
    7. Click add
    8. Click ok
    9. Click ok.
  2. Enable large page support for the oracle instances.
    1. Locate the file oracle.key under ORACLE_HOME\bin in the file system.
    2. Open oracle.key in a text editor and record the value found. This is a value set by the installer, and is normally SOFTWARE\ORACLE\KEY_{oracle_home_name}.
    3. Start the registry editor (regedit).
    4. Browse to HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\{key_home_name from step b}
    5. Create either of the two entries below; the first enables large pages for all instances, the second enables large pages for just the referenced SID:
      1. ORA_LPENABLE=1
      2. ORA_<sid>_LPENABLE=1 (Nick Walter notes in his comments that both of these are string keys; make sure they are created that way.)
    6. Exit the registry Editor.

At this point, large pages are enabled for Windows. Remember the caveat: it is possible, even likely, that your server will need to be rebooted each time you want to start up the Oracle instances when large pages are enabled.

Update to utl_mail Trigger

June 28, 2013

A year or so ago I posted a trigger for adding the entry to the network ACL when granting privileges on utl_mail (here: https://dbakerber.wordpress.com/2011/06/29/11gr2-network-acl-what-a-nice-feature/ ). For some reason Oracle decided that in Oracle 11, if you grant execute on utl_mail to a user, you must also grant the user access to a network Access Control List if you want the user to be able to actually use utl_mail. I have recently updated the trigger to deal with some design problems, and have also written a trigger to remove the access when the permission on utl_mail is revoked. The updated and new triggers are below:

Grant trigger:

CREATE OR REPLACE TRIGGER capture_grant_mail after grant on database
declare
smtp_server_name varchar2(30):='EMPTY_NAME';
sql_text ora_name_list_t;
hosted number :=0;
grantee_list ora_name_list_t;
grantee varchar2(30);
stmt VARCHAR2(2000);
n NUMBER;
num_grantees binary_integer;
added number :=0;
created number:=0;
cluster_database varchar2(5):='FALSE';
begin
dbms_output.enable(20000);
begin
  select value into smtp_server_name from v$parameter where name='smtp_out_server';
exception
  when no_data_found then smtp_server_name:='EMPTY_NAME';
end;
select value into cluster_database from v$parameter where name='cluster_database';
if rtrim(ltrim(smtp_server_name)) is null or smtp_server_name='EMPTY_NAME'
then
  if cluster_database='TRUE'
  then
    smtp_server_name:='CLUSTER_NAME';
  else
    select host_name into smtp_server_name from v$instance;
  end if;
end if;
n := ora_sql_txt(sql_text);
for i in 1..n
loop
  stmt := stmt || sql_text(i);
end loop;
if (upper(stmt) like '%UTL_MAIL%')
then
  num_grantees := ora_grantee(grantee_list);
  grantee:=upper(grantee_list(1));
  select count(1) into created from sys.dba_network_acls where acl like '%mail_service.xml%';
  -- dbms_output.put_line(to_char(created));
  if created=0
  then
    -- dbms_output.put_line('Creating...');
    dbms_network_acl_admin.create_acl(acl=>'mail_service.xml',description=>'UTL_MAIL',principal=>grantee,is_grant=>TRUE,privilege=>'connect');
    -- dbms_output.put_line('Created..');
    if smtp_server_name='CLUSTER_NAME'
    then
      for host_cursor in (select host_name from gv$instance)
      loop
        dbms_network_acl_admin.assign_acl(acl=>'mail_service.xml',host=>host_cursor.host_name);
      end loop;
    else
      dbms_network_acl_admin.assign_acl(acl=>'mail_service.xml',host=>smtp_server_name);
    end if;
  else
    select count(1) into hosted from sys.dba_network_acls where acl like '%mail_service.xml%' and
      (host=smtp_server_name or (smtp_server_name='CLUSTER_NAME' and host in (select host_name from gv$instance)) or host='*');
    if hosted=0
    then
      if smtp_server_name='CLUSTER_NAME'
      then
        for host_cursor in (select host_name from gv$instance)
        loop
          dbms_network_acl_admin.assign_acl(acl=>'mail_service.xml',host=>host_cursor.host_name);
        end loop;
      else
        dbms_network_acl_admin.assign_acl(acl=>'mail_service.xml',host=>smtp_server_name);
      end if;
    end if;
  end if;
  for i in 1..num_grantees
  loop
    grantee := upper(grantee_list(i));
    select count(1) into added from dba_network_acl_privileges where principal=grantee and upper(privilege)='RESOLVE' and acl like '%mail_service.xml%';
    if added=0
    then
      dbms_network_acl_admin.add_privilege(acl=>'mail_service.xml',principal=>grantee,is_grant=>true,privilege=>'resolve');
    end if;
    select count(1) into added from dba_network_acl_privileges where principal=grantee and upper(privilege)='CONNECT' and acl like '%mail_service.xml%';
    if added=0
    then
      dbms_network_acl_admin.add_privilege(acl=>'mail_service.xml',principal=>grantee,is_grant=>true,privilege=>'connect');
    end if;
  end loop;
end if;
end;
/

 

Revoke Trigger:

CREATE OR REPLACE TRIGGER capture_revoke_mail after revoke on database
declare
smtp_server_name varchar2(30):='EMPTY_NAME';
empty_list exception;
pragma exception_init(empty_list,-24246);
sql_text ora_name_list_t;
hosted number :=0;
grantee_list ora_name_list_t;
grantee varchar2(30);
stmt VARCHAR2(2000);
n NUMBER;
num_grantees binary_integer;
added number :=0;
created number:=0;
cluster_database varchar2(5):='FALSE';
begin
-- dbms_output.enable(20000);
n := ora_sql_txt(sql_text);
for i in 1..n
loop
  stmt := stmt || sql_text(i);
end loop;
if (upper(stmt) like '%UTL_MAIL%')
then
  dbms_output.put_line(to_char(n));
  num_grantees := ora_revokee(grantee_list);
  dbms_output.put_line(to_char(num_grantees));
  grantee:=upper(grantee_list(1));
  for i in 1..num_grantees
  loop
    begin
      -- dbms_output.put_line('In loop:');
      grantee := upper(grantee_list(i));
      select count(1) into added from dba_network_acl_privileges where principal=grantee and upper(privilege)='RESOLVE' and acl like '%mail_service.xml%';
      if added=0
      then
        dbms_output.put_line('User does not have resolve privilege, not revoked='||grantee);
      else
        dbms_network_acl_admin.delete_privilege(acl=>'mail_service.xml',principal=>grantee,is_grant=>true,privilege=>'resolve');
      end if;
      select count(1) into added from dba_network_acl_privileges where principal=grantee and upper(privilege)='CONNECT' and acl like '%mail_service.xml%';
      if added=0
      then
        dbms_output.put_line('User does not have connect privilege, not revoked='||grantee);
      else
        dbms_network_acl_admin.delete_privilege(acl=>'mail_service.xml',principal=>grantee,is_grant=>true,privilege=>'connect');
      end if;
    exception
      when empty_list
      then
        dbms_network_acl_admin.drop_acl(acl=>'mail_service.xml');
        continue;
      when others then
        dbms_output.put_line(SQLERRM);
        raise;
    end;
  end loop;
end if;
end;
/

Update on Hugepages Post

June 28, 2013

** UPDATED on Feb 14, 2014 ** As an update to the hugepages post, I have researched the requirements for enabling the equivalent of hugepages in AIX.

AIX uses the term large pages when referring to the closest equivalent of Linux hugepages. AIX large pages are 16MB in size, and are implemented differently than Linux hugepages. In fact, my research showed that of all the operating systems I generally work with (Linux, Windows, AIX, and Sun SPARC), Linux is the most difficult to implement this on.

Solaris will implement large pages automatically using ism (intimate shared memory). Nothing special needs to be done to do this.

Enabling Large Pages in AIX

In AIX, large pages are implemented as described below. In AIX and Solaris, as in Linux, large pages are not swappable and the full amount is allocated when the instance starts:

  1. Calculate the total size of all SGAs you will need on the AIX server. This will be the SGA max size, since all memory will be allocated on startup. So if you have 3 instances, with SGA max sizes of 32G, 8G, and 4G, the total size is 44G. Add in a fudge factor of 2G to be safe, for a total of 46G.
  2. Divide the total SGA in bytes by 16777216 (the AIX large page size) to get the total number of large page segments and add 1:

    (46*1024*1024*1024)/(16777216)=2944+1=2945. So 2945 is the total number of large pages required. (A short shell sketch of this arithmetic appears right after this list.)

  3. Use vmo to allocate the large pages. You will need to allocate them so they take effect immediately and also persist after reboot, so the vmo command is run twice:

    vmo -o lgpg_regions=2945 -o lgpg_size=16777216

    vmo -r -o lgpg_regions=2945 -o lgpg_size=16777216

  4. Grant the database installation owner (typically oracle) the required privileges to use large pages and to pin memory:
    # Show pinshm settings:
    vmo -o v_pinshm
    v_pinshm = 0

    # set pinshm to 1 to allow shared memory to be pinned
    vmo -o v_pinshm=1
    vmo -r -o v_pinshm=1   # persist the change after reboot

    chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle
    NOTE: after setting these, the oracle user will have to log out and back in for them to take effect.

  5. In the Oracle database, set sga_max_size and sga_target to the same value, set lock_sga=true, and restart the instance, e.g.:

    Alter system set sga_max_size=16g scope=spfile sid='*';
    Alter system set sga_target=16g scope=spfile sid='*';
    Alter system set lock_sga=true scope=spfile sid='*';
    Shutdown immediate;
    Startup;
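
Here is a small shell sketch of the step 2 arithmetic, using the 46G total from step 1:

# 1G equals 64 large pages of 16M, so multiply the SGA gigabytes by 64 and add one spare region
TOTAL_SGA_GB=46
LGPG_REGIONS=$(( TOTAL_SGA_GB * 64 + 1 ))
echo "lgpg_regions=$LGPG_REGIONS"                                 # 2945
echo "vmo -o lgpg_regions=$LGPG_REGIONS -o lgpg_size=16777216"    # the command used in step 3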

To check the status of large page usage, run this command:

vmstat -P all

For 16M large pages, look at the 16M row:

System configuration: mem=61440MB

pgsz         memory                       page
        siz      avm      fre    re  pi  po  fr   sr  cy
 4K  5986176  1095680    61570    0   0   0  92  153   0
64K    35208    34938      270    0   0   0   0    0   0
16M     2241     2079      162    0   0   0   0    0   0

Addendum to Hugepages in Linux

Oracle has added a new ‘feature’ to OEL 6, called transparent hugepages. This is also in RHEL6 and SLES11 (Suse Linux). Oracle note 1557478.1 says that transparent hugepages should be disabled on servers running Oracle database. According to the note, the ‘feature’ can cause node reboots and performance problems. Transparent hugepages are allocated and deallocated like normal memory, so most likely the performance issues come from excessive CPU usage while managing the hugepages. In order to disable transparent hugepages, do the following:

  1. Verify that transparent hugepages are enabled.

    cat /sys/kernel/mm/transparent_hugepage/enabled

    [always] never

    If always is bracketed, as shown above, transparent hugepages are enabled.

    Alternate method:

    # grep AnonHugePages /proc/meminfo

    If this returns a value greater than 0 kB, the kernel is using transparent hugepages. (The same checks can be repeated after disabling them; see the sketch after this list.)

  2. Preferred method to disable transparent hugepages:

    Add transparent_hugepage=never to the kernel line in /etc/grub.conf, and reboot the server, e.g.:

    title Oracle Linux Server (2.6.32-300.25.1.el6uek.x86_64)
    root (hd0,0)
    kernel /vmlinuz-2.6.32-300.25.1.el6uek.x86_64 ro root=LABEL=/ transparent_hugepage=never
    initrd /initramfs-2.6.32-300.25.1.el6uek.x86_64.img

  3. Alternate method, disable in rc.local and reboot the server. Add the following section to rc.local and reboot:

    if test -f /sys/kernel/mm/transparent_hugepage/enabled; then

    echo never > /sys/kernel/mm/transparent_hugepage/enabled

    fi

    if test -f /sys/kernel/mm/transparent_hugepage/defrag; then

    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    fi
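
After the reboot, the checks from step 1 can be repeated to confirm that transparent hugepages really are off:

# Both checks should now show transparent hugepages disabled
cat /sys/kernel/mm/transparent_hugepage/enabled     # 'never' should be bracketed, e.g. always [never]
grep AnonHugePages /proc/meminfo                    # should report 0 kB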

Common Largepages/Hugepages settings
Linux
Total RAM (G)   RAM for Hugepages (G)   vm.nr_hugepages   RAM for Hugepages (bytes)   kernel.shmmax   kernel.shmall   oracle soft memlock   oracle hard memlock
 16               8                       4101               8,589,934,593               9663676416      2359296          15461882266           15461882266
 24              12                       6149              12,884,901,889              13958643712      3407872          23192823398           23192823398
 32              16                       8197              17,179,869,185              18253611008      4456448          30923764531           30923764531
 48              24                      12293              25,769,803,777              26843545600      6553600          46385646797           46385646797
 64              32                      16389              34,359,738,369              35433480192      8650752          61847529062           61847529062
 96              48                      24581              51,539,607,553              52613349376     12845056          92771293594           92771293594
128              64                      32773              68,719,476,737              69793218560     17039360         123695058125          123695058125
256             128                      65541             137,438,953,473             138512695296     33816576         247390116250          247390116250

AIX
Total RAM (G)   RAM for Large Pages (G)   Number of Large Pages   RAM for Large Pages (bytes)   Large Page Size
 16                8                         512                    8,589,934,592                 16777216
 24               12                         768                   12,884,901,888                 16777216
 32               16                        1024                   17,179,869,184                 16777216
 48               24                        1536                   25,769,803,776                 16777216
 64               32                        2048                   34,359,738,368                 16777216
 96               48                        3072                   51,539,607,552                 16777216
128               64                        4096                   68,719,476,736                 16777216
256              128                        8192                  137,438,953,472                 16777216

For each row, the corresponding vmo commands are:

vmo -o lgpg_size=16777216 -o lgpg_regions=<number of large pages>
vmo -r -o lgpg_size=16777216 -o lgpg_regions=<number of large pages>     (persists the setting after a bounce)

Adding Space to ASM Using ASMLIB

April 10, 2013

Recently I discovered that there are very few places where the process of adding new storage to ASM is documented. Many Oracle users evidently think that Oracle must be restarted in order to add storage to ASM. This is actually only true if it is necessary to restart the Oracle server or servers (in a cluster) in order for the new LUN to be recognized. On most servers these days, that is not necessary. These are the steps for adding a new LUN to ASM using ASMLIB.

These instructions assume we are working on a multi-node RAC system. They do not cover the steps required to get the LUN added to the operating system as that can vary by the server environment.

  1. Have the storage or systems group allocate a new LUN to the RAC nodes. Ensure this is a single LUN visible to both nodes. If possible, it should appear in the same location on both servers, i.e., if it is at /dev/sdi on node 1 it should show as /dev/sdi on node 2. The disk placement is not absolutely critical when using ASMLIB, but I still recommend it. If the disks are mounted at different paths on the two nodes, it becomes much easier to make mistakes in managing the disks. If you do not use ASMLIB, you should use udev or another method of ensuring that the same disk always comes up in the same location, even after adding additional storage.

    NOTE: Do not use thin provisioning to allocate this LUN!

  2. Create a single primary partition containing the entire LUN and make sure it is visible on both RAC nodes.
  3. As root, on the first node of the cluster only, use the oracleasm program to allocate the disk to oracle. The command is '/etc/init.d/oracleasm createdisk <disk name> <path>',

    e.g.: '# /etc/init.d/oracleasm createdisk DATA03 /dev/sdi1'

  4. Use oracleasm as oracle to verify that the disk is available: # /etc/init.d/oracleasm listdisks. The output should show all ASM disks, including the new one:

    oracle (+ASM2)$ /etc/init.d/oracleasm listdisks

    DATA01

    DATA02

    DATA03

    FRA01

    GRID01

    GRID02

    GRID03

    GRID04

  5. On all other nodes of the cluster, as root, scan for the new oracle disk:

    [root@kerbrac01 ~]# /etc/init.d/oracleasm scandisks

    Scanning the system for Oracle ASMLib disks: [ OK ]

  6. After running scandisk on the other nodes, run listdisks as oracle on each node to verify that the new disk is visible:

    oracle (+ASM1)$ /etc/init.d/oracleasm listdisks

    DATA01

    DATA02

    DATA03

    FRA01

    GRID01

    GRID02

    GRID03

    GRID04

  7. After verifying that the new disk is visible on all nodes, go back to the first node, connect as Oracle. Then set the environment to the ASM instance, and add the disk to the ASM diskgroup.

    oracle (racdb1)$ . oraenv

    ORACLE_SID = [racdb1] ? +ASM1

    The Oracle base has been changed from /opt/oracle to /u01/oracle

    kerbrac01:/home/oracle

    oracle (+ASM1)$ sqlplus / as sysasm

    SQL> alter diskgroup DATA01 add disk 'DATA03';

    Diskgroup altered

    SQL>

  8. After the disk is added, ASM will balance data in the diskgroup across all the disks in the diskgroup so each disk has about the same amount of data. This operation is called rebalancing. The rebalance process can be viewed by selecting from the v$asm_operation view:

    SQL> Select * from v$asm_operation;

    SQL> /

     

    GROUP_NUMBER OPERA STAT       POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
    ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- ----------
               1 REBAL RUN           5          5     267503     338682       5439          13

  9. In order to speed up the process, increase the rebalance power. This setting defines how many resources the rebalance process will consume; the maximum value is 11 in Oracle 10.2 and 1024 in Oracle 11. To increase the rebalance power, the command is 'alter diskgroup <diskgroup name> rebalance power <power number>;', e.g.:

    SQL> alter diskgroup data01 rebalance power 5;

  10. Once the rebalance is complete, the entire space added will be available. More space becomes available as the rebalance operation runs, so the entire space added will not be available immediately.
  11. As a general rule, it is not a good practice to add a new disk to ASM and drop a disk at the same time. If you are going to remove a LUN from ASM, wait for the first rebalance operation to complete, then drop the old disk, and verify that the rebalance operation is complete before attempting further operations, as sketched below.
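
A rough sketch of such a check is below (shell plus sqlplus, in the same style as the other scripts here, and assuming the environment is already set to the ASM instance; the diskgroup and disk names are illustrative). Run it, or loop over it, until it reports zero active rebalance operations before dropping the old disk:

#!/bin/sh
# Report the number of active rebalance operations; 0 means the rebalance has finished
REBAL=`sqlplus -s "/ as sysasm" <<EOF
set heading off feedback off
select count(1) from v\\$asm_operation where operation='REBAL';
exit
EOF`
echo "active rebalance operations: $REBAL"
# Once this reports 0, it is safe to drop the old disk, for example:
#   alter diskgroup DATA01 drop disk DATA02;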

This Is Really Not a Good Oracle Application (but it does work)

January 16, 2013

I ran into another interesting problem some time ago that I decided to write about just recently. It is a great example of an Oracle application that should not use Oracle at all. It also makes it clear that too many programmers these days don't know much about anything outside of their favorite computer language, which is almost always Java. On the other hand, it also shows that Oracle can be used in applications that it is really unsuited for, if you throw enough hardware at the problem.

So, here is the application: we have a large company that is taking transactions from many different sources at the same time, hundreds of thousands of transactions. The application is not actually auditing the transactions; all it is doing is counting them every minute and then displaying a bar graph on a dashboard, along with some other statistics. In other words, it's a very basic dashboard application.

The process is this: Every transaction that comes in is written to an Oracle table with some basic information (none of which is privacy-type identifying information). The information includes the time, type, and location of the transaction, and the time it took to run the transaction. At the end of each minute, a query is run against the Oracle table (qualifying on the minute) to count and display that information on the dashboard. Then some basic statistical information is calculated. The minute-by-minute transaction information is saved for six days, then deleted.

Obviously, this really isn't a good use for Oracle. It would make much more sense to count this data as it comes in, send the counts to the dashboard at the end of each minute, and write only the summary information to the Oracle database. Writing individual transactions to the database for no other reason than displaying counts on a dashboard is rather pointless. This method has thousands more writes per minute than are really needed, plus one more read per minute. This entire operation could be done much more efficiently using a traditional programming language such as FORTRAN or C, or even PASCAL or ADA. I suspect even a good Perl scripter could produce something that would accomplish what needs to be done much more effectively. All that is really needed is to count the required information as it comes in, add the totals, write the totals to the database, send the summary to the dashboard, and dump the data. For this type of application, there is no good reason at all to use the Oracle database to store the individual transactions, with all the overhead of maintaining read and write consistency. From the start, this appeared to be a classic circular queue design that could be handled very well with a 3rd generation language, and with much less overhead.

What was produced was a purely database application written in Java that leverages none of the inherent advantages of a third generation language, and as a consequence requires a server of tremendous power to handle the load. The server has 256G RAM and 128 CPUs (I am not sure how that breaks out in physical cores).

Originally, the data was to be stored in a standard Oracle table, un-partitioned, with a regularly scheduled delete process to remove the old data. Needless to say, this resulted in a continuously growing table, large gaps in the indices, and gradually deteriorating performance. Eventually, the design team got hold of a DBA to try to figure out what was going on with their performance. Of course, they should have had one involved from the start, instead of bringing one in at the end, a week before they planned to move to production.

Anyway, after some thought and discussion, it was pretty clear that their table design was about the worst possible for this sort of application (and it was too late to get the data out of the Oracle database entirely), so I designed what I hoped would get the best possible performance from this application. What I designed was a classic circular queue that uses Oracle partitioning. The partitioning keys are day of week and minute of day.

The DDL statement (column names and such have been changed, and some columns are left off) is below; I also left off the create index statement on the TIME_IN column:
Create table dbakerber.tranlog_table
(time_in timestamp default systimestamp not null,
Application_id number(2) not null,
Transaction_type_id number(2),
Transaction_id varchar2(250),
Transaction_start_time timestamp with time zone,
Elapsed_mils number(6),
Day_of_week number(2) generated always as (to_number(to_char(time_in,'d'))),
Min_of_day number(5) generated always as (60* extract(hour from time_in)+extract(minute from time_in))
) tablespace tranlog
Partition by list(day_of_week)
Subpartition by range (min_of_day)
(partition part_Sunday values (1)
(subpartition partsun_1 values less than (1) tablespace tranlog_part_sun,
subpartition partsun_2 values less than (2) tablespace tranlog_part_sun,
subpartition partsun_3 values less than (3) tablespace tranlog_part_sun,
subpartition partsun_4 values less than (4) tablespace tranlog_part_sun,
subpartition partsun_5 values less than (5) tablespace tranlog_part_sun,
.
.
subpartition partsun_1440 values less than (1440) tablespace tranlog_part,
subpartition partsunbad_values values less than (maxvalue) tablespace tranlog_part
),
Partition part_Monday values (2)
( subpartition partmon_1 values less than (1) tablespace tranlog_part_mon,
subpartition partmon_2 values less than (2) tablespace tranlog_part_mon,
.
.
.)
Remainder of create table left off for the sake of space.
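
To show why day_of_week and min_of_day are the partitioning keys, here is a rough sketch of the kind of per-minute dashboard query this design supports (shell plus sqlplus, in the same style as the other scripts here; the day and minute values are made up, and the real application derives them from the minute that just ended):

#!/bin/sh
# Count the last minute's transactions; qualifying on day_of_week and min_of_day
# lets Oracle prune the scan to a single subpartition of tranlog_table.
sqlplus -s "/ as sysdba" <<EOF
set heading off feedback off
select application_id, count(1) transactions, round(avg(elapsed_mils),1) avg_mils
from dbakerber.tranlog_table
where day_of_week = 2
  and min_of_day = 601
group by application_id;
exit
EOF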

We then wrote a stored procedure for truncating the partitions:

CREATE OR REPLACE
procedure del_old_partitions(trunc_date in date default sysdate-6)
is
comline varchar2(400);
part_name varchar2(30);
part_day varchar2(30);
p_name varchar2(60);
begin
dbms_output.enable(1000);
part_day:=to_char(trunc_date,'DAY');
part_name:='PART_'||part_day;
comline:='alter table dbakerber.tranlog_table truncate partition '||part_name||' update indexes';
-- dbms_output.put_line(comline);
execute immediate comline;
exception when others then
begin
raise;
end;
end;
/

We ran into an application problem whereby for some reason it was not killing off its old processes, thus leaving locks on the partitions when it came time to truncate the partition. So we wrote a shell script to go out and kill the locking processes prior to the partition truncation, and this is what we ended up with:

#!/bin/ksh
#
#############################################################################
export ORAENV_ASK=NO
export ORACLE_SID=ORASID1
export ORACLE_HOME=/oracle/product/11.2.0/db_1
export PATH=$ORACLE_HOME/bin:$PATH

sqlplus “/ as sysdba” <<EOF
set verify off echo off pages 0 lines 300
set termout off feedback off heading off
spool /tmp/killprocs.ksh
select
'#!/bin/ksh'
from dual;
SELECT distinct
'ssh '||host_name||' "'||'kill -9 '||p.spid||'"'
FROM gv\$locked_object v,
dba_objects d,
gv\$lock l,
gv\$session s,
gv\$process p,
gv\$instance i
WHERE v.object_id = d.object_id
AND (v.object_id = l.id1)
AND v.session_id = s.sid
and s.paddr=p.addr
and owner='DBAKERBER'
and object_name='TRANLOG_TABLE'
and logon_time < sysdate-1
and object_type='TABLE PARTITION'
and s.inst_id=i.inst_id;
exit;
EOF
chmod +x /tmp/killprocs.ksh
sh /tmp/killprocs.ksh

# Truncate the old partition and regather stats (the dbms_stats parameters shown are representative)
sqlplus "/ as sysdba" <<EOF > /tmp/clean_old_partitions.log
exec del_old_partitions;
exec dbms_stats.gather_table_stats(ownname=>'DBAKERBER',tabname=>'TRANLOG_TABLE',degree=>128,cascade=>TRUE);
exit
EOF

wcount=`grep ORA- /tmp/clean_old_partitions.log | wc -l`
if [ $wcount -gt 0 ]; then
mailx -s 'Error in **PROD*** Partition Cleanup Script' myemailaddress < /tmp/clean_old_partitions.log
fi

So, we finished up with this application, and the performance is quite acceptable. However, some interesting notes: this is running on a two node cluster, with Oracle Data Guard to another two node cluster at a remote site. The redo logs are sized at 1g, and even at that size we are switching logs about every 1.5 minutes. So, think about this: they don't need to keep this data, they don't update the data, the data is only used to update a dashboard, and it's thrown away after a week. But they are using 4 nodes, each with 256G RAM and 128 CPUs, to keep up with the data, and the SAN behind it has to be pretty good also. Considering the amount of network traffic involved, they probably do need some high quality network hardware as well.

After some thought, I strongly suspect that if this were written in a third generation language using flat files, this entire process could be run on four commodity nodes at about ten percent of the cost, and that assumes they really do need a primary and a standby cluster for a dashboard.

It has been a while since I did anything in Pascal or Ada, but it might be an interesting project.

Windows 8, Not Good for the Professional

January 1, 2013

I recently upgraded my personal Windows 7 laptop to Windows 8.  At this point, I have to say that Windows 8 is a step backward for the IT professional.

As an IT professional, I use a large number of applications, none of which are usable on a touch screen, such as MS Word, Putty, Winscp, Oracle SQLPlus, etc.  I expect to access all of these through a start menu, since the large number of applications that I use makes putting icons on the desktop too cumbersome.  However, Windows 8 does not offer a start menu, only a scrollable, touch-oriented Start screen.  This is extremely cumbersome. I was able to find and install a free application which sets up a start menu very similar to the Windows 7 start menu that I can access from my desktop.  However, to access the desktop I have to go through the Metro screen and click on Desktop.  This is an extra step that is very annoying, and according to all I have read, Microsoft has the ability to make this step unnecessary, but has disabled that option for the general public.

All in all, I find Windows 8 to be not nearly as usable as Windows 7 for the IT professional.

Configuring Hugepages For Oracle on Linux

March 14, 2012

NOTE: I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.

RAM is managed in 4k pages in 64-bit Linux.  When memory sizes were limited, and systems with more than 16G RAM were rare, this was not a problem.  However, as systems get more memory, the number of memory pages increases and becomes less manageable.  Hugepages make managing the large amounts of memory available in modern servers much less CPU intensive.  In particular, with the number of memory pages reduced by typically three orders of magnitude, the chance that a particular page pointer will be available in the processor cache goes up dramatically.

First some caveats on using hugepages:  Hugepages are not swappable, thus Oracle SGA memory must either be all hugepages or no hugepages.  If you allocate hugepages for Oracle and don't allocate enough for the entire SGA, Oracle will not use any hugepage memory; in that case, if there is not enough non-hugepage memory, your database will not start.  Finally, enabling hugepages will require a server restart, so if you do not have the ability to restart your server, do not attempt to enable hugepages.

Oracle Metalink note 1134002.1 says explicitly that AMM (MEMORY_TARGET/MEMORY_MAX_TARGET) is incompatible with hugepages.  However, I have found at least one blog that says that AMM is compatible with hugepages when using the USE_LARGE_PAGES parameter in 11g (where AMM is available).  Until further confirmation is found, I do not recommend trying to combine hugepages with MEMORY_TARGET/MEMORY_MAX_TARGET. 

There are both Oracle database settings and Linux OS settings that must be adjusted in order to enable hugepages.  The Linux and oracle settings of concern are below:

Linux OS settings:

/etc/sysctl.conf:

vm.nr_hugepages

kernel.shmmax

kernel.shmall

 /etc/security/limits.conf:

 oracle soft memlock

oracle hard memlock

 

Oracle Database spfile/init.ora:

 SGA_TARGET

SGA_MAX_SIZE

MEMORY_TARGET

MEMORY_MAX_TARGET

USE_LARGE_PAGES

First, calculate the Linux OS settings.  Kernel.shmmax should be set to the size of the largest SGA_TARGET on the server plus 1G, to account for other processes.  For a single instance with a 180G SGA, that would be 181G.

 Kernel.shmall should be set to the sum of the SGA_TARGET values divided by the pagesize.  Use the 'getconf PAGESIZE' command to get the page size.  Units are bytes.  The standard pagesize on Linux x86_64 is 4096, or 4k.

Oracle soft memlock and oracle hard memlock should be set to slightly less than the total memory on the server; I chose 230G.  Units are kbytes, so the number is 230000000.  This is the total amount of memory Oracle is allowed to lock.

Now for the hugepage setting itself: vm.nr_hugepages is the total number of hugepages to be allocated on the system.  The number of hugepages required can be determined by finding the maximum amount of SGA memory expected to be used by the system (the SGA_MAX_SIZE value normally, or the sum of them on a server with multiple instances) and dividing it by the size of the hugepages, 2048k, or 2M, on Linux.  To account for Oracle process overhead, add five more hugepages.  So, if we want to allow 180G of hugepages, we would use this equation:  (180*1024*1024/2048)+5.  This gives us 92165 hugepages for 180G.  Note: I took a shortcut in this calculation by using memory in MB rather than the full page size.  To calculate the number in the way I initially described, the equation would be:  (180*1024*1024*1024)/(2048*1024).
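
As a quick sanity check, here is a small shell sketch of that arithmetic for the 180G example (the values match the settings used below):

# vm.nr_hugepages for a 180G SGA with 2048k hugepages, plus 5 pages of overhead
SGA_MB=$(( 180 * 1024 ))
NR_HUGEPAGES=$(( SGA_MB * 1024 / 2048 + 5 ))
echo "vm.nr_hugepages = $NR_HUGEPAGES"                           # 92165
echo "kernel.shmmax   = $(( (180 + 1) * 1024 * 1024 * 1024 ))"   # 194347270144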

In order to allow the Oracle database to use up to 180G for the SGA_TARGET/SGA_MAX_SIZE, below are the settings we would use for the OS:

 

/etc/security/limits.conf

 oracle soft memlock 230000000

oracle hard memlock 230000000

 

/etc/sysctl.conf

 

vm.nr_hugepages =  92165

kernel.shmmax  = 193273528320+1g = 194347270144

kernel.shmall  = 47448064

 

In the Oracle database there is a new setting in 11gR2.  This is USE_LARGE_PAGES, with possible values of 'true', 'only', and 'false'.  'True' is the default and gives the current behavior; 'false' means never use hugepages, only small pages; 'only' forces the database to use hugepages, and if insufficient hugepages are available the instance will not start.  Regardless of this setting, the database must use either all hugepages or all small pages.  According to some blogs, using this setting is what allows MEMORY_MAX_TARGET and MEMORY_TARGET to be used with hugepages.  As I noted above, I have not verified this with a Metalink note as yet.

Next, set SGA_TARGET and SGA_MAX_SIZE to the desired size.  I generally recommend setting both to the same size.  Oracle recommends explicitly setting the MEMORY_TARGET and MEMORY_MAX_TARGET to 0 when enabling hugepages.  So these are the values in the spfile that we change:

USE_LARGE_PAGES=only

SGA_TARGET=180G

SGA_MAX_SIZE=180G

MEMORY_MAX_TARGET=0

MEMORY_TARGET=0

 

In order to verify that hugepages are being used, run this command:

'cat /proc/meminfo | grep Huge'

 It will show HugePages_Total, HugePages_Free, and HugePages_Rsvd.  The HugePages_Rsvd value is the number of hugepages that are in use.

Note that this example uses Linux hugepage size of 2M (2048k).  On Itanium systems the hugepage size is 256M.

These instructions should allow you to successfully implement hugepages in Linux.  Note that everything would be the same for Oracle 10gR2, with the exception that the USE_LARGE_PAGES parameter is unavailable.

Incremental Recovery of Standby (ASM and RMAN)

December 20, 2011

 I had another interesting recovery operation recently.  A company I was working with had a very large (~4TB) database generating anywhere from several hundred MB to over a TB of archive logs each day.  They had RMAN backups configured, but due to a slight misconfiguration of their backups, they accidentally deleted an archive log that they needed.  As a consequence, they had a gap in their standby, and they did not have a backup of the required archive log.  They had also created additional data files in the time that the standby was out of synch.

In the past, I had always recovered this situation by a complete re-instantiation of the standby, which we really did not want to do because of the very large size of the source database.  Therefore, I was happy to have someone mention this process of recovery using an incremental backup from SCN.  This is the process I used.  After recovering the customer, I went back and refined the process and identified some unneeded steps.

First I created a primary and a standby database in VMware workstation.  The primary was called dgsrc, the standby is dgsrcsb.  I then created a table I called mytest_table, and inserted some rows in dgsrc.  I then opened the standby in read only and verified that the table was in the standby.  At that point I shut down the standby and its listener.

Then I inserted a large number of rows in the primary, forcing a log switch after each large insert.  I then added a datafile to the primary.

Next, I started rman and deleted several archive logs so they could not be transported.  I then started up the standby and verified that there was a gap:

FAL[client]: Failed to request gap sequence
 GAP – thread 1 sequence 85-86
 DBID 586313788 branch 769276415
FAL[client]: All defined FAL servers have been attempted.
————————————————————
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that’s sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.

After verifying the gap, I added a datafile to the primary, and inserted more rows into mytest_table.  I then took a count of the rows in the table:

SQL> select count(1) from akerber.mytest_table;

  COUNT(1)
———-
   3809968

I then proceeded to attempt the recovery process as documented below. The basic reference document for this process is Metalink id #836986.1.  However, this document does not cover the steps to add missing datafiles, and also contains unnecessary steps.  The steps below were validated on both a database using a file system, and a database using ASM.  Also, in order for the process below to work properly, the parameters DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT must be set correctly on the standby.

The basic steps are as follows:

  1.  Shutdown the standby database.
  2. Identify the last applied SCN on the standby.
  3. Make a list of files currently on the primary and the standby.
  4. Run an incremental backup from SCN, and create a standby controlfile, both on the primary database.
  5. Move the backup files and the backup controlfile to the standby server.
  6. Restore the standby controlfile to the standby.
  7. Create missing datafiles.
  8. Switch datafiles to copy (required whether or not new files are created).
  9. Recover using the incremental SCN backup.
  10. Restart the standby.

The steps in detail are below.  This is a much quicker method to re-instantiate the standby database than I have used in the past.  I have tried it out on databases running on both file systems and ASM.

Step 1:   Once you have determined you have a log gap, and do not have a copy of the archivelog to register, cancel the recovery process on the standby:

 

'alter database recover managed standby database cancel;'

 

Step 2:  On the STANDBY, run these commands (NOTE: the CURRENT_SCN can be a very large number, and  if you do not use the format command,  the output may be an unreadable exponential form):

 

Column current_scn format 999999999999999

Select current_scn from v$database;

select min(fhscn) current_scn from x$kcvfh;

 

Use the lesser of the two numbers above, and then subtract 1000 from it to get the SCN for the incremental backup operation.  Note that the lesser of the two numbers should be sufficient, but choosing an earlier SCN cannot hurt, and will add a safety margin.
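
For convenience, the two queries and the subtraction can be combined into a single statement run on the standby (a sketch only; to_number() is used because x$kcvfh reports the SCN as a string):

sqlplus -s "/ as sysdba" <<EOF
set heading off feedback off
column from_scn format 999999999999999
select least((select current_scn from v\$database),
             (select min(to_number(fhscn)) from x\$kcvfh)) - 1000 as from_scn
from dual;
exit
EOF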

 

Step 3:  Check the number and creation date of datafiles on the STANDBY:

 

select count(1), max(file#), to_char(max(creation_time),'dd-Mon-yyyy hh24mi') from v$datafile;

 

Run the same query on the primary database, and compare the two.

 

 If the number of files is different between the two queries, or the creation dates are substantially different, save a listing of datafiles from both the standby and the primary. Use the query below to get the listing.  Run it on both the primary and standby:

 

select file#, name from v$datafile;

 

Step 4:  On the PRIMARY start RMAN, and run the backup command: 

run {

allocate channel t1 device type disk;

allocate channel t2 device type disk;

allocate channel t3 device type disk;

allocate channel t4 device type disk;

BACKUP as compressed backupset INCREMENTAL FROM SCN <scn from step 2 goes here>

DATABASE FORMAT '<path to backup destination>/filename_%U';

}

Eg:

RMAN> run {

allocate channel t1 device type disk;

allocate channel t2 device type disk;

allocate channel t3 device type disk;

allocate channel t4 device type disk;

BACKUP as compressed backupset INCREMENTAL FROM SCN 604900565

DATABASE FORMAT '/home/oracle/bkup/orc_%U.bak';

}

 

Back up the current control file on the primary for the standby.  Note that this command can also be included as part of the backup script above.  If you choose to do this, make sure it is run after the incremental backup:

 

RMAN> backup current controlfile for standby format '/home/oracle/bkup/control01.ctl';

 

Step 5:  Copy the standby controlfile backup and the incremental backup files to the standby server, using SCP or FTP.  It saves time to store the incremental backup and controlfile backup in the same path on the standby server as on the primary.

 

Step 6:  Shut down the STANDBY, start it up in nomount mode, and restore the standby controlfile.  Remember to use the STANDBY keyword so that Oracle understands that this is a standby controlfile that is being restored:

RMAN> Shutdown immediate;

RMAN> startup nomount;

RMAN> restore standby controlfile from '/home/oracle/bkup/control01.ctl';

RMAN> alter database mount;

 

Catalog the backup files on the standby.  This is not necessary if the files are in the same location as on the primary:

RMAN> catalog start with '<directory containing files>'; e.g.: RMAN> catalog start with '/home/oracle/bkup';

 

RMAN will ask you if you really want to catalog the files, enter YES to catalog the files.

 

Now switch to SQLPLUS and compare the list of the files from v$datafile with the list you saved earlier from the standby: 

      SQL> select file#, name from v$datafile;

 

Step 7:  If any datafiles are missing, create them using these commands:

SQL> alter system set standby_file_management=manual scope=memory;

SQL> alter database create datafile <file#>;

Eg: SQL> alter database create datafile 12;

 

After all datafiles are created, run this command:

SQL> alter system set standby_file_management=auto scope=memory;

 

Step 8:  Switch back to RMAN and switch to the new data files (note this step is required whether or not new files are created.  It actually directs the control file to the correct set of datafiles):

 

RMAN> switch database to copy;

 

The 'switch database to copy' command will force a clear of the redo logs.  If you are monitoring the alert log, you will see a series of errors about failing to open the online logs.  These can be ignored.

 

NOTE:  I did not test this aspect; however, I am fairly confident that if the DB_FILE_NAME_CONVERT and LOG_FILE_NAME_CONVERT parameters are not set, the CATALOG command must be run for each diskgroup (or directory) on the STANDBY containing datafiles prior to the 'switch database to copy' command.

eg:

 

RMAN> catalog start with '+DATA1/orcl/datafile';

RMAN> catalog start with '+DATA2/orcl/datafile';

 

Once again, after each catalog command you will be asked if you want to catalog the files, enter YES.  After all catalog commands are complete, run the command:

RMAN> switch database to copy;

 

Step 9:  Recover the standby database.  This is done with the command below:

                      RMAN> recover database noredo;

The NOREDO option forces RMAN to use a backup rather than the archive logs, thus bypassing the gaps.  This apply process will often take substantial time. 

 

Step 10: After the incremental recovery is complete, resume managed recovery:

 

                 SQL> alter database recover managed standby database disconnect from session;

 

Monitor the alert log on the standby to verify that logs are applying.  Entries in the alert log beginning with 'Media Recovery Log <file name>' indicate that logs are being applied.

'Archived Log entry <number> added for thread <#> sequence <#>' indicates that logs are being received.  A sample from a running standby is below:

 

RFS[1]: Selected log 7 for thread 1 sequence 102 dbid 586313788 branch 769276415

Sun Dec 18 13:25:31 2011

Archived Log entry 35 added for thread 1 sequence 101 ID 0x22f2f73c dest 1:

Sun Dec 18 13:25:33 2011

Media Recovery Log +DATA1X/dgsrc1/archivelog/2011_12_18/thread_1_seq_101.474.770217929

Media Recovery Waiting for thread 1 sequence 102 (in transit)

The last line indicates that the standby has applied the previous log and is waiting for the next log (102) to arrive so that it can apply that log.

 

To verify that the process was successful, I opened the standby in read only mode, and counted the rows in mytest_table.

 

SQL> select count(1) from akerber.mytest_table;

  COUNT(1)
———-
   3809968

The recovery process was successful.

