Testing the ASM Filter Driver (AFD)

November 25, 2014

Testing the Oracle ASM Filter Driver (AFD)
In my previous post I converted from standard Oracle ASM to the ASM Filter Driver (AFD). According to Oracle, the ASM Filter Driver will prevent non-Oracle software from updating the AFD devices. Let's test that out.
First we try this as the owner of the GI installation grid:
[grid@vmrh65node1 ~]$ dd if=/dev/zero of=/dev/sdf1
dd: opening `/dev/sdf1': Permission denied

Ok, that's pretty straightforward. Grid does not have permission to modify the device. Let's see what happens when we connect as root and do it:

[grid@vmrh65node1 ~]$ asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
OCR1                        ENABLED   /dev/sdc1
OCR2                        ENABLED   /dev/sdd1
OCR3                        ENABLED   /dev/sde1
DATA                        ENABLED   /dev/sdf1
RECO                        ENABLED   /dev/sdg1

[root@vmrh65node1 ~]# cp oracleasmlib-2.0.4-1.el6.x86_64.rpm /dev/oracleafd/disks/DATA
cp: overwrite `/dev/oracleafd/disks/DATA'? y

Hmm. That’s not so good. What does the OS say?

[root@vmrh65node1 ~]# ls -la /dev/oracleafd/disks
total 32
drwxrwx--- 2 grid dba   140 Nov 24 20:41 .
drwxrwx--- 3 grid dba    80 Nov 24 20:41 ..
-rw-r--r-- 1 root root 13300 Nov 25 12:52 DATA
-rw-r--r-- 1 root root    10 Nov 24 20:41 OCR1
-rw-r--r-- 1 root root    10 Nov 24 20:41 OCR2
-rw-r--r-- 1 root root    10 Nov 24 20:41 OCR3
-rw-r--r-- 1 root root    10 Nov 24 20:41 RECO

That doesn't look good. What does Oracle say?

[grid@vmrh65node1 asm]$ asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
OCR1                        ENABLED   /dev/sdc1
OCR2                        ENABLED   /dev/sdd1
OCR3                        ENABLED   /dev/sde1
DATA                        ENABLED
RECO                        ENABLED   /dev/sdg1

Ok. That’s not promising. But maybe there is something going on that I don’t know about. Is the database still up?

[root@vmrh65node1 ~]# ps -ef | grep pmon
grid 3288 1 0 Nov24 ? 00:00:06 asm_pmon_+ASM1
oracle 3825 1 0 Nov24 ? 00:00:13 ora_pmon_o11rhawd1
root 8379 7214 0 12:52 pts/0 00:00:00 grep pmon

Well, that's interesting. It does look like the database is still up. Let's try logging in.

[root@vmrh65node1 ~]# su – oracle
[oracle@vmrh65node1 ~]$ . oraenv
ORACLE_SID = [oracle] ? o11rhawd1
The Oracle base has been set to /u01/app/oracle
[oracle@vmrh65node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Nov 25 12:53:14 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size 2260088 bytes
Variable Size 939525000 bytes
Database Buffers 3321888768 bytes
Redo Buffers 12107776 bytes
Database mounted.
Database opened.
SQL> show parameter background

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /u01/app/oracle/diag/rdbms/o11
                                                 rhawd/o11rhawd1/trace
SQL> exit

That looks very good. As we know, Oracle won't start if it has problems with the SYSTEM tablespace, and on this instance the SYSTEM tablespace is in the DATA disk group.

Let’s reboot and see what Oracle says.

[root@vmrh65node1 ~]# reboot

Broadcast message from oracle@vmrh65node1.awddev.dstcorp.net
(/dev/pts/0) at 18:26 …

The system is going down for reboot NOW!
[root@vmrh65node1 ~]#
login as: oracle
oracle@10.193.204.10's password:
Last login: Mon Nov 24 20:45:20 2014 from 10.201.232.32
[oracle@vmrh65node1 ~]$ ls
adrci_cleanup.sh backup dgdemo02pmmkvl_1_1.bak grid initdgdemo.ora mapper_behind.sh mapper_checks.sh oradiag_oracle patch working1.sh
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3453 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 ~]$ ls
adrci_cleanup.sh backup dgdemo02pmmkvl_1_1.bak grid initdgdemo.ora mapper_behind.sh mapper_checks.sh oradiag_oracle patch working1.sh
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3484 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3486 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3488 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3502 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 diag]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3504 1 0 18:29 ? 00:00:00 ora_pmon_o11rhawd1
oracle 3618 3410 0 18:29 pts/0 00:00:00 grep pmon

Ok, so the database came back up. What does the file system look like now?

[oracle@vmrh65node1 diag]$ ls -la /dev/oracleafd/disks
total 20
drwxrwx--- 2 grid dba  140 Nov 25 18:27 .
drwxrwx--- 3 grid dba   80 Nov 25 18:27 ..
-rw-r--r-- 1 root root  10 Nov 25 18:27 DATA
-rw-r--r-- 1 root root  10 Nov 25 18:27 OCR1
-rw-r--r-- 1 root root  10 Nov 25 18:27 OCR2
-rw-r--r-- 1 root root  10 Nov 25 18:27 OCR3
-rw-r--r-- 1 root root  10 Nov 25 18:27 RECO

[oracle@vmrh65node1 diag]$ su – grid
Password:
[grid@vmrh65node1 ~]$ . oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base has been set to /u01/app/oracle

[grid@vmrh65node1 ~]$ asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
OCR1                        ENABLED   /dev/sdc1
OCR2                        ENABLED   /dev/sdd1
OCR3                        ENABLED   /dev/sde1
DATA                        ENABLED   /dev/sdf1
RECO                        ENABLED   /dev/sdg1

Ok, so now everything is back the way it was. AFD is living up to its claim of protecting the data files. This is actually very nice; I have definitely seen times when ASM disks were accidentally overwritten by inexperienced users with too much access.

Enabling ASM Filter Driver

November 25, 2014

As promised, this entry documents my experience in enabling the ASM Filter Driver (ASMFD). The ASMFD is the replacement for ASMLIB that, according to Oracle, prevents non-Oracle software from modifying Oracle database files. Upgrading from standard ASM was surprisingly difficult and required much more time than expected. Most of the problems arose from the difficulty of ensuring that all of the original ASM software was disabled while moving to ASMFD. Another source of trouble was the requirement to run commands as root or as grid, depending on the command; it is very important to run each command as the correct user id.

Note that the example here is the more complex ASM case of the OCR and Voting Disks being stored in ASM.

So, here is the process.
1. Install the patch for Oracle bug 19035573. This fixes several issues with the upgrade process. Note that the readme for this patch is incomplete; review the Oracle notes on patch installation using opatch before installing it. It took me several tries to get this to work right. In particular, you must run the unlock perl script in the Grid Infrastructure home prior to patching, and the post-patch script when complete.

2. Identify the OS path of all of the Oracle ASM disks. Oracle Support supplied me with this script, which extracts the paths of my Oracle disks. Note that this example was on a VM, and I was using shared VMDKs. Also note that ASM was already properly configured and the cluster was up and running when this process was started (run as root):

/etc/init.d/oracleasm querydisk -d `/etc/init.d/oracleasm listdisks -d` |
cut -f2,10,11 -d" " | perl -pe 's/"(.*)".*\[(.*), *(.*)\]/$1 $2 $3/g;' |
while read v_asmdisk v_minor v_major
do
  v_device=`ls -la /dev | grep " $v_minor, *$v_major " | awk '{print $10}'`
  echo "ASM disk $v_asmdisk based on /dev/$v_device [$v_minor, $v_major]"
done

Output from above:
ASM disk DATA based on /dev/sdf1 [8, 81]
ASM disk OCR1 based on /dev/sdc1 [8, 33]
ASM disk OCR2 based on /dev/sdd1 [8, 49]
ASM disk OCR3 based on /dev/sde1 [8, 65]
ASM disk RECO based on /dev/sdg1 [8, 97]

3. Add AFD:* to the asm_diskstring so that it contains both the old and new paths. NOTE: this is run as the GI home owner (grid):

$ORACLE_HOME/bin/asmcmd dsset 'ORCL:*','AFD:*'

4. Stop the cluster (run as root):
crsctl stop crs -f (on both nodes)

5. Stop asm (as root):
/etc/init.d/oracleasm stop
/etc/init.d/oracleasm disable
Move the file /opt/oracle/extapi/64/asm/orcl/1/libasm.so to a new location so it is out of your path.

6. Stop ACFS (as root):
$ORACLE_HOME/bin/acfsload stop

7. At this point, if not already installed, Oracle 12.1 GI should be installed.

8. On both nodes, configure the AFD driver:
$ORACLE_HOME/bin/asmcmd afd_configure

9. Label the ASM disks for AFD (as root):
$ORACLE_HOME/bin/asmcmd afd_label OCR1 /dev/sdc1 --migrate
$ORACLE_HOME/bin/asmcmd afd_label OCR2 /dev/sdd1 --migrate
$ORACLE_HOME/bin/asmcmd afd_label OCR3 /dev/sde1 --migrate
$ORACLE_HOME/bin/asmcmd afd_label DATA /dev/sdf1 --migrate
$ORACLE_HOME/bin/asmcmd afd_label RECO /dev/sdg1 --migrate

10. Scan for the new disks (as GRID):
$ORACLE_HOME/bin/asmcmd dsget
$ORACLE_HOME/bin/asmcmd afd_scan

11. Check the directory /dev/oracleafd/disks/* and verify that the ASMFD disks are listed there.

12. Start the cluster. If possible, boot the server to verify that everything starts properly after boot.

A couple of notes on this process: It took me several days to get this right, and I had to open an SR with Oracle support along the way. Evidently it is common for the ASM drivers to remain loaded, as well as the ACFS drivers. When you get error messages about ACFS or ASM being loaded, you may need to take additional steps to make sure that they are unloaded.
Also, the instructions from Oracle support said to set the AFD diskstring to /dev/sd* as the last step before the scan. This may or may not be necessary; I did not run the command, and the upgrade process completed properly. I suspect that with the ASM_DISKSTRING set to ORCL:* and the AFD_DISKSTRING set to AFD:*, this is not needed.

Oracle Trace/Log File Cleanup using ADRCI

October 16, 2014

Due to time constraints, I have not completed my intended entry on the ASM Filter Driver (ASMFD). In the interim, I have written a short post on trace/log file clean up using the latest Oracle utility, ADRCI.

I recently posted a blog about trace/log file clean up in Oracle. The methodology I gave works for all Oracle versions. However, starting with Oracle 11gR1, Oracle has shipped a program called adrci that can be used to clean up the log and trace files much more easily. The usage is similar to the OS-supplied log file rotation utility logrotate, except that it also cleans up trace files. ADRCI does not, however, clean up audit files or the listener log files.

ADRCI stands for the Automatic Diagnostic Repository Command Interpreter. As the name implies, there is evidently much more to this utility than just the ability to clean up old trace files and trim log files. That will be the subject of another post. This post discusses cleaning up log and trace files, which is a very common problem on Oracle servers.

Starting in Oracle 11gR1, the Oracle-approved method of removing trace files and cleaning database alert logs is to use adrci, but OS commands must still be used for the listener logs and the audit files. I wrote a shell script using a combination of these methods for cleaning up 11gR2 environments.

The assumptions: the code assumes that the instances are up and running, and that there are entries for the instances in the oratab. In a clustered environment, the individual instances (orcl1, orcl2, etc.) are expected to have entries in the oratab in addition to the database name. For example, for a two-node cluster database named orcl, the oratab on node one would have entries for both orcl and orcl1, and the oratab on node two would have entries for orcl and orcl2.
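As a minimal illustration of that layout (entries and paths here are hypothetical), a node-one oratab might look like this, and the home for an instance can be pulled out with grep and cut:

```shell
# Hypothetical /etc/oratab content for node one of a two-node RAC
# database "orcl": both the database name and the local instance
# name map to the same Oracle home.
oratab_sample='orcl:/u01/app/oracle/product/11.2.0/dbhome_1:N
orcl1:/u01/app/oracle/product/11.2.0/dbhome_1:N'

# Extract the ORACLE_HOME for instance orcl1 (field 2, colon-delimited).
home=$(printf '%s\n' "$oratab_sample" | grep '^orcl1:' | cut -d: -f2)
echo "home=$home"
```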

Next, we write the script. The first segment is basically bookkeeping, but note that we have to translate the number of days supplied in the first command line argument to minutes in the script, as adrci will expect minutes as the argument on the retention time. The second command line argument is the retention time for the listener log files. The script could easily be modified to accept additional retention requirements based on your specific needs:

# Description
# Shell script to clean up oracle diagnostic files using ADRCI, and to remove log data.
# Allows 2 arguments, $1 - retention time for trace and audit data, $2 - retention time for listener
# log files. Could be enhanced for multiple retention periods. 
###
#set -x
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/oracle/bin
export PATH
if [[ -n "$1" ]]; then
 if [ $1 -ne 0 -o $1 -eq 0 2>/dev/null ]
 then
   if [[ $1 -lt 0 ]]; then
     echo invalid input
     exit 1
   else
     days=$1
     minutes=$((1440 * $days))
      echo days=$days
     echo minutes=$minutes
   fi
 fi
else
 echo days=7
 days=7
 minutes=$((1440 * $days))
 echo days=$days
 echo minutes=$minutes
fi
if [[ -n "$2" ]]; then
 if [ $2 -ne 0 -o $2 -eq 0 2>/dev/null ]
 then
   if [[ $2 -lt 0 ]]; then
     echo invalid input
     exit 1
   else
      log_days=$2
      echo log_days=$log_days
   fi
 fi
else
 echo log_days=30
 log_days=30
 echo log_days=$log_days
fi
SERVER=`hostname -s`
FDATE=`date +%d_%m_%y`
# Check user is oracle
USERID=`/usr/bin/id -u -nr`
if [ $? -ne 0 ]
then
       echo "ERROR: unable to determine uid"
       exit 99
fi
if [ "${USERID}" != "oracle" ]
then
       echo "ERROR: This script must be run as oracle"
       exit 98
fi
echo "INFO: Purge started at `date`"

Next, we use the ps command to identify the running oracle instances and loop through them. We grep for the smon process of each running instance and take the portion of the process name after the second underscore to get the database sid. After extracting the sid, we set the environment for that ORACLE_SID using oraenv:

for ORACLE_SID in `ps -e -o "cmd" | grep smon|grep -v grep| awk -F "_" '{print$3}'`
do
   # uncomment 2 lines below if RAC environment and individual sids are not in oratab
   #   SID=`echo $ORACLE_SID | sed -e 's/1//g'`
   #   ORACLE_HOME=`cat /etc/oratab|grep ^$SID:| head -n 1 | cut -f2 -d':'`
 ORAENV_ASK=NO
 export ORAENV_ASK
 export ORACLE_SID
 echo $ORACLE_SID
 . /usr/local/bin/oraenv
 echo SID=$ORACLE_SID
 echo "INFO: purge started at `date`"
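The field extraction used above can be sanity-checked in isolation against sample smon process names (the names below are illustrative):

```shell
# The third "_"-delimited field of an smon process name is the
# instance SID -- the same awk split the loop uses.
sample='asm_smon_+ASM1
ora_smon_orcl1'
sids=$(printf '%s\n' "$sample" | awk -F "_" '{print $3}')
echo "$sids"
```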

Next, we identify the location of the audit files and clean them up. This section is identical to my previous clean-up script, as adrci will not clean up audit files. Note that your environment may require that you archive these files elsewhere for auditing purposes rather than remove them. These are not the standard Oracle database audit trail; they are the records of everything done on the instance using the sys privilege:

 
AUDIT_DEST=`$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<!
 set feedback off heading off verify off
 select value from v\\$parameter where name='audit_file_dest';
!`
 finaud=`echo $AUDIT_DEST | sed -e 's|\?|'"$ORACLE_HOME"'|'`
 /usr/bin/find $finaud -name \*.aud -mtime +$days
 /usr/bin/find $finaud -name \*.aud -mtime +$days | xargs -i ksh -c "echo deleting {}; rm {}"
 echo $finaud
 # Purge ADR contents
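A side note on the sed above: Oracle reports audit_file_dest with a leading `?` as shorthand for the Oracle home, which the script expands before running find. A standalone check of that expansion (home path hypothetical; the anchored `^?` form is used here for portability):

```shell
# Expand Oracle's "?" placeholder in audit_file_dest to the Oracle home,
# mirroring the substitution performed in the script.
ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1
AUDIT_DEST='?/rdbms/audit'
finaud=$(echo "$AUDIT_DEST" | sed -e 's|^?|'"$ORACLE_HOME"'|')
echo "$finaud"
```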

Next we run the adrci command to delete the diagnostic information. In the current script, the retention is the same for all file types. It would be fairly easy to modify the script to accept multiple retention levels based on the type of data.

 echo "INFO: adrci purge started at `date`"
 adrci exec="show homes"|grep -v : | while read file_line
 do
   echo "INFO: adrci purging diagnostic destination" $file_line
   echo "INFO: purging ALERT older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type ALERT"
   echo "INFO: purging INCIDENT older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type INCIDENT"
   echo "INFO: purging TRACE older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type TRACE"
   echo "INFO: purging CDUMP older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type CDUMP"
   echo "INFO: purging HM older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type HM"
   echo ""
   echo ""
 done
done
echo
echo "INFO: adrci purge finished at `date`"

Finally, we trim the listener log, since this is not done properly through adrci:

# for whatever reason, adrci doesn't like to remove the listener trace
# log, so we need to get it manually
for listener_log in `find $ORACLE_BASE/diag/tnslsnr -name "listener.log"`
do
 listener_file=`echo "$listener_log" | awk -Ftrace/ '{print $2}'`
 echo $listener_log
 echo $listener_file
 fname="${listener_log}_`date '+%Y%m%d'`.gz"
 fname1="${listener_log}_`date '+%Y%m%d'`"
 echo $fname
 if [ -e $fname ]
 then
   echo "Already cleared $listener_log today"
 else
   cp $listener_log $fname1
   gzip $fname1
   /usr/bin/find $ORACLE_BASE/diag/tnslsnr -name ${listener_file}*.gz -mtime +$log_days | xargs -i ksh -c "echo deleting {}; rm {}"
   echo > $listener_log
 fi
done
echo "SUCC: Purge completed successfully at `date`"
exit 0
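The date-stamped rotation names built in the listener loop follow a simple pattern; here is the naming step on its own, with a hypothetical log path:

```shell
# Build the rotation file names the way the listener loop does.
listener_log=/u01/app/oracle/diag/tnslsnr/host1/listener/trace/listener.log
stamp=$(date '+%Y%m%d')
fname="${listener_log}_${stamp}.gz"   # compressed copy, kept for $log_days days
fname1="${listener_log}_${stamp}"     # uncompressed intermediate for gzip
echo "$fname"
```

Because the `.gz` name embeds today's date, the `if [ -e $fname ]` test in the loop prevents clearing the same log twice in one day.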

Thus we have purged Oracle log, trace, and audit files. Below is the complete script with the commentary removed:


# Description
# Shell script to clean up oracle diagnostic files using ADRCI, and to remove log data.
# Allows 2 arguments, $1 - retention time for trace and audit data, $2 - retention time for listener
# log files. Could be enhanced for multiple retention periods.
###
#set -x
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/oracle/bin
export PATH
if [[ -n "$1" ]]; then
 if [ $1 -ne 0 -o $1 -eq 0 2>/dev/null ]
 then
   if [[ $1 -lt 0 ]]; then
     echo invalid input
     exit 1
   else
     days=$1
     minutes=$((1440 * $days))
     echo days=$days
     echo minutes=$minutes
   fi
 fi
else
 echo days=7
 days=7
 minutes=$((1440 * $days))
 echo days=$days
 echo minutes=$minutes
fi

if [[ -n "$2" ]]; then
 if [ $2 -ne 0 -o $2 -eq 0 2>/dev/null ]
 then
   if [[ $2 -lt 0 ]]; then
     echo invalid input
     exit 1
   else
      log_days=$2
      echo log_days=$log_days
   fi
 fi
else
 echo log_days=30
 log_days=30
 echo log_days=$log_days
fi
SERVER=`hostname -s`
FDATE=`date +%d_%m_%y`

# Check user is oracle
USERID=`/usr/bin/id -u -nr`
if [ $? -ne 0 ]
then
       echo "ERROR: unable to determine uid"
       exit 99
fi
if [ "${USERID}" != "oracle" ]
then
       echo "ERROR: This script must be run as oracle"
       exit 98
fi
echo "INFO: Purge started at `date`"
# Establish some oracle enviroment
for ORACLE_SID in `ps -e -o "cmd" | grep smon|grep -v grep| awk -F "_" '{print$3}'`
do
   # uncomment 2 lines below if RAC environment and individual sids are not in oratab
   #   SID=`echo $ORACLE_SID | sed -e 's/1//g'`
   #   ORACLE_HOME=`cat /etc/oratab|grep ^$SID:| head -n 1 | cut -f2 -d':'`
 ORAENV_ASK=NO
 export ORAENV_ASK
 export ORACLE_SID
 echo $ORACLE_SID
 . /usr/local/bin/oraenv
 echo SID=$ORACLE_SID
 AUDIT_DEST=`$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<!
 set feedback off heading off verify off
 select value from v\\$parameter where name='audit_file_dest';
!`
 finaud=`echo $AUDIT_DEST | sed -e 's|\?|'"$ORACLE_HOME"'|'`
 /usr/bin/find $finaud -name \*.aud -mtime +$days
 /usr/bin/find $finaud -name \*.aud -mtime +$days | xargs -i ksh -c "echo deleting {}; rm {}"

 echo $finaud
 # Purge ADR contents
 echo "INFO: adrci purge started at `date`"
 adrci exec="show homes"|grep -v : | while read file_line
 do
   echo "INFO: adrci purging diagnostic destination" $file_line
   echo "INFO: purging ALERT older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type ALERT"
   echo "INFO: purging INCIDENT older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type INCIDENT"
   echo "INFO: purging TRACE older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type TRACE"
   echo "INFO: purging CDUMP older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type CDUMP"
   echo "INFO: purging HM older than $days days."
   adrci exec="set homepath $file_line;purge -age $minutes -type HM"
   echo ""
   echo ""
 done
done
echo
echo "INFO: adrci purge finished at `date`"
# All completed
# for whatever reason, adrci doesn't like to remove the listener trace
# log, so we need to get it manually
for listener_log in `find $ORACLE_BASE/diag/tnslsnr -name "listener.log"`
do
 listener_file=`echo "$listener_log" | awk -Ftrace/ '{print $2}'`
 echo $listener_log
 echo $listener_file
 fname="${listener_log}_`date '+%Y%m%d'`.gz"
 fname1="${listener_log}_`date '+%Y%m%d'`"
 echo $fname
 if [ -e $fname ]
 then
   echo "Already cleared $listener_log today"
 else
   cp $listener_log $fname1
   gzip $fname1
   /usr/bin/find $ORACLE_BASE/diag/tnslsnr -name ${listener_file}*.gz -mtime +$log_days | xargs -i ksh -c "echo deleting {}; rm {}"
   echo > $listener_log
 fi
done
echo "SUCC: Purge completed successfully at `date`"
exit 0

ASMLIB in OEL 7

August 18, 2014

In my previous post I wrote about using UDEV rules for device name persistence in an Oracle RAC environment. In this post, I will discuss using ASMLIB for the same purpose. I had intended to also cover kmod-oracleasm, the equivalent package that is unique to RHEL; however, due to licensing issues I was not able to create a set of RHEL virtual machines for that purpose.

ASMLIB is Oracle's preferred solution for device name persistence. ASMLIB writes an ASM header to each physical device, as opposed to UDEV, which uses the existing SCSI unique ID to map the device name. In my opinion, ASMLIB is superior to UDEV for device name persistence.

In this post, I will also include the steps to get the iSCSI devices mapped into the OEL operating system. They were omitted from my previous posts for brevity. These steps are identical to those required in the previous post on UDEV rules and are primarily included for those who wish to set up their own home cluster. For a true commercial-grade cluster, use the licensed version of the iSCSI target or other commercial software. Several good sets of instructions are available on setting up the iSCSI target within Openfiler; one is here: http://techhead.co/how-to-configure-openfiler-v23-iscsi-storage-for-use-with-vmware-esx/.

The first step is to turn on the iSCSI initiator on the operating system. The initiator appears to be part of the default kernel installation. The commands to activate it are here:

service iscsid start

chkconfig iscsid on

 

In OEL 7, these commands are redirected to systemctl, so the output you see will look like this:

[root@oel7641 ~]# service iscsid start

Redirecting to /bin/systemctl start iscsid.service

[root@oel7641 ~]# chkconfig iscsid on

Note: Forwarding request to 'systemctl enable iscsid.service'.

[root@oel7641 ~]#

 

In order to map the Openfiler iSCSI devices to the Linux operating system, we use the sendtargets option of the iscsiadm command, then we use the login option to log in to the targets. After this is done, the iSCSI devices appear as standard Linux SCSI devices:

[root@oel7641-localdomain ~]# iscsiadm -m discovery -t sendtargets -p 192.168.213.100:3260

192.168.213.100:3260,1 iqn.2006-01.com.openfiler:tsn.70d7d9301f39

[root@oel7641-localdomain ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.70d7d9301f39 -p 192.168.213.100:3260 --login

Logging in to [iface: default, target: iqn.2006-01.com.openfiler:tsn.70d7d9301f39, portal: 192.168.213.100,3260] (multiple)

Login to [iface: default, target: iqn.2006-01.com.openfiler:tsn.70d7d9301f39, portal: 192.168.213.100,3260] successful.

 

In the output below, devices sdb through sdg are the iSCSI devices:

 

[root@oel7641 ~]# fdisk -l

 

Disk /dev/sda: 161.1 GB, 161061273600 bytes, 314572800 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk label type: dos

Disk identifier: 0x0003b5aa

 

Device Boot Start End Blocks Id System

/dev/sda1 * 2048 1026047 512000 83 Linux

/dev/sda2 1026048 314572799 156773376 8e Linux LVM

 

Disk /dev/mapper/ol_oel764-swap: 16.9 GB, 16852713472 bytes, 32915456 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

 

Disk /dev/mapper/ol_oel764-root: 87.2 GB, 87241523200 bytes, 170393600 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

 

Disk /dev/mapper/ol_oel764-home: 56.4 GB, 56438554624 bytes, 110231552 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

 

Disk /dev/sdb: 8187 MB, 8187281408 bytes, 15990784 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk label type: dos

Disk identifier: 0x2adfadeb

 

Disk /dev/sdc: 8187 MB, 8187281408 bytes, 15990784 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk label type: dos

Disk identifier: 0x536c247a

 

Disk /dev/sdd: 8187 MB, 8187281408 bytes, 15990784 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk label type: dos

Disk identifier: 0xcf43d9a8

 

Disk /dev/sde: 40.9 GB, 40936407040 bytes, 79953920 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk label type: dos

Disk identifier: 0x253975b8

 

Disk /dev/sdf: 40.9 GB, 40936407040 bytes, 79953920 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk label type: dos

Disk identifier: 0xbdab400b

 

Disk /dev/sdg: 40.9 GB, 40936407040 bytes, 79953920 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk label type: dos

Disk identifier: 0x85ddc3a3

 

At this point, we want to make sure that the devices are also available on the second node. The simplest method to do this is to copy the subdirectories under /var/lib/iscsi to the second and additional nodes in the cluster, then run the commands to start iscsi on those nodes.

scp -r /var/lib/iscsi oel7642:/var/lib/iscsi

(log into oel7642)

[root@oel7642 ~]# service iscsid start

Redirecting to /bin/systemctl start iscsid.service

[root@oel7642 ~]# chkconfig iscsid on

Note: Forwarding request to 'systemctl enable iscsid.service'.

[root@oel7642 ~]#exit

 

Next, we need to partition the iSCSI devices. Also, since these are VMware VMDKs, we need to make sure that the partitions are block-aligned for best performance. Here is a typical fdisk conversation to accomplish that. Note that we only partition and run oracleasm on a single node; after the devices are marked for ASM, we just scan for them on the second and additional nodes:

 

[root@oel7641 ~]# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.23.2).

 

Changes will remain in memory only, until you decide to write them.

Be careful before using the write command.

 

Device does not contain a recognized partition table

Building a new DOS disklabel with disk identifier 0x2adfadeb.

 

Command (m for help): n

Partition type:

p primary (0 primary, 0 extended, 4 free)

e extended

Select (default p): p

Partition number (1-4, default 1): 1

First sector (2048-15990783, default 2048):

Using default value 2048

Last sector, +sectors or +size{K,M,G} (2048-15990783, default 15990783):

Using default value 15990783

Partition 1 of type Linux and of size 7.6 GiB is set

 

Command (m for help): t

Selected partition 1

Hex code (type L to list all codes): fb

Changed type of partition 'Linux' to 'VMware VMFS'

 

Command (m for help): x

 

Expert command (m for help): b

Selected partition 1

New beginning of data (1-15990783, default 2048): 2048

 

Expert command (m for help): w

The partition table has been altered!

 

Calling ioctl() to re-read partition table.

Syncing disks.
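A note on the alignment choice above: keeping the default first sector of 2048 places the partition start on a 1 MiB boundary (2048 sectors of 512 bytes each), which satisfies common stripe and block alignment requirements. A quick arithmetic check:

```shell
# 2048 sectors * 512 bytes/sector = 1 MiB, so the partition start
# lands on a 1 MiB boundary.
sector_size=512
start_sector=2048
offset=$((start_sector * sector_size))
echo "first partition starts at byte $offset"
[ $((offset % 1048576)) -eq 0 ] && echo "1 MiB aligned"
```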

 

Because of the number of devices, I created a very simple script to run the fdisk conversation on the remainder of the devices; I called it doit.sh. Don't complain about the naming. This script assumes we want to use the entire device for ASM storage.

Here is the script:

[root@oel7641 ~]# cat doit.sh

#!/bin/sh
# Answer fdisk's prompts non-interactively: new primary partition 1,
# default first and last sectors (the two blank lines), type fb,
# then expert mode: move the data start to sector 2048, and write.
fdisk $1 <<EOF
n
p
1


t
fb
x
b
2048
w
EOF

 

The script is called with the script name followed by the device name:

[root@oel7641 ~]# ./doit.sh /dev/sdc

[root@oel7641 ~]# ./doit.sh /dev/sdd

[root@oel7641 ~]# ./doit.sh /dev/sde

[root@oel7641 ~]# ./doit.sh /dev/sdf

[root@oel7641 ~]# ./doit.sh /dev/sdg

 

At that point, all the devices are partitioned and it is time to create the ASM devices.

 

So next, we need to install ASMLIB. It is not currently installed by default in the OEL 7 kernel. We use yum to install it; two packages are required, oracleasm and oracleasm-support:

 

[root@oel7641-localdomain ~]# yum install oracleasm

Loaded plugins: langpacks, rhnplugin

This system is receiving updates from ULN.

ol7_x86_64_Dtrace_userspace | 1.2 kB 00:00:00

ol7_x86_64_UEKR3 | 1.2 kB 00:00:00

ol7_x86_64_addons | 1.2 kB 00:00:00

ol7_x86_64_latest | 1.4 kB 00:00:00

ol7_x86_64_latest/updateinfo | 33 kB 00:00:00

ol7_x86_64_latest/primary | 4.5 MB 00:00:02

ol7_x86_64_latest 6631/6631

ol7_x86_64_optional_latest | 1.2 kB 00:00:00

ol7_x86_64_optional_latest/updateinfo | 32 kB 00:00:00

ol7_x86_64_optional_latest/primary | 1.4 MB 00:00:00

ol7_x86_64_optional_latest 5022/5022

Resolving Dependencies

--> Running transaction check

---> Package kernel-uek.x86_64 0:3.8.13-35.3.4.el7uek will be installed

--> Processing Dependency: kernel-firmware = 3.8.13-35.3.4.el7uek for package: kernel-uek-3.8.13-35.3.4.el7uek.x86_64

--> Running transaction check

---> Package kernel-uek-firmware.noarch 0:3.8.13-35.3.4.el7uek will be installed

--> Finished Dependency Resolution

 

Dependencies Resolved

 

================================================================================================================================================================

Package Arch Version Repository Size

================================================================================================================================================================

Installing:

kernel-uek x86_64 3.8.13-35.3.4.el7uek ol7_x86_64_UEKR3 31 M

kernel-uek-firmware noarch 3.8.13-35.3.4.el7uek ol7_x86_64_UEKR3 1.8 M

 

Transaction Summary

================================================================================================================================================================

Install 2 Packages

 

Total size: 33 M

Total download size: 1.8 M

Installed size: 112 M

Is this ok [y/d/N]: y

Downloading packages:

No Presto metadata available for ol7_x86_64_UEKR3

kernel-uek-firmware-3.8.13-35.3.4.el7uek.noarch.rpm | 1.8 MB 00:00:01

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

Installing : kernel-uek-firmware-3.8.13-35.3.4.el7uek.noarch 1/2

Installing : kernel-uek-3.8.13-35.3.4.el7uek.x86_64 2/2

Verifying : kernel-uek-firmware-3.8.13-35.3.4.el7uek.noarch 1/2

Verifying : kernel-uek-3.8.13-35.3.4.el7uek.x86_64 2/2

 

Installed:

kernel-uek.x86_64 0:3.8.13-35.3.4.el7uek kernel-uek-firmware.noarch 0:3.8.13-35.3.4.el7uek

 

Complete!

 

[root@oel7641-localdomain ~]# yum install oracleasm-support

Loaded plugins: langpacks, rhnplugin

This system is receiving updates from ULN.

Resolving Dependencies

--> Running transaction check

---> Package oracleasm-support.x86_64 0:2.1.8-3.el7 will be installed

--> Finished Dependency Resolution

 

Dependencies Resolved

 

================================================================================================================================================================

Package Arch Version Repository Size

================================================================================================================================================================

Installing:

oracleasm-support x86_64 2.1.8-3.el7 ol7_x86_64_latest 79 k

 

Transaction Summary

================================================================================================================================================================

Install 1 Package

 

Total download size: 79 k

Installed size: 242 k

Is this ok [y/d/N]: y

Downloading packages:

oracleasm-support-2.1.8-3.el7.x86_64.rpm | 79 kB 00:00:00

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

Installing : oracleasm-support-2.1.8-3.el7.x86_64 1/1

Note: Forwarding request to 'systemctl enable oracleasm.service'.

ln -s '/usr/lib/systemd/system/oracleasm.service' '/etc/systemd/system/multi-user.target.wants/oracleasm.service'

Verifying : oracleasm-support-2.1.8-3.el7.x86_64 1/1

 

Installed:

oracleasm-support.x86_64 0:2.1.8-3.el7

 

Complete!

 

 

Once ASMLIB is installed, it must be configured. This just amounts to telling Linux the owner and group of the ASM driver interface, along with some minor options.

 

[root@oel7641-localdomain ~]# /etc/init.d/oracleasm configure

Configuring the Oracle ASM library driver.

 

This will configure the on-boot properties of the Oracle ASM library

driver. The following questions will determine whether the driver is

loaded on boot and what permissions it will have. The current values

will be shown in brackets ('[]'). Hitting <ENTER> without typing an

answer will keep that current value. Ctrl-C will abort.

 

Default user to own the driver interface []: oracle

Default group to own the driver interface []: dba

Scan for Oracle ASM disks on boot (y/n) [y]:

Writing Oracle ASM library driver configuration: done

Initializing the Oracle ASMLib driver: [ OK ]

Scanning the system for Oracle ASMLib disks: [ OK ]

 

With ASMLIB configured, we mark the disks for ASM. Note that unlike udev, this writes an actual label on the disk; it does not use the existing UUID to identify the device.

 

[root@oel7641 ~]# /etc/init.d/oracleasm createdisk asmdisk1 /dev/sdb1

Marking disk "asmdisk1" as an ASM disk: [ OK ]

[root@oel7641 ~]# /etc/init.d/oracleasm createdisk asmdisk2 /dev/sdc1

Marking disk "asmdisk2" as an ASM disk: [ OK ]

[root@oel7641 ~]# /etc/init.d/oracleasm createdisk asmdisk3 /dev/sdd1

Marking disk "asmdisk3" as an ASM disk: [ OK ]

[root@oel7641 ~]# /etc/init.d/oracleasm createdisk asmdisk4 /dev/sde1

Marking disk "asmdisk4" as an ASM disk: [ OK ]

[root@oel7641 ~]# /etc/init.d/oracleasm createdisk asmdisk5 /dev/sdf1

Marking disk "asmdisk5" as an ASM disk: [ OK ]

[root@oel7641 ~]# /etc/init.d/oracleasm createdisk asmdisk6 /dev/sdg1

Marking disk "asmdisk6" as an ASM disk: [ OK ]

[root@oel7641 ~]# /etc/init.d/oracleasm listdisks

ASMDISK1

ASMDISK2

ASMDISK3

ASMDISK4

ASMDISK5

ASMDISK6

 

Next, we make sure that the other cluster nodes can see the ASM disks. We do not need to reboot the second node of the cluster, nor do we run the createdisk command again on the other nodes; we simply scan for the storage devices on each remaining node so that oracleasm becomes aware of the disks, and we are done:

 

[root@oel7642 ~]# /etc/init.d/oracleasm scandisks

Scanning the system for Oracle ASMLib disks: [ OK ]

[root@oel7642 ~]# /etc/init.d/oracleasm listdisks

ASMDISK1

ASMDISK2

ASMDISK3

ASMDISK4

ASMDISK5

ASMDISK6

[root@oel7642 ~]#

 

If you want to verify that the disks persist properly, reboot both servers and check.

 

At this point, the Oracle Grid Infrastructure for a cluster can be installed. The ASM_DISKSTRING for the installation will be /dev/oracleasm/disks/*.

 

According to Oracle documentation, starting with Oracle Database 12c there is a piece of software called the ASM Filter Driver (ASMFD) that prevents non-Oracle software from writing to Oracle data files. I assume that by this they mean software not owned by the Oracle software owner. In any case, ASMFD replaces ASMLIB. In my next post, I will go over the steps to replace ASMLIB with ASMFD.

 

 

 

Udev Rules

August 8, 2014

Being a true computer nerd, I periodically feel the need to learn a new application or skill. Most recently, I had some discussions with other nerds about the advantages of using Linux's native udev rules rather than Oracle's ASMLIB for device persistence on Linux servers.

First, a quick history of ASMLIB. Originally available on RHEL 4 and 5, ASMLIB became unavailable on RHEL 6 because Oracle stopped maintaining it for RHEL. After much negative publicity, Oracle made the kmod-oracleasm package available for RHEL, which restored roughly the same functionality and required only a couple of packages from the public OEL yum servers to work properly (oracleasm and oracleasm-support). Even so, there are still discussions, often tantamount to religious wars, about the advantages of ASMLIB versus udev.

As a consequence, I felt it would be a good idea to learn how to use udev. Udev is designed to accomplish the same task that ASMLIB does, that is, to persist device names and privileges across system reboots on Linux servers. Linux has the unfortunate habit of renaming disk devices when you add new ones; e.g., /dev/sdc might become /dev/sdd when a new device is added, and the new device becomes /dev/sdc. Udev uses the unique SCSI device id (UUID) to associate a specific storage device with a specific device name on the Linux server, thus making the kernel-assigned name moot. Udev can be used with most, if not all, devices on Linux servers, though this article only discusses its usage with storage devices. There are numerous examples of udev rules for use with Oracle storage. In general, the format (in OEL/RHEL 6.5) is something like this:

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524856447a376e2d7871364a2d654b4272", NAME+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

Where KERNEL represents the device interface type (sd is SCSI, hd would be IDE; there are others), SUBSYSTEM is the device type (block), PROGRAM is the command to query the device, RESULT is the return value of the query, and the remainder of the fields show how to use the data returned. In the example above, the device with a name matching the pattern sd?1 that returns the result "14f504e46494c45524856447a376e2d7871364a2d654b4272" gets named "/dev/oracleasm/asm-disk1", with oracle as the owner, dba as the group, and permissions of 0660. Once the rule set works properly, the file can be duplicated for the other nodes when using a cluster. The last two items are the key to the storage udev rules: the long ID is unique to the storage device, and as a consequence will always be associated with the appropriate name. Thus, when the same set of rules is copied to the other nodes of a cluster, each storage device is named correctly, and software such as Oracle can find the storage where it is expected.

Udev rule files are also used to name network devices, CD and DVD drives, etc., and are stored in the directory /etc/udev/rules.d. The rule files are always prefixed with a number indicating the order in which they will be executed, followed by a name, followed by the suffix .rules, e.g. 99-oracleasm-devices.rules. This is very similar in design to the Linux runlevel startup scripts.

So, as you might have gathered, udev rules are a fundamental piece of making Linux work. This really makes me wonder why such a fundamental piece of the Linux OS was changed in a fairly dramatic way in the next release of Red Hat Linux, RHEL 7. Specifically, the location of the scsi_id program has changed, and the parameters used in the command have changed. The NAME option can no longer be used for disk devices; it only works for network devices. So the equivalent of the udev rule above in RHEL 7 would be this:

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524856447a376e2d7871364a2d654b4272", SYMLINK+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

Note that it now creates a symbolic link for the device. I do not claim to be a Linux system administrator, but I would have expected to be able to search for 'RHEL 7 udev rules' and get some sort of explanation of this change and the expected rule change required; to this point I have not found anything. The designers may have intended a different parameter than SYMLINK for the udev rules, but it appears to be the only option at this point.

To make the creation of udev rules a little quicker, I wrote a very simple script to generate the rules:

[root@oel7614 ~]# cat setudev.sh

#!/bin/sh
COUNTER=0
for id in `lsscsi -i 3 | awk -F " " '{print $7}'`
do
COUNTER=$((COUNTER+1))
echo KERNEL==\"sd?1\", SUBSYSTEM==\"block\", PROGRAM==\"/usr/lib/udev/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${id}\", SYMLINK+=\"oracleasm/asm-disk$COUNTER\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"
done

 

Please note that the script above, while useful for me, makes some assumptions. One, it assumes that the disks of interest are on the third SCSI host, which is probably not the case on the reader's system. Two, it assumes that the UUID shows up in the seventh column of output. These will vary by device type and storage software. I am using openfiler and vmdk's for my storage.
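
To make those assumptions explicit, here is a self-contained variant of the generator that runs against canned lsscsi output instead of a live system. The two device lines and their IDs are copied from the listing below purely for illustration; on a real server you would replace lsscsi_mock with the actual lsscsi -i command for your host:

```shell
#!/bin/sh
# Sketch: exercise the rule-generation logic against canned `lsscsi -i`
# output, making the "UUID is in column 7" assumption explicit and testable.
lsscsi_mock() {
cat <<'EOF'
[3:0:0:0] disk OPNFILER VIRTUAL-DISK 0 /dev/sdb 14f504e46494c45524856447a376e2d7871364a2d654b4272
[3:0:0:1] disk OPNFILER VIRTUAL-DISK 0 /dev/sdc 14f504e46494c45524e78676a70372d556d43312d4f706379
EOF
}
COUNTER=0
for id in `lsscsi_mock | awk '{print $7}'`
do
COUNTER=$((COUNTER+1))
echo "KERNEL==\"sd?1\", SUBSYSTEM==\"block\", PROGRAM==\"/usr/lib/udev/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${id}\", SYMLINK+=\"oracleasm/asm-disk$COUNTER\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\""
done
```

If the vendor string contains spaces (as the VMware line in the listing below does), the ID lands in a different column, which is exactly why the column number is an assumption worth checking first.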

In order to see all SCSI devices on your server, run the lsscsi command. The output should be similar to what is below; note this is merely an example from my VMware workstation:

[root@oel7614 ~]# lsscsi -i

[1:0:0:0] cd/dvd NECVMWar VMware IDE CDR10 1.00 /dev/sr0 -

[2:0:0:0] disk VMware, VMware Virtual S 1.0 /dev/sda 36000c29105108d894764376c6756ebf7

[3:0:0:0] disk OPNFILER VIRTUAL-DISK 0 /dev/sdb 14f504e46494c45524856447a376e2d7871364a2d654b4272

[3:0:0:1] disk OPNFILER VIRTUAL-DISK 0 /dev/sdc 14f504e46494c45524e78676a70372d556d43312d4f706379

[3:0:0:2] disk OPNFILER VIRTUAL-DISK 0 /dev/sdd 14f504e46494c455246305034744e2d786133422d79507066

[3:0:0:3] disk OPNFILER VIRTUAL-DISK 0 /dev/sde 14f504e46494c45524b46664d4c332d5863497a2d58764357

[3:0:0:4] disk OPNFILER VIRTUAL-DISK 0 /dev/sdf 14f504e46494c4552644b424254762d566468522d48355254

[3:0:0:5] disk OPNFILER VIRTUAL-DISK 0 /dev/sdg 14f504e46494c4552657172324e572d4e4676362d576c7944

[3:0:0:6] disk OPNFILER VIRTUAL-DISK 0 /dev/sdh 14f504e46494c45524e4b744332762d795862672d30377676

[3:0:0:7] disk OPNFILER VIRTUAL-DISK 0 /dev/sdi 14f504e46494c455231594d6c4d462d3456776f2d4d523552

[3:0:0:8] disk OPNFILER VIRTUAL-DISK 0 /dev/sdj 14f504e46494c45523452636e62632d4e4177692d51644665

[3:0:0:9] disk OPNFILER VIRTUAL-DISK 0 /dev/sdk 14f504e46494c455269314c68397a2d50646e352d316e4f33

[3:0:0:10] disk OPNFILER VIRTUAL-DISK 0 /dev/sdl 14f504e46494c455233686b36694e2d67514a632d58344e47

[3:0:0:11] disk OPNFILER VIRTUAL-DISK 0 /dev/sdm 14f504e46494c4552686b706e304b2d384e714f2d52586175

 

This is the output of the setudev script:

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524856447a376e2d7871364a2d654b4272", SYMLINK+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524e78676a70372d556d43312d4f706379", SYMLINK+="oracleasm/asm-disk2", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c455246305034744e2d786133422d79507066", SYMLINK+="oracleasm/asm-disk3", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524b46664d4c332d5863497a2d58764357", SYMLINK+="oracleasm/asm-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c4552644b424254762d566468522d48355254", SYMLINK+="oracleasm/asm-disk5", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c4552657172324e572d4e4676362d576c7944", SYMLINK+="oracleasm/asm-disk6", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524e4b744332762d795862672d30377676", SYMLINK+="oracleasm/asm-disk7", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c455231594d6c4d462d3456776f2d4d523552", SYMLINK+="oracleasm/asm-disk8", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45523452636e62632d4e4177692d51644665", SYMLINK+="oracleasm/asm-disk9", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c455269314c68397a2d50646e352d316e4f33", SYMLINK+="oracleasm/asm-disk10", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c455233686b36694e2d67514a632d58344e47", SYMLINK+="oracleasm/asm-disk11", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c4552686b706e304b2d384e714f2d52586175", SYMLINK+="oracleasm/asm-disk12", OWNER="oracle", GROUP="dba", MODE="0660"

The RHEL/OEL 6.5 version of the script and output are:

[root@oel7614 ~]# cat setudev65.sh

#!/bin/sh

COUNTER=0

for id in `lsscsi -i 3 | awk -F " " '{print $7}'`

do

COUNTER=$((COUNTER+1))

echo KERNEL==\"sd?1\", SUBSYSTEM==\"block\", PROGRAM==\"/sbin/scsi_id -g -u -d /dev/\$parent\", RESULT==\"${id}\", NAME+=\"oracleasm/asm-disk$COUNTER\", OWNER=\"oracle\", GROUP=\"dba\", MODE=\"0660\"

done

 

[root@oel7614 ~]# ./setudev65.sh

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524856447a376e2d7871364a2d654b4272", NAME+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524e78676a70372d556d43312d4f706379", NAME+="oracleasm/asm-disk2", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c455246305034744e2d786133422d79507066", NAME+="oracleasm/asm-disk3", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524b46664d4c332d5863497a2d58764357", NAME+="oracleasm/asm-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c4552644b424254762d566468522d48355254", NAME+="oracleasm/asm-disk5", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c4552657172324e572d4e4676362d576c7944", NAME+="oracleasm/asm-disk6", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45524e4b744332762d795862672d30377676", NAME+="oracleasm/asm-disk7", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c455231594d6c4d462d3456776f2d4d523552", NAME+="oracleasm/asm-disk8", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c45523452636e62632d4e4177692d51644665", NAME+="oracleasm/asm-disk9", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c455269314c68397a2d50646e352d316e4f33", NAME+="oracleasm/asm-disk10", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c455233686b36694e2d67514a632d58344e47", NAME+="oracleasm/asm-disk11", OWNER="oracle", GROUP="dba", MODE="0660"

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", RESULT=="14f504e46494c4552686b706e304b2d384e714f2d52586175", NAME+="oracleasm/asm-disk12", OWNER="oracle", GROUP="dba", MODE="0660"

 

Please note: the above information is not comprehensive. The intent here is to give a basic tutorial on the usage of udev for storage on Oracle servers; it will not make you an expert on the subject! If, for example, multipathed devices are used, the udev rules are different. There are many other cases where the rules will differ; this is just a basic introduction.
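
For reference only, a multipath rule typically keys on the device-mapper name rather than on scsi_id. The rule below is a hypothetical sketch: the "asmdisk1p1" alias is invented and would have to match an alias defined in your /etc/multipath.conf, and the exact rule depends on your multipath configuration:

```
# Hypothetical RHEL 7-style rule for a dm-multipath partition.
# "asmdisk1p1" is an invented multipath alias; adjust to your environment.
KERNEL=="dm-*", ENV{DM_NAME}=="asmdisk1p1", SYMLINK+="oracleasm/asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"
```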

In my next blog entry, the subject will be ASMLIB on Red Hat using kmod-oracleasm compared to ASMLIB on OEL, and I also plan to include some information on the changes to ASM coming in Oracle 12c (ASMFD). That may result in more than one entry. We will see how it goes.

Enabling Dataguard Broker Observer

June 3, 2014

I recently had occasion to research the use of the Dataguard Broker observer (Observer). The concept behind the Observer is to have a process running on a separate server that observes the Dataguard Broker on the primary and the standby. When there is a problem with the primary, the broker will initiate the failover. Many of the conditions for failover by the Observer are configurable by the user. In this blog entry, I describe how to enable the Observer and what happens when the failover process is activated.

The Dataguard Broker Observer (Observer) is used when fast-start failover is required. Fast-start failover is simply an automatic failover to the standby database. As noted above, this is not always desired, since even the most urgent failover/switchover scenarios usually require configuration of the application once the database has been activated at the DR site. The observer will initiate the failover automatically under the following conditions:
By default, the observer will initiate failover to the target standby if and only if ALL of the following are true:
• the observer is running
• the observer and the standby both lose contact with the primary

o Note: if the observer loses contact with the primary but the standby does not, the observer can determine via the standby that the primary is still up.
• the observer is still in contact with the standby
• durability constraints are met
• the failover threshold timeout has elapsed
In Oracle 11g, the user can configure failover conditions:
User configurable failover conditions (11g and later)
Oracle Database 11g Rel 1 introduced user configurable failover conditions that can trigger the observer to initiate failover immediately.

Health conditions
Broker can be configured to initiate failover on any of the following conditions (the defaults match the show fast_start failover output later in this post):
• Datafile Offline (due to IO errors) (enabled by default)
• Corrupted Controlfile (enabled by default)
• Corrupted Dictionary (enabled by default)
• Inaccessible Logfile (due to IO errors)
• Stuck Archiver

Oracle errors (ORA-NNNNN)
You can also specify a list of ORA- errors that will initiate FSFO failover. The list is empty by default.

Application initiated
Applications can initiate FSFO failover directly using the DBMS_DG.INITIATE_FS_FAILOVER procedure with an optional message text that will be displayed in the observer log and the primary’s alert log.

Below are the steps to enable the Observer and fast-start failover.
1. Enable flashback database on both the standby and the primary.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 2137886720 bytes
Fixed Size 2254952 bytes
Variable Size 889194392 bytes
Database Buffers 1241513984 bytes
Redo Buffers 4923392 bytes
Database mounted.
SQL> alter database flashback on;

Database altered.

SQL> alter database open;

Database altered.

SQL> exit

2. Ensure that standby redo logs are created on both the primary and the standby. There must be one more standby redo log group than regular redo log group. For RAC databases, this applies per thread; i.e., if there are three redo log groups for thread one, there must be four standby redo log groups for thread one. The command to create a standby log is 'alter database add standby logfile thread x group y ('+DBDATA','+FLASHBACKDATA') size <bytes>;'. Note that the standby logs must be exactly the same size as the normal redo logs. One trick here is to run a switchover and switch back; on the switchover, the Dataguard broker will write a message to the alert log if standby redo logs are needed, listing the exact commands required to create the appropriate number and size of standby redo logs.
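
As a quick arithmetic sanity check of that rule (the thread and group counts below are illustrative, not taken from any database in this post):

```shell
#!/bin/sh
# Standby redo log sizing rule from the text: per thread, standby redo log
# groups = online redo log groups + 1. The counts are made up for illustration.
THREADS=2
ONLINE_GROUPS_PER_THREAD=3
STANDBY_GROUPS=$(( THREADS * (ONLINE_GROUPS_PER_THREAD + 1) ))
echo "A $THREADS-thread RAC database with $ONLINE_GROUPS_PER_THREAD online groups per thread needs $STANDBY_GROUPS standby groups"
```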

3. Set the logxptmode to SYNC on both primary and standby. Note that this cannot be done without standby redo logs configured.
DGMGRL> edit database 'primsdr' set property 'LogXptMode'='SYNC';
Property "LogXptMode" updated
DGMGRL> edit database 'prims' set property 'LogXptMode'='SYNC';
Property "LogXptMode" updated
DGMGRL>

4. If there are two or more standby instances configured, set the property FastStartFailoverTarget to the appropriate standby database. This is allowed if there is a single standby also, so it is a good practice.
DGMGRL> edit database 'prims' set property FastStartFailoverTarget='primsdr';
Property "faststartfailovertarget" updated
DGMGRL>

5. Set the availability mode to maximum availability:
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MAXAVAILABILITY;
Succeeded.
DGMGRL>

6. Start the observer in a detached session. This can be done through cron or another method. Typically the observer runs on a separate server and is started at boot:
DGMGRL> CONNECT sys@prims;
Password: password
Connected.
DGMGRL> START OBSERVER;
Observer started

7. Enable FAST_START failover and verify the status:
DGMGRL> enable FAST_START failover;
Enabled.
DGMGRL> show FAST_START failover;

Fast-Start Failover: ENABLED

Threshold: 30 seconds
Target: primsdr
Observer: gridserver
Lag Limit: 30 seconds (not in use)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE

Configurable Failover Conditions
Health Conditions:
Corrupted Controlfile YES
Corrupted Dictionary YES
Inaccessible Logfile NO
Stuck Archiver NO
Datafile Offline YES

Oracle Error Conditions:
(none)

DGMGRL>

8. Test the failover scenarios. When the primary aborts due to a failover condition, the observer will attempt to reinstate the old primary as the standby once the standby is mounted and the observer senses the mount. At that point, the observer uses flashback database to flash the database back to the failure point and instantiate it as the new standby database, in preparation for a probable switchover operation.
a. For testing, fail over to the standby. This forces the database open with resetlogs; the old primary is disabled until it is reinstated:
[oracle@gridserver admin]$ dgmgrl sys/password
DGMGRL> failover to primsdr
Performing failover NOW, please wait...
Failover succeeded, new primary is "primsdr"
DGMGRL> exit
[oracle@gridserver admin]$ . oraenv
ORACLE_SID = [primsdr] ?
-bash: [: too many arguments
The Oracle base has been set to
XPointer evaluation failed: no locset
[oracle@gridserver admin]$ dgmgrl sys/password
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show configuration

Configuration - prims

Protection Mode: MaxAvailability
Databases:
primsdr – Primary database
Warning: ORA-16817: unsynchronized fast-start failover configuration

prims – (*) Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated

Fast-Start Failover: ENABLED

Configuration Status:
WARNING

DGMGRL> exit

b. Start up and mount the standby; this will initiate the reinstatement:
[oracle@gridserver ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Thu Apr 24 16:29:28 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 2137886720 bytes
Fixed Size 2254952 bytes
Variable Size 889194392 bytes
Database Buffers 1241513984 bytes
Redo Buffers 4923392 bytes
Database mounted.
SQL> show parameter background

c. Monitor the observer log to verify the reinstatement:
[oracle@gridserver ~]$ dgmgrl sys/password
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> start observer;
Observer started

16:30:15.97 Thursday, April 24, 2014
Initiating reinstatement for database "prims"...
Reinstating database "prims", please wait...
Operation requires shutdown of instance "prims" on database "prims"
Shutting down instance "prims"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "prims" on database "prims"
Starting instance "prims"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "prims" ...
Reinstatement of database "prims" succeeded
16:31:44.75 Thursday, April 24, 2014

d. Verify the reinstatement and switch back:
[oracle@gridserver ~]$ dgmgrl sys/password
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show configuration verbose

Configuration - prims

Protection Mode: MaxAvailability
Databases:
primsdr – Primary database
prims – (*) Physical standby database

(*) Fast-Start Failover target

Properties:
FastStartFailoverThreshold = '30'
OperationTimeout = '30'
FastStartFailoverLagLimit = '30'
CommunicationTimeout = '180'
ObserverReconnect = '0'
FastStartFailoverAutoReinstate = 'TRUE'
FastStartFailoverPmyShutdown = 'TRUE'
BystandersFollowRoleChange = 'ALL'
ObserverOverride = 'FALSE'
ExternalDestination1 = ''
ExternalDestination2 = ''
PrimaryLostWriteAction = 'CONTINUE'

Fast-Start Failover: ENABLED

Threshold: 30 seconds
Target: prims
Observer: gridserver
Lag Limit: 30 seconds (not in use)
Shutdown Primary: TRUE
Auto-reinstate: TRUE
Observer Reconnect: (none)
Observer Override: FALSE

Configuration Status:
SUCCESS

DGMGRL> switchover to prims
Performing switchover NOW, please wait...
Operation requires a connection to instance "prims" on database "prims"
Connecting to instance "prims"...
Connected.
New primary database "prims" is opening...
Operation requires startup of instance "primsdr" on database "primsdr"
Starting instance "primsdr"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "prims"
DGMGRL>

 

Trace File Cleanup - A Common Problem for DBAs

March 24, 2014

Update for May 6: I have modified the loop declaration because some Unix servers have a different format for the output of ps -ef. Use this declaration to get only the required output:

for ORACLE_SID in `ps -e -o "cmd" | grep smon|grep -v grep| awk -F "_"  '{print$3}'`
do

 

I have also modified the text below.

==========================================

Recently I have had to do work at a couple of client sites setting up scripts to clean up the many trace and log files that the Oracle database leaves all over the file system. Somewhere in the middle of the process, I realized that the whole thing would be much easier if I read the Oracle database itself to determine the location of the audit, trace, and log files.

So after scanning my memory, I realized that there are probably only two or three parameters that I need to look at to clean up the vast majority of trace files. These include audit_file_dest, background_dump_dest, and in 11gR2 only, diagnostic_dest. In addition, you will need the value for ORACLE_HOME.

Also, since you may have multiple Oracle homes on a server, you want to make sure that you clean up files for all running instances. So the first thing I did was set up a loop to identify all the running instances on the server. In the bash shell, a loop to do this can be set up like this:

WORKDIR="$1"
export WORKDIR
ORAENV_ASK='NO'
for ORACLE_SID in `ps -e -o "cmd" | grep smon|grep -v grep| awk -F "_"  '{print$3}'`
do
. oraenv
sqlplus / as sysdba <<EOF
select instance_name from v$instance;
exit;
EOF
<Rest of commands here>
done

As you can see, we now have a loop set up that will roll us through each running Oracle instance on the server. If you want to exclude ASM, add a grep -v ASM.
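
To make the instance detection and the ASM exclusion concrete, the same pipeline can be run against canned ps output; the process names below are invented for illustration:

```shell
#!/bin/sh
# Canned `ps -e -o cmd` output: two database SMON processes plus an ASM SMON.
ps_mock() {
cat <<'EOF'
ora_smon_orcl1
asm_smon_+ASM1
ora_smon_testdb
EOF
}
# Same extraction as the loop above, with the ASM instance filtered out.
SIDS=$(ps_mock | grep smon | grep -v grep | grep -v ASM | awk -F "_" '{print $3}')
echo "$SIDS"
```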

From there, we need to extract the location of the trace files and audit files. To do this, you can write some very short shell scripts, or you can embed the code in your main script. The short shell scripts are below:

#get_audit.ksh
#!/bin/sh
sqlplus -s "/ as sysdba" <<EOF
set feedback off heading off verify off
select value from v\$parameter where name='audit_file_dest';
exit
EOF

#get_diag_base.sh
#!/bin/sh
$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set feedback off heading off verify off
select value from v\$parameter where name='diagnostic_dest';
exit
EOF

#get_dump.sh
#!/bin/sh
sqlplus -s "/ as sysdba" <<EOF
set feedback off heading off verify off
select value from v\$parameter where name='background_dump_dest';
exit
EOF

The output of these shell scripts can be captured by the calling script in this format:

AUDIT_DEST=`$WORKDIR/get_audit.sh`

In Oracle 9i, the audit file destination usually has a question mark in it (indicating the Oracle home). This can be replaced using sed, like this:

AUDIT_DEST=`echo "$AUDIT_DEST" | sed -e 's|?|'"$ORACLE_HOME"'|g'`
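
Here is a minimal, self-contained illustration of that substitution; the paths are hypothetical:

```shell
#!/bin/sh
# Replace the 9i-style '?' placeholder with the value of ORACLE_HOME.
ORACLE_HOME=/u01/app/oracle/product/9.2.0
AUDIT_DEST='?/rdbms/audit'
AUDIT_DEST=`echo "$AUDIT_DEST" | sed -e 's|?|'"$ORACLE_HOME"'|g'`
echo "$AUDIT_DEST"
```

The pipe character is used as the sed delimiter so that the slashes in $ORACLE_HOME do not need to be escaped.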

Next, we need to figure out whether we have an Oracle 11gR2-type structure (which uses diagnostic_dest) or a pre-11g structure (which uses background_dump_dest). To determine this, we check whether diagnostic_dest returned a value: after calling get_diag_base, we test whether the returned path exists as a directory, then use sed once again to clean up the values so that we can use them in a script:

if [[ -d "$DIAG" ]]; then
DIAG_BASE=$DIAG/rdbms/$lsid/$ORACLE_SID
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/trace||g'`
version=11
else
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/bdump||g'`
version=10
fi

Next, we need to look at the background_dump_dest; this value is easier to use to get the location of the database trace and log files:

DUMPNAME=`$WORKDIR/get_dump.sh`

So now our script looks like this:

WORKDIR="$1"
export WORKDIR
ORAENV_ASK='NO'
for ORACLE_SID in `ps -ef | grep smon|grep -v grep|awk '{print $8}'| awk -F "_" '{print$3}'`
do
. oraenv
sqlplus / as sysdba <<EOF
select instance_name from v\$instance;
exit;
EOF
AUDIT_DEST=`$WORKDIR/get_audit.sh`
DIAG=`$WORKDIR/get_diag_base.sh`
DUMPNAME=`$WORKDIR/get_dump.sh`
if [[ -d "$DIAG" ]]; then
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/trace||g'`
version=11
else
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/bdump||g'`
version=10
fi
done
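To see what the ps/awk pipeline at the top of the loop actually extracts, here is the same parsing applied to a canned ps line (the process details are made up): field 8 is the command name, and the third underscore-delimited piece of it is the SID.

```shell
# Simulated 'ps -ef' line for an smon background process.
# awk '{print $8}' pulls the command column (ora_smon_ORCL);
# splitting that on '_' and taking field 3 yields the ORACLE_SID.
echo "oracle 3825 1 0 Nov24 ? 00:00:13 ora_smon_ORCL" | \
  awk '{print $8}' | awk -F "_" '{print $3}'
# ORCL
```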

Next, make the diag_base the working directory.

From there, we use the find command to find and delete the files.

Note that at this point you are deleting files. I highly recommend you run the segment below initially without the 'rm -rf' section to ensure that you do not delete files you want to keep:

find $DIAG_BASE -name "cdmp*" -mtime +$daysback | xargs -i sh -c "echo deleting {}; rm -rf {}"
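Here is a minimal, self-contained dry run of that pattern in a scratch directory (all names are made up; touch -d to back-date a file is GNU coreutils): list the matching files first, and only attach the delete action once the list looks right.

```shell
# Scratch directory standing in for $DIAG_BASE (hypothetical).
DIAG_BASE=$(mktemp -d)
daysback=7
touch "$DIAG_BASE/cdmp_recent"                  # recent file, kept
touch -d '30 days ago' "$DIAG_BASE/cdmp_old"    # old file, deleted

# Dry run: just list the deletion candidates.
find $DIAG_BASE -name "cdmp*" -mtime +$daysback

# Once verified, the same find feeds xargs to do the deletion.
find $DIAG_BASE -name "cdmp*" -mtime +$daysback | xargs -i sh -c "echo deleting {}; rm -rf {}"
```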

Now, if you followed this carefully, you have everything you need to write a script that will clean up Oracle log and trace files. Below is a segment of the script that will remove the .trc files:


#!/bin/sh
#set -x
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/oracle/bin
export PATH
days=7
if [[ -n "$1" ]]; then
if [ $1 -ne 0 -o $1 -eq 0 ] 2>/dev/null
then
if [[ $1 -lt 0 ]]; then
echo invalid input
exit 1
else
days=$1
fi
fi
fi
echo days=$days
SERVER=`hostname -s`
FDATE=`date +%d_%m_%y`
for ORACLE_SID in `ps -ef | grep smon|grep -v grep|awk '{print $8}'| awk -F "_" '{print$3}'`
do
# uncomment 2 lines below if RAC environment and individual sids are not in oratab
# SID=`echo $ORACLE_SID | sed -e 's/1//g'`
# ORACLE_HOME=`cat /etc/oratab|grep ^$SID:| head -n 1 | cut -f2 -d':'`
ORAENV_ASK=NO
export ORAENV_ASK
export ORACLE_SID
echo $ORACLE_SID
. /usr/local/bin/oraenv
echo SID=$ORACLE_SID
AUDIT_DEST=`$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<!
set feedback off heading off verify off
select value from v\\$parameter where name='audit_file_dest';
!`
DIAGX=`$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<!
set feedback off heading off verify off
select value from v\\$parameter where name='diagnostic_dest';
!`
DIAG=`echo $DIAGX | sed -e 's| *||g'`
DUMPNAME=`$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<!
set feedback off heading off verify off
select value from v\\$parameter where name='background_dump_dest';
!`
export DIAG_BASE
finaud=`echo $AUDIT_DEST | sed -e 's|\?|'"$ORACLE_HOME"'|'`
# used to clean up the Oracle 9 '?' in the path.
if [[ -d "$DIAG" ]]; then
# echo base=$DIAG
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/trace||g'`
version=11
else
DIAG_BASE=`echo $DUMPNAME | sed -e 's|\/bdump||g'`
version=10
fi
echo base=$DIAG_BASE
echo audit=$finaud
/usr/bin/find $finaud -name \*.aud -mtime +$days
/usr/bin/find $finaud -name \*.aud -mtime +$days | xargs -i ksh -c "echo deleting {}; rm {}"
cd $DIAG_BASE
cd ..
export DIAG_BASE
if [[ -d "$DIAG_BASE" ]]; then
# echo 'diag_base=$DIAG_BASE'
cd $DIAG_BASE
/usr/bin/find $DIAG_BASE -name \*.trc -mtime +$days
/usr/bin/find $DIAG_BASE -name \*.trc -mtime +$days | xargs -i ksh -c "echo deleting {}; rm {}"
fi
done

Some people will gzip the trace and log files prior to deleting or clearing them.
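A sketch of that gzip-first variant, again in a scratch directory with made-up file names (gzip keeps the original timestamp, so compressed files continue to age normally and can be removed by a later pass):

```shell
# Compress, rather than delete, .trc files older than $days days.
days=7
DIAG_BASE=$(mktemp -d)          # stand-in for the real diag base
touch -d '30 days ago' "$DIAG_BASE/orcl_ora_1234.trc"

find $DIAG_BASE -name \*.trc -mtime +$days | xargs -i sh -c "echo compressing {}; gzip {}"

ls $DIAG_BASE                   # the .trc file is now a .trc.gz
```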

Setting up Data Guard through OEM

November 7, 2013

One of the common tasks for an Oracle DBA is to set up an existing Oracle instance in Data Guard. Many DBAs have struggled through this task using the many different sets of step-by-step instructions available on the internet. However, perhaps the simplest method of setting up Data Guard is through OEM Grid Control or OEM Cloud Control.

This blog entry contains the step-by-step instructions for setting it up through OEM Cloud Control.

  1. Create an Oracle Home on the standby server. If the primary database uses ASM, create an ASM instance on the standby server. The standby must be the same Oracle version as the primary.
  2. Make sure that both the primary instance and the standby Oracle Home and ASM instance (if required) are registered in OEM Grid Control/OEM Cloud Control.
  3. In OEM Cloud Control, go to the Source database page.

  4. Under the Availability menu, choose 'Add Standby Database', then 'Create New Physical Standby Database', then Continue.

  5. Choose online backup, use Recovery Manager (the defaults), then Continue.

  6. Choose the default under backup options, and enter the primary host credentials; typically these will already be set up.

  7. Enter the standby location information.

  8. Enter the standby database file location information. If you use a file system, you can customize the file locations. OEM Cloud Control requires you to use ASM when the source is ASM.

  9. Choose the appropriate options for Oracle restart configuration. It's fairly rare that you want automatic Oracle restart; typically you do not.
  10. I prefer to use Data Guard broker. OEM Cloud Control also expects to use Data Guard broker, so it is probably best to use it.

  11. Review the information provided, and if all looks right, choose Finish.

  12. At that point, OEM will submit a job to complete the setup of the standby database.

You can monitor the job through Grid Control. Once the job is complete, you will see a Data Guard administration page under the 'Availability' tab, where you can check on the status of your standby.

Remember to go back and set up the standby target in OEM Cloud Control/OEM Grid Control once complete.

Large Pages Implementation in Windows

July 21, 2013

It was pointed out to me that I have not yet covered the steps for configuring large pages on Windows. Large pages are rarely configured on Windows, but the feature is available, and Oracle does recommend configuring it. The large page size on Windows is 16M. Nick Walter notes in the comments below that in at least some Windows versions, the registry key ORA_LPSIZE can be used to set the size of the large page granule.

There is a large caveat for large pages on Windows. When starting the Oracle instance using large pages, all of the large page memory must be contiguous. This means, for example, that if the database has an sga_target of 20g, there must not just be 20g of RAM free; there must be 20g of contiguous RAM free on the server. This may mean that you have to reboot the server every time you want to restart the Oracle database.

So, the steps for large page implementation are as follows:

  1. Give the Oracle user the Lock Pages in Memory privilege. It does not have this privilege by default.
    1. Go to Start->Settings->Control Panel, and double-click Administrative Tools.
    2. Double click Local Security Policy
    3. In the left pane, expand Local Policies and choose User Rights Assignment.
    4. In the right pane of the local security policy settings, select Lock Pages In Memory, choose Action->Security
    5. Click add
    6. Select the Oracle user from the name list
    7. Click add
    8. Click ok
    9. Click ok.
  2. Enable large page support for the Oracle instances.
    1. Locate the file oracle.key in ORACLE_BASE\ORACLE_HOME\bin on the file system.
    2. Open oracle.key in a text editor and record the value found. This value is set by the installer, and is normally SOFTWARE\ORACLE\KEY_{oracle_home_name}.
    3. Start the registry editor (regedit).
    4. Browse to HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\{key_home_name from step b}
    5. Create either of the two entries below; the first enables large pages for all instances, the second enables large pages for just the referenced SID:
      1. ORA_LPENABLE=1
      2. ORA_SID_LPENABLE=1 (replace SID with the instance name). Nick Walter notes in his comments that both of these are string (REG_SZ) keys; make sure they are created that way.
    6. Exit the registry Editor.

At this point, large pages are enabled for Windows. Remember the caveat: it is possible, even likely, that your server will need to be rebooted each time you want to start up the Oracle instances while large pages are enabled.
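For reference, the registry portion of step 2 (sub-steps 3 through 5) can also be done from an elevated command prompt with reg add. The home key name below (KEY_OraDb11g_home1) is only an example; substitute the value recorded from oracle.key:

```
reg add "HKLM\SOFTWARE\ORACLE\KEY_OraDb11g_home1" /v ORA_LPENABLE /t REG_SZ /d 1
```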

Update to utl_mail Trigger

June 28, 2013

A year or so ago I posted a trigger for adding the entry to the network ACL when granting privileges on utl_mail (here: http://dbakerber.wordpress.com/2011/06/29/11gr2-network-acl-what-a-nice-feature/ ). For some reason Oracle decided that in Oracle 11, if you grant execute on utl_mail to a user, you must also grant the user access to a network Access Control List (ACL) if you want the user to be able to actually use utl_mail. I have recently updated the trigger to deal with some design problems, and have also written a trigger to remove the access when the permission on utl_mail is revoked. The updated and new triggers are below:

Grant trigger:

/* Formatted on 10/28/2014 7:43:52 AM (QP5 v5.267.14150.38599) */
CREATE OR REPLACE TRIGGER capture_grant_mail
   AFTER GRANT
   ON DATABASE
DECLARE
   smtp_server_name   VARCHAR2 (30) := 'EMPTY_NAME';
   sql_text           ora_name_list_t;
   hosted             NUMBER := 0;
   grantee_list       ora_name_list_t;
   grantee            VARCHAR2 (30);
   stmt               VARCHAR2 (2000);
   n                  NUMBER;
   num_grantees       BINARY_INTEGER;
   added              NUMBER := 0;
   created            NUMBER := 0;
   cluster_database   VARCHAR2 (5) := 'FALSE';
BEGIN
   DBMS_OUTPUT.enable (20000);

   BEGIN
      SELECT VALUE
        INTO smtp_server_name
        FROM v$parameter
       WHERE name = 'smtp_out_server';
   EXCEPTION
      WHEN NO_DATA_FOUND
      THEN
         smtp_server_name := 'EMPTY_NAME';
   END;

   SELECT VALUE
     INTO cluster_database
     FROM v$parameter
    WHERE name = 'cluster_database';

   IF    RTRIM (LTRIM (smtp_server_name)) IS NULL
      OR smtp_server_name = 'EMPTY_NAME'
   THEN
      IF cluster_database = 'TRUE'
      THEN
         smtp_server_name := 'CLUSTER_NAME';
      ELSE
         SELECT host_name INTO smtp_server_name FROM v$instance;
      END IF;
   END IF;

   n := ora_sql_txt (sql_text);

   FOR i IN 1 .. n
   LOOP
      stmt := stmt || sql_text (i);
   END LOOP;

   IF (UPPER (stmt) LIKE '%UTL_MAIL%')
   THEN
      num_grantees := ora_grantee (grantee_list);
      grantee := UPPER (grantee_list (1));

      SELECT COUNT (1)
        INTO created
        FROM sys.dba_network_acls
       WHERE acl LIKE '%mail_service.xml%';

      -- dbms_output.put_line(to_char(created));
      IF created = 0
      THEN
         -- dbms_output.put_line('Creating...');
         DBMS_NETWORK_ACL_ADMIN.create_acl (acl           => 'mail_service.xml',
                                            description   => 'UTL_MAIL',
                                            principal     => grantee,
                                            is_grant      => TRUE,
                                            privilege     => 'connect');

         -- dbms_output.put_line('Created..');

         IF smtp_server_name = 'CLUSTER_NAME'
         THEN
            FOR host_cursor IN (SELECT host_name FROM gv$instance)
            LOOP
               DBMS_NETWORK_ACL_ADMIN.assign_acl (
                  acl    => 'mail_service.xml',
                  HOST   => host_cursor.host_name);
            END LOOP;
         ELSE
            DBMS_NETWORK_ACL_ADMIN.assign_acl (acl    => 'mail_service.xml',
                                               HOST   => smtp_server_name);
         END IF;
      ELSE
         SELECT COUNT (1)
           INTO hosted
           FROM sys.dba_network_acls
          WHERE     acl LIKE '%mail_service.xml%'
                AND (   HOST = smtp_server_name
                     OR (    smtp_server_name = 'CLUSTER_NAME'
                         AND HOST IN (SELECT host_name FROM gv$instance))
                     OR HOST = '*');

         IF hosted = 0
         THEN
            IF smtp_server_name = 'CLUSTER_NAME'
            THEN
               FOR host_cursor IN (SELECT host_name FROM gv$instance)
               LOOP
                  DBMS_NETWORK_ACL_ADMIN.assign_acl (
                     acl    => 'mail_service.xml',
                     HOST   => host_cursor.host_name);
               END LOOP;
            ELSE
               DBMS_NETWORK_ACL_ADMIN.assign_acl (acl    => 'mail_service.xml',
                                                  HOST   => smtp_server_name);
            END IF;
         END IF;
      END IF;

      FOR i IN 1 .. num_grantees
      LOOP
         grantee := UPPER (grantee_list (i));

         SELECT COUNT (1)
           INTO added
           FROM dba_network_acl_privileges
          WHERE     principal = grantee
                AND UPPER (privilege) = 'RESOLVE'
                AND acl LIKE '%mail_service.xml%';

         IF added = 0
         THEN
            DBMS_NETWORK_ACL_ADMIN.add_privilege (acl         => 'mail_service.xml',
                                                  principal   => grantee,
                                                  is_grant    => TRUE,
                                                  PRIVILEGE   => 'resolve');
         END IF;

         SELECT COUNT (1)
           INTO added
           FROM dba_network_acl_privileges
          WHERE     principal = grantee
                AND UPPER (privilege) = 'CONNECT'
                AND acl LIKE '%mail_service.xml%';

         IF added = 0
         THEN
            DBMS_NETWORK_ACL_ADMIN.add_privilege (acl         => 'mail_service.xml',
                                                  principal   => grantee,
                                                  is_grant    => TRUE,
                                                  PRIVILEGE   => 'connect');
         END IF;
      END LOOP;
   END IF;
END;
/

 

Revoke Trigger:

CREATE OR REPLACE TRIGGER capture_revoke_mail
   AFTER REVOKE
   ON DATABASE
DECLARE
   smtp_server_name   VARCHAR2 (30) := 'EMPTY_NAME';
   empty_list         EXCEPTION;
   PRAGMA EXCEPTION_INIT (empty_list, -24246);
   sql_text           ora_name_list_t;
   hosted             NUMBER := 0;
   grantee_list       ora_name_list_t;
   grantee            VARCHAR2 (30);
   stmt               VARCHAR2 (2000);
   n                  NUMBER;
   num_grantees       BINARY_INTEGER;
   added              NUMBER := 0;
   created            NUMBER := 0;
   cluster_database   VARCHAR2 (5) := 'FALSE';
BEGIN
   -- dbms_output.enable(20000);
   n := ora_sql_txt (sql_text);

   FOR i IN 1 .. n
   LOOP
      stmt := stmt || sql_text (i);
   END LOOP;

   IF (UPPER (stmt) LIKE '%UTL_MAIL%')
   THEN
      DBMS_OUTPUT.put_line (TO_CHAR (n));
      num_grantees := ora_revokee (grantee_list);
      DBMS_OUTPUT.put_line (TO_CHAR (num_grantees));
      grantee := UPPER (grantee_list (1));

      FOR i IN 1 .. num_grantees
      LOOP
         BEGIN
            -- dbms_output.put_line('In loop:');
            grantee := UPPER (grantee_list (i));

            SELECT COUNT (1)
              INTO added
              FROM dba_network_acl_privileges
             WHERE     principal = grantee
                   AND UPPER (privilege) = 'RESOLVE'
                   AND acl LIKE '%mail_service.xml%';

            IF added = 0
            THEN
               DBMS_OUTPUT.put_line (
                     'User does not have resolve privilege, not revoked='
                  || grantee);
            ELSE
               DBMS_NETWORK_ACL_ADMIN.delete_privilege (
                  acl         => 'mail_service.xml',
                  principal   => grantee,
                  is_grant    => TRUE,
                  privilege   => 'resolve');
            END IF;

            SELECT COUNT (1)
              INTO added
              FROM dba_network_acl_privileges
             WHERE     principal = grantee
                   AND UPPER (privilege) = 'CONNECT'
                   AND acl LIKE '%mail_service.xml%';

            IF added = 0
            THEN
               DBMS_OUTPUT.put_line (
                     'User does not have connect privilege, not revoked='
                  || grantee);
            ELSE
               DBMS_NETWORK_ACL_ADMIN.delete_privilege (
                  acl         => 'mail_service.xml',
                  principal   => grantee,
                  is_grant    => TRUE,
                  privilege   => 'connect');
            END IF;
         EXCEPTION
            WHEN empty_list
            THEN
               DBMS_NETWORK_ACL_ADMIN.drop_acl (acl => 'mail_service.xml');
               CONTINUE;
            WHEN OTHERS
            THEN
               DBMS_OUTPUT.put_line (SQLERRM);
               RAISE;
         END;
      END LOOP;
   END IF;
END;
/

