Testing the Oracle ASM Filter Driver (AFD)
In my previous post I converted from standard Oracle ASM to the ASM Filter Driver (AFD). According to Oracle, AFD prevents non-Oracle software from writing to AFD-managed devices. Let's test that claim.
First we try a raw write as grid, the owner of the Grid Infrastructure (GI) installation:
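
Before testing, it is worth confirming that AFD is actually loaded and that filtering is on. A minimal sketch of the checks I would run (asmcmd afd_state is the documented command; the lsmod check is just belt and braces):

# As grid: report whether the AFD driver is loaded and filtering is enabled
asmcmd afd_state

# As root: confirm the kernel module itself is present
lsmod | grep oracleafd
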
[grid@vmrh65node1 ~]$ dd if=/dev/zero of=/dev/sdf1
dd: opening `/dev/sdf1': Permission denied

Ok, that's pretty straightforward. The grid user does not have permission to modify the device. Let's see what happens when we connect as root and do it:

[grid@vmrh65node1 ~]$ asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
OCR1                        ENABLED   /dev/sdc1
OCR2                        ENABLED   /dev/sdd1
OCR3                        ENABLED   /dev/sde1
DATA                        ENABLED   /dev/sdf1
RECO                        ENABLED   /dev/sdg1

[root@vmrh65node1 ~]# cp oracleasmlib-2.0.4-1.el6.x86_64.rpm /dev/oracleafd/disks/DATA
cp: overwrite `/dev/oracleafd/disks/DATA'? y

Hmm. That’s not so good. What does the OS say?

[root@vmrh65node1 ~]# ls -la /dev/oracleafd/disks
total 32
drwxrwx--- 2 grid dba    140 Nov 24 20:41 .
drwxrwx--- 3 grid dba     80 Nov 24 20:41 ..
-rw-r--r-- 1 root root 13300 Nov 25 12:52 DATA
-rw-r--r-- 1 root root    10 Nov 24 20:41 OCR1
-rw-r--r-- 1 root root    10 Nov 24 20:41 OCR2
-rw-r--r-- 1 root root    10 Nov 24 20:41 OCR3
-rw-r--r-- 1 root root    10 Nov 24 20:41 RECO

That doesn't look good: the DATA entry is now a 13 KB copy of the RPM instead of the usual 10 bytes. What does Oracle say?

[grid@vmrh65node1 asm]$ asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
OCR1                        ENABLED   /dev/sdc1
OCR2                        ENABLED   /dev/sdd1
OCR3                        ENABLED   /dev/sde1
DATA                        ENABLED
RECO                        ENABLED   /dev/sdg1
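
One thing worth noting here: the files under /dev/oracleafd/disks appear to hold little more than the path of the backing block device (which would explain the 10-byte size of the untouched entries), so a quick look at their contents shows what the copy actually clobbered. A hedged sketch, commands only:

# An untouched label file should contain just the backing device path
cat /dev/oracleafd/disks/RECO

# DATA, on the other hand, should now show the start of the copied RPM
od -c /dev/oracleafd/disks/DATA | head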

Ok. The DATA label losing its path is not promising. But maybe there is something going on here that I don't know about. Is the database still up?

[root@vmrh65node1 ~]# ps -ef | grep pmon
grid 3288 1 0 Nov24 ? 00:00:06 asm_pmon_+ASM1
oracle 3825 1 0 Nov24 ? 00:00:13 ora_pmon_o11rhawd1
root 8379 7214 0 12:52 pts/0 00:00:00 grep pmon

Well, that's interesting. It does look like the database is still up. Let's try logging in.

[root@vmrh65node1 ~]# su - oracle
[oracle@vmrh65node1 ~]$ . oraenv
ORACLE_SID = [oracle] ? o11rhawd1
The Oracle base has been set to /u01/app/oracle
[oracle@vmrh65node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Nov 25 12:53:14 2014

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size 2260088 bytes
Variable Size 939525000 bytes
Database Buffers 3321888768 bytes
Redo Buffers 12107776 bytes
Database mounted.
Database opened.
SQL> show parameter background

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /u01/app/oracle/diag/rdbms/o11
                                                 rhawd/o11rhawd1/trace
SQL> exit

That looks very good. As we know, Oracle won't open if it has problems with the SYSTEM tablespace, and on this instance the SYSTEM tablespace is in the DATA disk group.
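
As a sanity check, you could confirm where the SYSTEM datafiles actually live with a quick query as the oracle user (a minimal sketch; on this system I would expect paths starting with +DATA, but that output is an assumption, not captured here):

# List the SYSTEM tablespace datafiles; they should all be on +DATA
sqlplus -s / as sysdba <<'EOF'
set pagesize 100 linesize 200
select file_name
from   dba_data_files
where  tablespace_name = 'SYSTEM';
EOF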

Let’s reboot and see what Oracle says.

[root@vmrh65node1 ~]# reboot

Broadcast message from oracle@vmrh65node1.awddev.dstcorp.net
(/dev/pts/0) at 18:26 …

The system is going down for reboot NOW!
[root@vmrh65node1 ~]#
login as: oracle
oracle@10.193.204.10's password:
Last login: Mon Nov 24 20:45:20 2014 from 10.201.232.32
[oracle@vmrh65node1 ~]$ ls
adrci_cleanup.sh backup dgdemo02pmmkvl_1_1.bak grid initdgdemo.ora mapper_behind.sh mapper_checks.sh oradiag_oracle patch working1.sh
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3453 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 ~]$ ls
adrci_cleanup.sh backup dgdemo02pmmkvl_1_1.bak grid initdgdemo.ora mapper_behind.sh mapper_checks.sh oradiag_oracle patch working1.sh
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3484 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3486 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3488 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 ~]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3502 3410 0 18:28 pts/0 00:00:00 grep pmon
[oracle@vmrh65node1 diag]$ ps -ef | grep pmon
grid 2992 1 0 18:28 ? 00:00:00 asm_pmon_+ASM1
oracle 3504 1 0 18:29 ? 00:00:00 ora_pmon_o11rhawd1
oracle 3618 3410 0 18:29 pts/0 00:00:00 grep pmon

Ok, so the database came back up. What does the file system look like now?

[oracle@vmrh65node1 diag]$ ls -la /dev/oracleafd/disks
total 20
drwxrwx--- 2 grid dba  140 Nov 25 18:27 .
drwxrwx--- 3 grid dba   80 Nov 25 18:27 ..
-rw-r--r-- 1 root root  10 Nov 25 18:27 DATA
-rw-r--r-- 1 root root  10 Nov 25 18:27 OCR1
-rw-r--r-- 1 root root  10 Nov 25 18:27 OCR2
-rw-r--r-- 1 root root  10 Nov 25 18:27 OCR3
-rw-r--r-- 1 root root  10 Nov 25 18:27 RECO

[oracle@vmrh65node1 diag]$ su - grid
Password:
[grid@vmrh65node1 ~]$ . oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base has been set to /u01/app/oracle

[grid@vmrh65node1 ~]$ asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label                     Filtering   Path
================================================================================
OCR1                        ENABLED   /dev/sdc1
OCR2                        ENABLED   /dev/sdd1
OCR3                        ENABLED   /dev/sde1
DATA                        ENABLED   /dev/sdf1
RECO                        ENABLED   /dev/sdg1

Ok, so now everything is back the way it was. AFD is living up to its claim of protecting the data files. This is actually very nice; I have definitely seen ASM disks accidentally overwritten by inexperienced users with too much access.
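
If you want extra comfort, the ASM disk header itself can be inspected with kfed, which ships in the GI home (a hedged sketch; these are the fields I would normally check, not output captured from this system):

# As grid, dump the DATA disk header and confirm it still reports a valid
# ASM header type plus the expected disk and disk group names
kfed read /dev/sdf1 | egrep 'kfbh.type|kfdhdb.dskname|kfdhdb.grpname'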

UPDATE: I rebooted both cluster nodes again, and when I did, the entire /dev/sdf1 partition was missing, which meant my DATA diskgroup was missing as well. While it is possible that this is merely coincidence, I consider it very unlikely. My working theory is that the data in the DATA diskgroup was somehow protected until the server was bounced, at which point the protection was lost and the write to the partition took effect. In other words, writes to the diskgroup are not protected from the superuser by the ASM Filter Driver.
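
To confirm it really was the partition (and not just the AFD label) that disappeared, checks along these lines would do it (a sketch, assuming /dev/sdf is still the underlying device):

# Does the kernel still see a partition table on the device?
fdisk -l /dev/sdf

# Is the /dev/sdf1 node still there, and does AFD still list the DATA label?
ls -l /dev/sdf1
asmcmd afd_lsdsk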

UPDATE 2: I was able to get the data in the DATA diskgroup back just by partitioning the device again. Perhaps ASMFD allowed the device header to be overwritten without overwriting the data itself? I'm not sure, but without some protection in place I should not have been able to restore the data that easily.
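
For reference, the recovery amounted to recreating the partition at the same starting offset and letting AFD rediscover it, roughly like this (a hedged sketch under the assumption that the partition originally started at the default first sector; getting the offset wrong would make things worse, so treat it as illustration only):

# Recreate /dev/sdf1 covering the same extent it had before (here, one
# partition spanning the whole disk), then make the kernel re-read the table
fdisk /dev/sdf        # n, p, 1, accept the defaults, w
partprobe /dev/sdf

# Ask AFD to rescan for labeled disks and confirm the DATA label is back
asmcmd afd_scan
asmcmd afd_lsdsk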
