Oracle Cluster with DRBD, Pacemaker, and Corosync

In this post, we are going to build an Oracle active-passive cluster using Pacemaker, Corosync, and DRBD.  Please note that any Oracle licensing comments made in this post are purely my personal opinion; they are not binding on my employer or Oracle, nor do they have any legal standing.

This was originally intended simply as a thought exercise: could I put together a fairly resilient, fairly highly available Oracle configuration over shared storage without using Oracle RAC?  The answer is that the shared storage piece is impossible without Oracle RAC; however, using DRBD, we can get something close.

DRBD stands for Distributed Replicated Block Device.  It works, as the name implies, by replicating blocks.  One DRBD device is designated as the primary device, additional devices are designated as secondary devices, and blocks are replicated from the primary device to the secondary devices.

Pacemaker and Corosync are Linux clustering software that handle communication between the cluster nodes, maintain synchronization of cluster resources, and monitor those resources for availability.  When a resource becomes unavailable, they also manage the failover.

So, let's begin.

The servers:  VMware Workstation VMs running OEL 7.6 with 8 GB RAM.  In addition to a 20 GB root device, each server has a 20 GB vmdk for the DRBD device holding the Oracle database binaries, and another 20 GB vmdk for the Oracle database data files.

The DRBD devices are configured as logical volumes (LVM) in order to make adding space easier. 

The server names are linclust1 and linclust2.  Below is the hosts file; note that the NICs for cluster management are named linclust1-hb and linclust2-hb, and the storage replication NICs are named linclust1-priv and linclust2-priv.  It is definitely recommended that separate NICs be used for the storage and inter-node communications.

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

10.12.1.58 linclust1 linclust1.localdomain
10.12.1.59 linclust2 linclust2.localdomain

192.168.31.101 linclust1-priv linclust1-priv.localdomain
192.168.31.102 linclust2-priv linclust2-priv.localdomain

192.168.57.101 linclust1-hb linclust1-hb.localdomain
192.168.57.102 linclust2-hb linclust2-hb.localdomain

10.12.1.61 linclust-vip  linclust-vip.localdomain

Before doing any other cluster configuration, we have to configure LVM and DRBD.  The disks that will back the DRBD devices show up on each node as /dev/sdb and /dev/sdc.

Create partitions using fdisk.

  • Verify the partitions available on the server: fdisk -l
  • Run fdisk /dev/sdb
  • Type ‘n’ to create a new partition.
  • Specify where you would like the partition to start and end.  You can specify the size of the partition instead of the end sector, for example:  +1000M
  • Type ‘p’ to view the partition, and type ‘w’ to save it.
  • Repeat for /dev/sdc

    Since this is RHEL/OEL 7, the partition will automatically be created aligned on a sector boundary when we run fdisk. In earlier versions, we would have needed to align it manually.
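
    Answering the fdisk prompts by hand works fine, but if you would rather script the partitioning, a rough sketch like the one below should do it.  It assumes you want a single primary partition spanning the whole disk and simply feeds fdisk the same answers, accepting the default first and last sectors:

    printf 'n\np\n1\n\n\nw\n' | fdisk /dev/sdb
    printf 'n\np\n1\n\n\nw\n' | fdisk /dev/sdc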

    Next, we create the logical volumes for each device. 

    pvcreate /dev/sdb1
    vgcreate shared1 /dev/sdb1
    lvcreate --name shared1 -l 100%FREE shared1

    pvcreate /dev/sdc1
    vgcreate shared2 /dev/sdc1
    lvcreate --name shared2 -l 100%FREE shared2
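
    Before moving on, it is worth a quick check that the physical volumes, volume groups, and logical volumes came out the way we expect:

    pvs
    vgs
    lvs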

    Next we install the required software on both cluster servers:

    yum -y install drbd pacemaker corosync pcs pcsd

    The above command will install all the required clustering software.

    Install the required Oracle packages:

    yum -y install oracle-database-server-12cR2-preinstall

    Now we are ready to configure DRBD. DRBD continuously replicates data from the primary to the secondary device.

    1. Edit /etc/drbd.d/global_common.conf; it should read as follows:

    global {
        usage-count no;
    }
    common {
        net {
            protocol C;
        }
    }

    2. Create the resource definition file for our configuration.  In our case, the file is named drbd00.res in /etc/drbd.d, and it has the following lines:

    resource drbd00 {
            device /dev/drbd0;
            disk /dev/shared1/shared1;
            meta-disk internal;
            net {
                      allow-two-primaries;
            }
            syncer {
               verify-alg sha1;
            }
            on linclust1.localdomain {
                    address 192.168.31.101:7789;
            }
            on linclust2.localdomain {
                    address 192.168.31.102:7789;
            }
          }
    resource drbd01 {
            device /dev/drbd1;
            disk /dev/shared2/shared2;
            meta-disk internal;
            net {
                      allow-two-primaries;
            }
            syncer {
               verify-alg sha1;
            }
            on linclust1.localdomain {
                    address 192.168.31.101:7790;
            }
            on linclust2.localdomain {
                    address 192.168.31.102:7790;
            }
          }

    3. Copy drbd00.res and global_common.conf to /etc/drbd.d on the second cluster node. 

    4. At this point we are ready to start DRBD.  Note that we are using the option ‘allow-two-primaries’.  This is because PCS will manage the mounting of the file systems for the software and data.

    Run these commands on both nodes to initialize DRBD:

    drbdadm create-md drbd00

    drbdadm create-md drbd01

    The above commands create the DRBD metadata on each device.

    5. Start and enable the DRBD service on both nodes:

    systemctl start drbd.service

    systemctl enable drbd.service

    6. Run the commands below to make this node the primary for both resources.  Run these only on node 1:

    drbdadm primary drbd00 --force

    drbdadm primary drbd01 --force

    7. The primary command designates the current node as the primary node so that we can make changes to the drbd device attached to this node.  At this point DRBD is running.  We can see the DRBD devices at /dev/drbd0 and /dev/drbd1.

    8. Next, we need to create a file system on the DRBD devices.  Since we are going to run Oracle active-passive, we will create an XFS file system on each device.  The commands below are run on the primary node:

    mkfs -t xfs /dev/drbd0

    mkfs -t xfs /dev/drbd1

    The mkfs command will not work on the node where the DRBD devices are secondary.

    9. At this point the replication should be working.  Run the command below to check:

    [root@linclust1 ~]# cat /proc/drbd
    version: 8.4.5 (api:1/proto:86-101)
    srcversion: 1AEFF755B8BD61B81A0AF27
    0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        ns:406265 nr:383676 dw:790464 dr:488877 al:35 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
    1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
        ns:4318663 nr:5086791 dw:9409141 dr:470229 al:211 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

    Note that if the status does not show UpToDate, wait for a bit, check again, and verify that everything is up to date.
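
    If you would rather script the wait than keep re-running the command, a rough sketch like the one below works; it assumes both resources will eventually reach UpToDate/UpToDate and simply polls /proc/drbd until they do:

    until [ "$(grep -c 'ds:UpToDate/UpToDate' /proc/drbd)" -eq 2 ]; do
        sleep 30
    done
    echo "both DRBD resources are in sync"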

    10. DRBD is now fully configured and working.  As part of the cluster configuration we are going to mount these devices as file systems, so the next step is to create the mount points on both nodes. I am going to use /u01 and /u02.

    mkdir -p /u01

    mkdir -p /u02

    chown oracle:oinstall /u01

    chown oracle:oinstall /u02
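
    Optionally, before handing these file systems over to the cluster, you can sanity-check them by hand on the primary node; just be sure to unmount them again so that PCS can manage them:

    mount /dev/drbd0 /u01
    mount /dev/drbd1 /u02
    df -h /u01 /u02
    umount /u01 /u02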

    Next, we configure the cluster.  The cluster software we installed earlier includes pcs, pacemaker, and corosync.

    When we installed pcs, the user hacluster was created.  Modify the /etc/passwd entry for hacluster to allow the user to log in, as shown below:

    hacluster:x:189:189:cluster user:/home/hacluster:/bin/bash

    Then create the directory for hacluster, and set hacluster:hacluster as the owner:

    mkdir -p /home/hacluster

    chown hacluster:hacluster /home/hacluster

    Next, set the password for the hacluster user using the passwd command:

    passwd hacluster

    Now, start the cluster services:

    systemctl start pcsd.service

    systemctl enable pcsd.service

    In order for the cluster to manage itself, we have to authorize access for the cluster manager on each node.  The ‘pcs cluster auth’ command does that:

    pcs cluster auth linclust1 linclust2

    Next we create the cluster. Note that we are using the -hb interfaces for cluster management.  This should be an internal-only network:

    pcs cluster setup --name DRBD_CLUSTER linclust1-hb linclust2-hb

    Disable STONITH.  We do not want to fence a node that is not working; DRBD and PCS should be able to handle it properly:

    pcs property set stonith-enabled=FALSE

    pcs cluster start --all

    At this point, the basic cluster services are configured and the cluster is running. Now it's time to configure the required resources.

    First, create the virtual IP address (VIP) that users will use to connect to the database.  The users should not have to figure out which node's IP address is correct, so a simple virtual IP is created that will always run wherever Oracle is running.

    pcs resource create ClusterVIP ocf:heartbeat:IPaddr2 ip=10.12.1.61 cidr_netmask=32 op monitor interval=30s

    Next, the resources to manage DRBD.  Since we have two DRBD devices, we will need four resources.  The ‘raw’ resource (a name I chose) simply tells PCS to keep track of the DRBD device.  The ‘master’ resource tells PCS that it has to manage (master) the DRBD resource it created:

    pcs resource create binraw ocf:linbit:drbd drbd_resource="drbd00" op monitor interval=10s
    pcs resource master binmaster binraw master-max=1 master-node-max=1 clone-max=2 clone-node-max=2 notify=true
    pcs resource create dataraw ocf:linbit:drbd drbd_resource="drbd01" op monitor interval=10s
    pcs resource master datamaster dataraw master-max=1 master-node-max=1 clone-max=2 clone-node-max=2 notify=true

    Note this entry: ocf:linbit:drbd.  That is the resource agent PCS uses to manage and monitor the service.  PCS comes with a large collection of resource agents already defined, and the user can also create her own. For a complete list of available agents, run the command ‘pcs resource list’.
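
    For example, the two commands below list the Oracle-related agents and describe the parameters accepted by the oracle agent, several of which we will use further down:

    pcs resource list | grep -i oracle
    pcs resource describe ocf:heartbeat:oracle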

    Next, we create the file system resources for the Oracle binaries and data files and tell PCS to manage them:

    pcs resource create BINARIES ocf:heartbeat:Filesystem device="/dev/drbd0" directory="/u01" fstype="xfs"
    pcs resource create DATAFILES ocf:heartbeat:Filesystem device="/dev/drbd1" directory="/u02" fstype="xfs"

    Next, we configure some colocation and startup rules:

    pcs constraint colocation add binmaster with ClusterVIP INFINITY
    pcs constraint colocation add datamaster with ClusterVIP INFINITY
    pcs constraint order ClusterVIP then binmaster
    pcs constraint order binmaster then datamaster

    pcs constraint colocation add BINARIES with binmaster INFINITY
    pcs constraint colocation add DATAFILES with datamaster INFINITY
    pcs constraint order promote binmaster then start BINARIES
    pcs constraint order promote datamaster then start DATAFILES
    pcs resource defaults migration-threshold=1
    pcs resource group add oracle ClusterVIP BINARIES  DATAFILES
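
    Before moving on, it is a good idea to confirm that the constraints and the resource group look the way we defined them:

    pcs constraint show
    pcs status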

    Now, it's time to install the Oracle software and create the database.  Just use the regular Oracle installer, and DBCA to create the database. Remember that the binaries go in /u01 and the data files go in /u02.  You install on one node only.

    Once the Oracle installation is complete and the database is up and running, copy the files coraenv, dbhome, and oraenv from /usr/local/bin on node 1 to the same directory on node 2.  Make sure the ownership and permissions are preserved.

    Copy the /etc/oratab and /etc/oraInst.loc files to node 2.  These five files are the only files Oracle requires that are not in /u01 or /u02.
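
    A quick sketch of the copy, run from node 1 and assuming root ssh access to linclust2 (the -p flag preserves permissions and timestamps):

    scp -p /usr/local/bin/coraenv /usr/local/bin/dbhome /usr/local/bin/oraenv root@linclust2:/usr/local/bin/
    scp -p /etc/oratab /etc/oraInst.loc root@linclust2:/etc/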

    Edit the file $ORACLE_HOME/network/admin/listener.ora.  Change the address of the listener so it listens on the VIP for the cluster, in this case 10.12.1.61.  This allows database access to fail over with the cluster.
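
    As a rough example, and assuming the default listener name and port, the listener entry would end up looking something like this:

    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = 10.12.1.61)(PORT = 1521))
        )
      )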

    Create an Oracle user that the cluster will use to monitor the state of the database; the default name is ocfmon:

    SQLPLUS> create user ocfmon identified by mypassword;

    SQLPLUS> grant create session to ocfmon;

    Create the resources to monitor oracle:

    pcs resource create oracleDB ocf:heartbeat:oracle sid="drbddb" --group=oracle

    pcs resource update oracleDB monuser="ocfmon" monpassword="mypassword" monprofile="default"

    pcs resource create listenerdrbddb ocf:heartbeat:oralsnr sid="drbddb" listener="listener" --group=oracle

    At this point, the cluster is created and services are running.

    As I understand Oracle licensing, this configuration would require just one node to be licensed for Oracle.  The passive node cannot physically run Oracle unless Oracle is shut down on the active node, and the Oracle software is not even visible on the passive node unless the passive node has become the active node.  In effect, Oracle is only installed on the active node; when failover occurs, the active node changes, and the file systems holding Oracle are unmounted from the old active node and mounted on the new active node.

    This is what you should see when you check the status:

    [root@linclust1 ~]# pcs status
    Cluster name: DRBD_CLUSTER
    Stack: corosync
    Current DC: linclust1-hb (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
    Last updated: Thu Feb 28 13:56:36 2019
    Last change: Wed Feb 20 17:18:09 2019 by hacluster via crmd on linclust1-hb

    2 nodes configured
    9 resources configured

    Online: [ linclust1-hb linclust2-hb ]

    Full list of resources:

    Master/Slave Set: binmaster [binraw]
         Masters: [ linclust1-hb ]
         Stopped: [ linclust2-hb ]
    Master/Slave Set: datamaster [dataraw]
         Masters: [ linclust1-hb ]
         Stopped: [ linclust2-hb ]
    Resource Group: oracle
         ClusterVIP (ocf::heartbeat:IPaddr2):       Started linclust1-hb
         BINARIES   (ocf::heartbeat:Filesystem):    Started linclust1-hb
         DATAFILES  (ocf::heartbeat:Filesystem):    Started linclust1-hb
         oracleDB   (ocf::heartbeat:oracle):        Started linclust1-hb
         listenerdrbddb     (ocf::heartbeat:oralsnr):       Started linclust1-hb

    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled

    Note that this is not true HA.  It takes about 10 minutes for failover to occur and for Oracle to start on the second node.  If you need true HA, you will need to spend the money required for a commercial solution.

    Whenever you fail over, be sure to check the status of both DRBD (cat /proc/drbd) and PCS (pcs status).  Because of slowness I saw in starting DRBD, I added the following script to run after boot to make sure DRBD was up and running:

    [root@linclust2 startup]# cat postboot.sh
    #!/bin/bash
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
    export PATH
    # give the rest of the stack time to come up before touching DRBD
    sleep 3m
    # bring up both DRBD resources and record their state for later review
    drbdadm up drbd00
    drbdadm up drbd01
    cat /proc/drbd > /tmp/drbd.out
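
    One thing to make sure of: the script has to be executable, or systemd will not be able to run it:

    chmod +x /root/startup/postboot.sh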

    To ensure the script above runs each time the system starts, I created it as a systemd service.  To do this, I created a file called postboot.service in /etc/systemd/system/. The file has the following contents:

    [root@linclust2 system]# cat postboot.service

    [Unit]
    Description=Script to run things after everything else starts
    After=network.target

    [Service]
    Type=simple
    ExecStart=/root/startup/postboot.sh
    TimeoutStartSec=0

    [Install]
    WantedBy=default.target

    Note the file name after ExecStart=.  That is the file that gets executed.  To enable and start the service, run these commands:

    systemctl enable postboot.service

    systemctl start postboot.service

    You can also modify the script at this point to include any other commands you need.

    In my next blog, I will discuss how to add space to a DRBD device on LVM in this cluster configuration.
