OCM 11g Preparation - Implement ASM failure groups

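Failure groups are a way to implement RAID-like protection in your disk groups: ASM keeps the mirror copies of each extent in different failure groups, so a NORMAL redundancy disk group survives the loss of one failure group. They are very useful when you want extra protection and the ability to hot swap disks without the extra cost of a storage system.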

Honestly, every company I've worked for skipped this protection because it was already implemented at the storage layer.

Practice here:

  • Create a disk group with NORMAL or HIGH redundancy.
  • Wipe the header of one disk in your disk group with "dd" and try to recover it.
  • Drop/Add disks and play with rebalance power.
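
To get a feel for what the "dd" step does before pointing it at a real ASM disk, here is a minimal sketch on a scratch file. The file created by mktemp is a hypothetical stand-in for a device such as /dev/sdd1, and the 8 KiB size is an arbitrary assumption; bs=512 is simply dd's default block size written out. It zeroes the first two blocks (1024 bytes) and leaves the rest untouched:

```shell
#!/bin/sh
# Rehearse the header wipe on a scratch file instead of a real ASM disk.
# SCRATCH is a hypothetical stand-in for a device such as /dev/sdd1.
SCRATCH=$(mktemp)

# Fill the scratch "disk" with 8 KiB of non-zero bytes (16 blocks of 512 bytes).
dd if=/dev/zero bs=512 count=16 2>/dev/null | tr '\0' 'A' > "$SCRATCH"

# Zero the first two 512-byte blocks. conv=notrunc keeps the rest of the file:
# a regular file would otherwise be truncated, which never happens to a block device.
dd if=/dev/zero of="$SCRATCH" bs=512 count=2 conv=notrunc 2>/dev/null

# Count the bytes that are still non-zero: 8192 - 1024 = 7168.
REMAINING=$(tr -d '\0' < "$SCRATCH" | wc -c | tr -d ' ')
echo "non-zero bytes left: $REMAINING"

rm -f "$SCRATCH"
```

On a real device the target would be the disk's partition and the wipe is destructive, so keep it to a lab disk group.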

Path to Documentation:

Automatic Storage Management Administrator's Guide -> 4 Administering Oracle ASM Disk Groups



2 comments

    • Radek on October 25, 2016 at 18:56

    Hi

    Here is an interesting case showing that a disk, even after it has been dropped, will still 'be there' even though it is useless. This is to satisfy the NORMAL redundancy requirement (2-way mirroring).

    ----- Create diskgroup

    CREATE DISKGROUP FRA NORMAL REDUNDANCY
    FAILGROUP FRA DISK
    '/dev/oracleasm/disks/FRA'
    FAILGROUP FRA2 DISK
    '/dev/oracleasm/disks/FRA2'
    ATTRIBUTE 'au_size'='4M',
    'compatible.asm' = '11.2',
    'compatible.rdbms' = '11.2',
    'compatible.advm' = '11.2';

    Diskgroup created.

    ----- Corrupt the header of one of the disks. The disk will become a candidate for addition

    [root@kamil ~]# dd if=/dev/zero of=/dev/sdd1 count=2
    2+0 records in
    2+0 records out
    1024 bytes (1.0 kB) copied, 5.0001e-05 seconds, 20.5 MB/s

    -- Check disk status. Note that one of the disks (FRA2) now shows as CANDIDATE, i.e. ready to be added to ASM

    SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,STATE,NAME,FAILGROUP,PATH,REPAIR_TIMER from v$asm_disk;

    GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU STATE    NAME              FAILGROUP PATH                      REPAIR_TIMER
    ------------ ----------- ------- ------------ -------- ----------------- --------- ------------------------- ------------
               2           2 CACHED  CANDIDATE    NORMAL   FRA_0002          FRA2      /dev/oracleasm/disks/FRA2            0
               2           0 CACHED  MEMBER       NORMAL   FRA_0000          FRA       /dev/oracleasm/disks/FRA             0
               1           0 CACHED  MEMBER       NORMAL   DATA_0000         DATA_0000 /dev/oracleasm/disks/DATA            0

    ---- Now I want to pretend I am fixing the disk. If I don't succeed, I want the disk to be dropped after 1 minute.

    SQL> ALTER DISKGROUP FRA OFFLINE DISK FRA_0002 DROP AFTER 1m;

    Diskgroup altered.

    --- Check status again. A new row has been added, reporting that my disk is missing. After 1 minute it is still there.

    SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,STATE,NAME,FAILGROUP,PATH,REPAIR_TIMER from v$asm_disk;

    GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU STATE    NAME              FAILGROUP PATH                      REPAIR_TIMER
    ------------ ----------- ------- ------------ -------- ----------------- --------- ------------------------- ------------
               0           0 CLOSED  CANDIDATE    NORMAL                               /dev/oracleasm/disks/FRA2            0
               2           2 MISSING UNKNOWN      NORMAL   FRA_0002          FRA2                                          60
               2           0 CACHED  MEMBER       NORMAL   FRA_0000          FRA       /dev/oracleasm/disks/FRA             0
               1           0 CACHED  MEMBER       NORMAL   DATA_0000         DATA_0000 /dev/oracleasm/disks/DATA            0

    ----- Now I recreate the ASMLib disk whose header was corrupted

    [root@kamil ~]# oracleasm querydisk -v -d /dev/oracleasm/disks/*
    Device "/dev/oracleasm/disks/DATA" is marked an ASM disk with the label "DATA"
    Device "/dev/oracleasm/disks/FRA" is marked an ASM disk with the label "FRA"
    Device "/dev/oracleasm/disks/FRA2" is not marked as an ASM disk !!!!!!!!!!!!!!!!!!!!

    [root@kamil ~]# oracleasm scandisks
    Reloading disk partitions: done
    Cleaning any stale ASM disks...
    Cleaning disk "FRA2"
    Scanning system for ASM disks...
    [root@kamil ~]# oracleasm createdisk FRA2 /dev/sdd1
    Writing disk header: done
    Instantiating disk: done
    [root@kamil ~]# oracleasm scandisks
    Reloading disk partitions: done
    Cleaning any stale ASM disks...
    Scanning system for ASM disks...
    [root@kamil ~]# oracleasm listdisks
    DATA
    FRA
    FRA2

    ----- Check status. The header status has now changed to PROVISIONED

    GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU STATE    NAME              FAILGROUP PATH                      REPAIR_TIMER
    ------------ ----------- ------- ------------ -------- ----------------- --------- ------------------------- ------------
               0           0 CLOSED  PROVISIONED  NORMAL                               /dev/oracleasm/disks/FRA2            0
               2           2 MISSING UNKNOWN      FORCING  _DROPPED_0002_FRA FRA2                                           0
               2           0 CACHED  MEMBER       NORMAL   FRA_0000          FRA       /dev/oracleasm/disks/FRA             0
               1           0 CACHED  MEMBER       NORMAL   DATA_0000         DATA_0000 /dev/oracleasm/disks/DATA            0

    ----- The disk is ready to be re-added to failgroup FRA2. Hold off on rebalancing for now (POWER 0). We will observe its behaviour soon

    SQL> ALTER DISKGROUP FRA ADD FAILGROUP FRA2 DISK '/dev/oracleasm/disks/FRA2' REBALANCE POWER 0;

    Diskgroup altered.

    ----- Even though the disk has been added, the old entry is still there. To recap: the disk was dropped and a new disk was added, but the old entry remains.

    SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,STATE,NAME,FAILGROUP,PATH,REPAIR_TIMER from v$asm_disk;

    GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU STATE    NAME              FAILGROUP PATH                      REPAIR_TIMER
    ------------ ----------- ------- ------------ -------- ----------------- --------- ------------------------- ------------
               2           2 MISSING UNKNOWN      FORCING  _DROPPED_0002_FRA FRA2                                           0
               2           1 CACHED  MEMBER       NORMAL   FRA_0001          FRA2      /dev/oracleasm/disks/FRA2            0
               2           0 CACHED  MEMBER       NORMAL   FRA_0000          FRA       /dev/oracleasm/disks/FRA             0
               1           0 CACHED  MEMBER       NORMAL   DATA_0000         DATA_0000 /dev/oracleasm/disks/DATA            0

    ----- Now we start the rebalancing operation

    SQL> SELECT * FROM V$ASM_OPERATION;

    GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
    ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
               2 REBAL WAIT          0

    SQL> ALTER DISKGROUP FRA REBALANCE POWER 1;

    Diskgroup altered.

    SQL> SELECT * FROM V$ASM_OPERATION;

    GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
    ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
               2 REBAL REAP          1          1         27         27        820           0

    SQL> /

    GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
    ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
               2 REBAL WAIT          1

    SQL> /

    no rows selected

    ----- Look! The old, missing disk entry is gone. So we are back to the situation before the corruption: the new disk has been added (repaired) and rebalancing has completed. ASM is happy now!

    SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,STATE,NAME,FAILGROUP,PATH,REPAIR_TIMER from v$asm_disk;

    GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU STATE    NAME              FAILGROUP PATH                      REPAIR_TIMER
    ------------ ----------- ------- ------------ -------- ----------------- --------- ------------------------- ------------
               2           1 CACHED  MEMBER       NORMAL   FRA_0001          FRA2      /dev/oracleasm/disks/FRA2            0
               2           0 CACHED  MEMBER       NORMAL   FRA_0000          FRA       /dev/oracleasm/disks/FRA             0
               1           0 CACHED  MEMBER       NORMAL   DATA_0000         DATA_0000 /dev/oracleasm/disks/DATA            0

    1. Hi Radek,
      Thanks as always for contributing to these posts and helping other OCMers.
      Interesting case. Your v$asm_disk should have shown the disk as missing from your first query after dd. The problem is that you corrupted the header (where all the metadata about disks and DGs is stored), so your ASM got confused.

      Usually, when the header is corrupted (so you damaged the metadata, not the data itself), the best approach is to repair it with "kfed repair /dev/oracleasm/disks/FRA2". Try that and let me know how it goes.

      Thanks,
      RJ
