Oracle RAC in VMware hangs with ORA-27072: File I/O error


For three months, I had constant problems with an Oracle RAC installation that would suddenly hang. The environment was as follows:

OS: Red Hat Enterprise Linux Server release 5.8 (Tikanga) - x86_64
Kernel: 2.6.18-308.16.1.el5
VMware: VMware ESXi 5.0
Oracle: 11.2.0.3 PSU 6
Storage: Oracle homes on separate disks, with the shared disks for RAC provided through VMFS

The failures were random: a node would drop out and then come back online, and sometimes the machine was restarted by the Oracle processes.
Even after reinstalling, upgrading Oracle to the latest version and applying the latest PSU, the problem continued.

The Oracle alert log and trace files showed the following errors:

ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 657408
Additional information: -1
WARNING: Read Failed. group:1 disk:0 AU:321 offset:0 size:4096
path:/dev/oracleasm/disks/DATA01
incarnation:0xe968aff8 synchronous result:'I/O error'
subsys:System iop:0x7fb195be9000 bufp:0x7fb196117000 osderr:0x0 osderr1:0x0
ERROR: failed to read ACD block gn=1 blk=10752
ORA-15080: synchronous I/O operation to a disk failed
WARNING: LGWR failed to read ACDC for diskgroup 1 thread 2
WARNING: disk offlining resulting in I/O error
WARNING: Write Failed. group:1 disk:0 AU:321 offset:0 size:4096
path:/dev/oracleasm/disks/DATA01
incarnation:0xe968aff8 asynchronous result:'I/O error'
subsys:System iop:0x7fb195be9000 bufp:0x647fd000 osderr:0x534b4950 osderr1:0x0

Searching further, I found the following errors in the Red Hat system log, /var/log/messages:

Jul 6 19:45:58 oraclesrv01 kernel: sd 1:0:0:0: reservation conflict
Jul 6 19:45:58 oraclesrv01 kernel: sd 1:0:0:0: Unhandled error code
Jul 6 19:45:58 oraclesrv01 kernel: sd 1:0:0:0: SCSI error: return code = 0x00110018
Jul 6 19:45:58 oraclesrv01 kernel: Result: hostbyte=invalid driverbyte=DRIVER_OK,SUGGEST_OK
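The return code in these lines can be decoded by hand, which helps confirm that the failure is a SCSI reservation conflict and not a media error. The sketch below is only an illustration: it assumes the usual Linux layout of the SCSI result word (status in the lowest byte, then message, host and driver bytes), and the function names are made up for this example.

```python
import re

# Matches kernel lines such as "sd 1:0:0:0: SCSI error: return code = 0x00110018"
RETURN_CODE = re.compile(
    r"sd (\d+:\d+:\d+:\d+): SCSI error: return code = (0x[0-9a-fA-F]+)"
)

# SCSI status code for RESERVATION CONFLICT
SCSI_STATUS_RESERVATION_CONFLICT = 0x18


def decode_result(result):
    """Split a 32-bit SCSI result word into its four one-byte fields.

    Assumed layout: status (bits 0-7), msg (8-15), host (16-23), driver (24-31).
    """
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def find_reservation_conflicts(lines):
    """Return (device, fields) pairs for SCSI errors with status RESERVATION CONFLICT."""
    hits = []
    for line in lines:
        match = RETURN_CODE.search(line)
        if not match:
            continue
        fields = decode_result(int(match.group(2), 16))
        if fields["status"] == SCSI_STATUS_RESERVATION_CONFLICT:
            hits.append((match.group(1), fields))
    return hits


# Two of the lines from the log excerpt above
sample = [
    "Jul 6 19:45:58 oraclesrv01 kernel: sd 1:0:0:0: reservation conflict",
    "Jul 6 19:45:58 oraclesrv01 kernel: sd 1:0:0:0: SCSI error: return code = 0x00110018",
]
for device, fields in find_reservation_conflicts(sample):
    print(device, {name: hex(value) for name, value in fields.items()})
```

For 0x00110018 the lowest byte is 0x18, the SCSI status for a reservation conflict, and the driver byte is 0x00, which agrees with the driverbyte=DRIVER_OK shown in the log. To scan a real log, the lines of /var/log/messages could be passed in place of the sample list.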

As workarounds, I tried:

Presenting brand-new disks, but the problem remained.
Separating the data onto one disk and the cluster configuration and voting files onto another three disks.

Neither attempt solved the problem. After a lot of research, I finally discovered the cause of the error.

VMFS is a clustered file system that, by default, prevents multiple virtual machines from opening and writing to the same virtual disk (.vmdk file) at the same time. This protects against more than one virtual machine inadvertently accessing the same .vmdk file.

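The multi-writer option allows VMFS-backed disks to be shared by multiple virtual machines. This option is used to support VMware Fault Tolerance, which allows a primary virtual machine and a standby virtual machine to access a .vmdk file simultaneously. An Oracle RAC cluster whose nodes share VMFS-backed disks needs the same simultaneous access, so the option has to be enabled for those disks.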

The VMware knowledge base article below explains, step by step, how to enable this option:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1034165
Disabling simultaneous write protection provided by VMFS using the multi-writer flag (1034165)

In short, edit the .vmx file that defines the virtual machine's settings and add the following entry for each shared disk, where X is the SCSI controller number and Y is the disk's unit number on that controller:

scsiX:Y.sharing = "multi-writer"

If four disks are being shared, the entries would be as follows:

scsi1:0.sharing = "multi-writer"
scsi1:1.sharing = "multi-writer"
scsi1:2.sharing = "multi-writer"
scsi1:3.sharing = "multi-writer"
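With many shared disks, the entries can be generated instead of typed by hand. The following is a minimal sketch, not a VMware tool: the function name, the sample .vmx content and the data01.vmdk file name are all hypothetical, and it works on the file's text only. The linked article remains the reference for the full procedure.

```python
import re


def add_multi_writer(vmx_text, shared_disks):
    """Return vmx_text with a multi-writer sharing entry per (controller, unit) pair.

    An existing sharing entry for the same disk is replaced, not duplicated.
    """
    lines = vmx_text.splitlines()
    for controller, unit in shared_disks:
        key = f"scsi{controller}:{unit}.sharing"
        pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
        # Drop any earlier setting for this disk before appending the new one
        lines = [line for line in lines if not pattern.match(line)]
        lines.append(f'{key} = "multi-writer"')
    return "\n".join(lines) + "\n"


# Hypothetical fragment of a .vmx file, for illustration only
sample_vmx = 'scsi1.present = "TRUE"\nscsi1:0.fileName = "data01.vmdk"\n'

# Four shared disks on controller 1, units 0 to 3, as in the entries above
print(add_multi_writer(sample_vmx, [(1, unit) for unit in range(4)]), end="")
```

Running the function a second time on its own output leaves a single entry per disk, so it is safe to reapply after adding more shared disks.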

That's it! With this change, Oracle should stop raising this seemingly inexplicable error.
