VAAI - VMware vStorage APIs for Array Integration

What is VAAI? An Introduction

VMware vStorage APIs for Array Integration (VAAI) is a set of storage array operations aimed at reducing host CPU load and storage network bandwidth. The VAAI primitives allow certain storage operations to be offloaded to the array itself. Additionally, with hardware-assisted locking, datastore contention can be significantly decreased, resulting in a significant improvement in performance. Since early 2011, QUADStor has supported VAAI in its storage virtualization software for Linux and FreeBSD.

The main operations in VAAI from a storage array point of view are:

Atomic Test and Set (ATS)

Prior to Atomic Test and Set, SCSI-2 reserve and release commands (a.k.a. SCSI-2 reservations) were used for reserving and releasing volumes. The problem with SCSI-2 reservations is that a reserve locks the entire volume, so every other access to the volume, read or write, has to wait until the volume is released. SCSI-3 introduced persistent reservations, which allow fine-grained locking such as shared write access. ESXi does not use SCSI-3 persistent reservations but uses another SCSI command called COMPARE AND WRITE. A COMPARE AND WRITE command essentially carries two parts of user data to the device. The block(s) specified in the COMPARE AND WRITE command are read from disk and compared against the first part of the user data.

  • If the comparison succeeds, the user data in the second part is written to the blocks specified by the command. The host treats this successful write as a successful lock acquisition.
  • If, however, the comparison fails, no user data is written and the host treats it as a lock acquisition failure.

Now, rather than locking an entire volume, ATS commands can be used to lock portions of a volume, and in the case of shared access to a volume the performance improvement is significant.
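The compare-then-write behaviour described above can be sketched with a toy in-memory volume. This is only an illustration of the semantics, not ESXi's or any array's actual code: the Volume class, the lock_record layout, the 512-byte block size and the use of an all-zero block to mean "free" are all assumptions made for the example.

```python
BLOCK_SIZE = 512  # bytes per logical block (assumed for this sketch)
FREE = bytes(BLOCK_SIZE)  # an all-zero block stands for "unlocked" (assumed)


class Volume:
    """Toy in-memory volume standing in for an array LUN."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def compare_and_write(self, lba, expected, new):
        """Write `new` at `lba` only if the block currently equals `expected`.

        A real array performs the compare and the write as one atomic step;
        this single-threaded sketch has no concurrency to guard against.
        """
        if self.blocks[lba] != expected:
            return False  # miscompare: nothing is written
        self.blocks[lba] = new
        return True


def lock_record(owner):
    """Hypothetical lock record: the owner name padded to one block."""
    return owner.encode().ljust(BLOCK_SIZE, b"\0")


def try_lock(volume, lba, owner):
    # First part of the user data: what we expect on disk (FREE).
    # Second part: what to write if the comparison succeeds.
    return volume.compare_and_write(lba, FREE, lock_record(owner))


def unlock(volume, lba, owner):
    return volume.compare_and_write(lba, lock_record(owner), FREE)


vol = Volume(num_blocks=8)
print(try_lock(vol, 3, "host-a"))  # True: block 3 was free
print(try_lock(vol, 3, "host-b"))  # False: block 3 no longer matches FREE
print(try_lock(vol, 5, "host-b"))  # True: a different block locks independently
print(unlock(vol, 3, "host-a"))    # True: the owner releases its lock
```

Note that host-b's failed attempt on block 3 does not stop it from taking a lock on block 5, which is the difference from a whole-volume SCSI-2 reservation.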

Full Copy / XCOPY / Clone Blocks

Quite frequently, copy operations such as virtual machine cloning or Storage vMotion involve the same array as both the source and the destination of the copy. Prior to VAAI, the ESX/ESXi server had to do more work for operations such as cloning, consuming network and server resources in the process. For example, cloning a VM would require the server to read the data from the LUN corresponding to the VM being cloned and write the data to the LUN corresponding to the new VM. While this cannot be avoided if the two LUNs exist on different arrays, if the two LUNs reside on the same array it makes sense to instruct the array itself to copy the blocks, with the ESX server remaining in charge of the copy operation.

Without support for XCOPY from the array, a copy operation can be described as follows:

  • The host sends a read request to the array
  • The array sends the read response to the host with the read data traversing the storage fabric
  • The host sends a write request to the array with the write data traversing the storage fabric again

It is easy to see that if the array is the same for both the read and write operations, all the host has to do is instruct the array to read data blocks from one location and write those blocks to another. The XCOPY operation is similar to Offloaded Data Transfers (ODX) in Windows Server 2012 and Windows 8. See also Understanding Offloaded Data Transfers (ODX) Technology.
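To make the difference concrete, the following sketch counts the payload bytes that cross the storage fabric for a host-driven copy versus an offloaded copy. The Array class and its read, write and xcopy methods are hypothetical stand-ins, not a real SCSI or VAAI API, and the small size of the copy command itself is ignored.

```python
class Array:
    """Toy array that counts payload bytes crossing the storage fabric."""

    def __init__(self):
        self.luns = {}          # LUN name -> bytearray
        self.fabric_bytes = 0   # data moved between host and array

    def read(self, lun, offset, length):
        data = bytes(self.luns[lun][offset:offset + length])
        self.fabric_bytes += len(data)  # data travels array -> host
        return data

    def write(self, lun, offset, data):
        self.luns[lun][offset:offset + len(data)] = data
        self.fabric_bytes += len(data)  # data travels host -> array

    def xcopy(self, src, src_off, dst, dst_off, length):
        # Only a command describing the copy crosses the fabric;
        # the blocks themselves move inside the array.
        chunk = self.luns[src][src_off:src_off + length]
        self.luns[dst][dst_off:dst_off + length] = chunk


def host_copy(array, src, dst, size, chunk=64 * 1024):
    """Read each chunk to the host, then write it back to the array."""
    for off in range(0, size, chunk):
        n = min(chunk, size - off)
        array.write(dst, off, array.read(src, off, n))


def make_array(size):
    a = Array()
    a.luns["src"] = bytearray(b"\xab" * size)
    a.luns["dst"] = bytearray(size)
    return a


SIZE = 1024 * 1024  # 1 MiB test copy

a = make_array(SIZE)
host_copy(a, "src", "dst", SIZE)
print(a.fabric_bytes)  # 2097152: every byte crosses the fabric twice

b = make_array(SIZE)
b.xcopy("src", 0, "dst", 0, SIZE)
print(b.fabric_bytes)  # 0: no payload leaves the array
```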

Data Deduplication and Full Copy

The biggest impact of deduplication is on the full copy operation. With VAAI, an array which has no inbuilt deduplication support, or which only supports post-process deduplication, can still reduce the utilization of network resources. However, within the array, data still needs to be read from the source LUN and written to the destination LUN. With inline deduplication, an array can avoid writing the data to the destination LUN, since the data read from the source LUN will in any case be deduplicated.

Since QUADStor's implementation is aware of the location of the source and destination data, and since after the write both source and destination data will in any case be identical, with a few tweaks prior to the copy operation the data needs to be neither read nor written. An additional benefit is that byte-by-byte verification (if enabled) can be eliminated entirely.
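One way to picture a clone that neither reads nor writes data is a content-addressed block map in which a clone only copies references. The sketch below is a generic illustration of that idea and is not QUADStor's actual implementation; the DedupStore class, its counters and the use of SHA-256 fingerprints are assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical blocks are kept once."""

    def __init__(self):
        self.chunks = {}         # fingerprint -> block data
        self.refcount = {}       # fingerprint -> number of logical references
        self.luns = {}           # LUN name -> list of fingerprints (block map)
        self.blocks_read = 0     # data blocks read from backing storage
        self.blocks_written = 0  # data blocks written to backing storage

    def write_lun(self, name, blocks):
        fingerprints = []
        for data in blocks:
            fp = hashlib.sha256(data).hexdigest()
            if fp not in self.chunks:  # only new content is stored
                self.chunks[fp] = data
                self.blocks_written += 1
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            fingerprints.append(fp)
        self.luns[name] = fingerprints

    def clone_inline_dedupe(self, src, dst):
        """Read every source block and let deduplication absorb the writes."""
        data = []
        for fp in self.luns[src]:
            data.append(self.chunks[fp])
            self.blocks_read += 1
        self.write_lun(dst, data)

    def clone_by_reference(self, src, dst):
        """Copy only the block map and bump reference counts: no data I/O."""
        self.luns[dst] = list(self.luns[src])
        for fp in self.luns[dst]:
            self.refcount[fp] += 1


store = DedupStore()
store.write_lun("vm1", [b"a" * 512, b"b" * 512, b"a" * 512])
print(store.blocks_written)  # 2: the repeated block is stored once

store.clone_inline_dedupe("vm1", "vm2")
print(store.blocks_read, store.blocks_written)  # 3 2: reads, but no new writes

store.clone_by_reference("vm1", "vm3")
print(store.blocks_read, store.blocks_written)  # 3 2: neither reads nor writes
```

In the reference-copy path, the only work left is metadata bookkeeping.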

The table below summarizes the impact of VAAI and deduplication on a clone operation for a VM of size 10 GB:

Scenario | Source LUN | Destination LUN | Remarks
Without VAAI | 10 GB read | 10 GB written | Utilizes both network and server resources
With VAAI (no dedupe/post-process dedupe) | 10 GB read | 10 GB written | Network and server resources minimal. Array resources used for read and write operations
With VAAI (inline dedupe) | 10 GB read | 0 GB written | Network and server resources minimal. Array resources spent in reads and deduplication
With VAAI (QUADStor) | 0 GB read | 0 GB written | Network and server resources minimal. Array resources spent only in deduplication

Block Zeroing / Write Same / Zero Blocks

To fill disk regions with zero data prior to VAAI, VMFS/ESXi had to send zero data for the entire range that needed to be zero filled. Thus, to zero fill a region of 1GB on disk, 1GB of zero data had to be sent from the host to the array, with 1GB of zero data traversing the storage fabric. With VAAI, only a single WRITE SAME command or, depending on the array configuration, a few WRITE SAME commands need to be sent from the host to the array. The purpose of the WRITE SAME command is to fill a range of blocks on disk with a block of data specified by the command. With VAAI, in order to zero fill the same region of 1GB on disk, only a few kilobytes or a few megabytes need to traverse the storage fabric.
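The arithmetic behind that saving can be sketched as follows. The 512-byte block size and the per-command limit of 2048 blocks are assumptions chosen for illustration; real limits depend on the array configuration, and command headers are ignored.

```python
def zero_fill_traffic(region_bytes, block_size=512, max_blocks_per_cmd=2048):
    """Return (commands, plain_write_payload, write_same_payload) in bytes.

    Each WRITE SAME command carries a single block of data and a block count,
    so its payload is one block regardless of how many blocks it fills.
    """
    total_blocks = region_bytes // block_size
    # Ceiling division: number of WRITE SAME commands needed for the region.
    commands = -(-total_blocks // max_blocks_per_cmd)
    plain_write_payload = total_blocks * block_size
    write_same_payload = commands * block_size
    return commands, plain_write_payload, write_same_payload


ONE_GIB = 1 << 30
print(zero_fill_traffic(ONE_GIB))
# (1024, 1073741824, 524288): 1 GiB of zeros without offload, 512 KiB with it
```

With a larger per-command limit, the command count and the payload shrink further, down to a single block for a single command.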

Block Delete / Unmap

Thin provisioned disks depend on the host to inform them when a block or range of blocks is no longer used. When a thin provisioned disk receives information that a block or range of blocks is no longer used, it may release the physical disk blocks associated with those blocks so that they can be allocated again to another thin provisioned disk. A host uses a SCSI UNMAP command (or even a WRITE SAME command) for this purpose.
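A minimal sketch of this allocate-on-write and release-on-unmap behaviour is shown below. The ThinPool class and its methods are hypothetical and model only the accounting, not any real array's allocator.

```python
class ThinPool:
    """Toy pool of physical blocks shared by several thin provisioned disks."""

    def __init__(self, physical_blocks):
        self.free = physical_blocks
        self.maps = {}  # disk name -> set of mapped logical block addresses

    def create_disk(self, name):
        self.maps[name] = set()

    def write(self, disk, lba):
        """Allocate a physical block on the first write to a logical block."""
        if lba in self.maps[disk]:
            return True  # already mapped: overwrite in place
        if self.free == 0:
            return False  # no physical block left to map
        self.free -= 1
        self.maps[disk].add(lba)
        return True

    def unmap(self, disk, lba, count):
        """Release the physical blocks behind logical blocks lba..lba+count-1."""
        for block in range(lba, lba + count):
            if block in self.maps[disk]:
                self.maps[disk].remove(block)
                self.free += 1


pool = ThinPool(physical_blocks=4)
pool.create_disk("disk-a")
pool.create_disk("disk-b")

for lba in range(4):
    pool.write("disk-a", lba)    # disk-a consumes the whole pool

print(pool.write("disk-b", 0))   # False: nothing left for disk-b
pool.unmap("disk-a", 2, 2)       # the host reports blocks 2 and 3 as unused
print(pool.write("disk-b", 0))   # True: a released block is reused
print(pool.free)                 # 1
```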

Thin Provisioning Stun / TP Stun

In the case of a thin provisioned disk, if the available physical space is no longer sufficient to satisfy a write request, this is considered an out-of-space condition. That is, there are no longer any physical disk blocks that can be mapped to the logical blocks of the thin provisioned disk. In such a case, the array may fail the write and inform the host of the out-of-space condition. An ESXi host will then pause the VM or VMs for which a write operation resulted in the out-of-space condition.
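The host-side reaction can be sketched as a small state machine. Everything here is a simplified model under stated assumptions: the OutOfSpace exception stands in for the array's out-of-space error, the ThinDisk and VM classes are hypothetical, and the resume step (retrying the failed write after space has been added) is an assumption about how recovery would proceed rather than something the text above specifies.

```python
class OutOfSpace(Exception):
    """Stands in for the out-of-space error an array reports on a failed write."""


class ThinDisk:
    def __init__(self, free_physical_blocks):
        self.free = free_physical_blocks
        self.mapped = set()

    def write(self, lba):
        if lba in self.mapped:
            return
        if self.free == 0:
            raise OutOfSpace(lba)  # the array fails the write
        self.free -= 1
        self.mapped.add(lba)


class VM:
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.state = "running"
        self.pending = None  # the write that triggered the pause, if any

    def guest_write(self, lba):
        if self.state == "paused":
            raise RuntimeError("VM is paused")
        try:
            self.disk.write(lba)
        except OutOfSpace:
            # Pause only the VM whose write hit the out-of-space condition.
            self.state = "paused"
            self.pending = lba

    def resume(self):
        """Assumed recovery path: retry the failed write once space exists."""
        lba, self.pending = self.pending, None
        self.state = "running"
        if lba is not None:
            self.guest_write(lba)


disk = ThinDisk(free_physical_blocks=2)
vm = VM("vm1", disk)
for lba in range(3):
    vm.guest_write(lba)
print(vm.state)   # paused: the third write found no physical block

disk.free += 1    # an administrator adds or reclaims space
vm.resume()
print(vm.state)   # running: the retried write now succeeds
```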