What is VAAI? An Introduction
VMware vStorage APIs for Array Integration (VAAI) is a set of storage array operations aimed at reducing host CPU load and storage network bandwidth. The VAAI primitives allow certain storage operations to be offloaded to the array itself. Additionally, hardware-assisted locking can significantly decrease datastore contention, resulting in a significant improvement in performance. Since early 2011, QUADStor has supported VAAI in its storage virtualization software for Linux and FreeBSD. The main operations in VAAI from a storage array point of view are:
Atomic Test and Set (ATS)
Prior to Atomic Test and Set, SCSI-2 reserve and release commands (a.k.a. SCSI-2 reservations) were used for reserving and releasing volumes. The problem with SCSI-2 reservations is that a reserve locks the entire volume, so every other access to the volume, read or write, has to wait until the volume is released. SCSI-3 introduced persistent reservations, which allow fine-grained locking such as shared write access. ESXi does not use SCSI-3 persistent reservations; instead it uses another SCSI command called COMPARE AND WRITE. A COMPARE AND WRITE command essentially carries two parts of user data to the device. The block(s) specified in the command are read from disk and compared against the first part of the user data.
- If the comparison succeeds, the second part of the user data is written to the blocks specified by the command. The host treats this successful write as a successful lock acquisition.
- If the comparison fails, no user data is written and the host treats it as a lock acquisition failure.
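The compare-then-write behaviour described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `SimulatedLun`, `lock_record`, the 512-byte block size, and the lock record layout are hypothetical names and values invented for illustration, and the in-memory mutex merely stands in for the atomicity that a real array guarantees for the command.

```python
import threading

BLOCK_SIZE = 512  # bytes per logical block (assumed for this sketch)


class SimulatedLun:
    """In-memory stand-in for a LUN; not a real SCSI target."""

    def __init__(self, num_blocks):
        self._data = bytearray(num_blocks * BLOCK_SIZE)
        self._mutex = threading.Lock()  # makes compare + write one atomic step

    def read(self, lba, count=1):
        start = lba * BLOCK_SIZE
        return bytes(self._data[start:start + count * BLOCK_SIZE])

    def compare_and_write(self, lba, expected, new):
        """Write `new` at `lba` only if the blocks on disk equal `expected`."""
        if len(expected) != len(new) or len(expected) % BLOCK_SIZE:
            raise ValueError("both parts must be the same whole number of blocks")
        start = lba * BLOCK_SIZE
        end = start + len(expected)
        with self._mutex:
            if bytes(self._data[start:end]) != expected:
                return False  # comparison failed: nothing is written
            self._data[start:end] = new
            return True  # comparison succeeded: second part written


def lock_record(owner):
    """Hypothetical lock record: owner name padded to one block."""
    return owner.encode().ljust(BLOCK_SIZE, b"\0")


lun = SimulatedLun(num_blocks=8)
FREE = lock_record("")  # an all-zero block means "unlocked" in this sketch

# Both hosts believe the lock block is free and race to claim it.
host_a_won = lun.compare_and_write(0, FREE, lock_record("host-a"))
host_b_won = lun.compare_and_write(0, FREE, lock_record("host-b"))
```

In this run the first caller acquires the lock and the second sees a failed comparison, while accesses to every other block of the LUN are never blocked. That is the contrast with a SCSI-2 reserve, which would have locked the whole volume.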
Full Copy / XCOPY / Clone Blocks
Quite frequently, copy operations such as virtual machine cloning or Storage vMotion involve the same array as both the source and the destination of the copy. Prior to VAAI, the ESX/ESXi server had to do more work for operations such as cloning, consuming network and server resources in the process. For example, cloning a VM would require the server to read the data from the LUN corresponding to the VM being cloned and write the data to the LUN corresponding to the new VM. While this cannot be avoided if the two LUNs exist on different arrays, if the two LUNs reside on the same array it makes sense for the ESX server, while remaining in charge of the copy operation, to instruct the array itself to copy the blocks. Without support for XCOPY from the array, a copy operation can be described as:
- The host sends a read request to the array
- The array sends the read response to the host with the read data traversing the storage fabric
- The host sends a write request to the array with the write data traversing the storage fabric again
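To make the cost of the steps above concrete, the following sketch counts the payload bytes that cross the storage fabric for a host-driven copy and compares it with a copy performed inside the array. `SimulatedArray`, `FabricCounter`, `host_copy`, and `offloaded_copy` are hypothetical names used only for this illustration; the small command that requests an offloaded copy is assumed to be negligible and is not counted.

```python
from dataclasses import dataclass


@dataclass
class FabricCounter:
    """Payload bytes that crossed the storage fabric in each direction."""
    array_to_host: int = 0
    host_to_array: int = 0

    @property
    def total(self):
        return self.array_to_host + self.host_to_array


class SimulatedArray:
    """Toy array holding LUNs as byte buffers; not a real storage target."""

    def __init__(self, luns):
        self.luns = luns  # LUN name -> bytearray
        self.fabric = FabricCounter()

    def read(self, lun, offset, length):
        data = bytes(self.luns[lun][offset:offset + length])
        self.fabric.array_to_host += len(data)  # read data goes to the host
        return data

    def write(self, lun, offset, data):
        self.luns[lun][offset:offset + len(data)] = data
        self.fabric.host_to_array += len(data)  # write data comes back

    def offloaded_copy(self, src, src_off, dst, dst_off, length):
        # The array moves the blocks internally; no payload crosses the fabric.
        self.luns[dst][dst_off:dst_off + length] = \
            self.luns[src][src_off:src_off + length]


def host_copy(array, src, dst, length, chunk=64 * 1024):
    """Host-driven copy: every chunk is read to the host and written back."""
    for off in range(0, length, chunk):
        n = min(chunk, length - off)
        array.write(dst, off, array.read(src, off, n))


SIZE = 1024 * 1024  # 1 MiB stands in for a virtual disk


def fresh_array():
    return SimulatedArray({"src": bytearray(b"\xab" * SIZE),
                           "dst": bytearray(SIZE)})


host_side = fresh_array()
host_copy(host_side, "src", "dst", SIZE)

offloaded = fresh_array()
offloaded.offloaded_copy("src", 0, "dst", 0, SIZE)
```

After both runs the destination LUNs hold identical data, but `host_side.fabric.total` is twice the copied size while `offloaded.fabric.total` is zero.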
Data Deduplication and Full Copy
The biggest impact of deduplication is on the full copy operation. With full copy, an array that has no built-in deduplication support, or that supports only post-process deduplication, can still reduce the utilization of network resources. However, within the array the data still needs to be read from the source LUN and written back to the destination LUN. With inline deduplication, an array can avoid writing the data back to the destination LUN, because the data read from the source LUN will in any case be deduplicated.
Since QUADStor's implementation is aware of the location of the source and destination data, and since after the write both source and destination data will in any case be identical, with a few tweaks prior to the copy operation the data need be neither read nor written! An additional benefit is that byte-by-byte verification (if enabled) can be eliminated entirely.
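One way to picture a copy that neither reads nor writes data is a deduplicating block map in which a clone only adds references to blocks that are already stored. The sketch below is an assumption made for illustration and is not QUADStor's actual implementation: `DedupStore`, its fingerprint-keyed tables, and the read/write counters are all hypothetical, and it assumes the destination LUN starts out empty.

```python
import hashlib
from collections import Counter


class DedupStore:
    """Toy inline-deduplicating store; hypothetical, for illustration only."""

    def __init__(self):
        self.blocks = {}           # fingerprint -> stored block data
        self.refcount = Counter()  # fingerprint -> number of LBAs using it
        self.luns = {}             # LUN name -> {lba: fingerprint}
        self.data_reads = 0        # block payloads read from backing storage
        self.data_writes = 0       # block payloads written to backing storage

    def write(self, lun, lba, data):
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.blocks:  # only previously unseen data is stored
            self.blocks[fp] = data
            self.data_writes += 1
        self._map(lun, lba, fp)

    def read(self, lun, lba):
        self.data_reads += 1
        return self.blocks[self.luns[lun][lba]]

    def clone(self, src, dst):
        """Copy by reference: update metadata only, touch no block data."""
        for lba, fp in self.luns[src].items():
            self._map(dst, lba, fp)

    def _map(self, lun, lba, fp):
        table = self.luns.setdefault(lun, {})
        old = table.get(lba)
        if old == fp:
            return  # already pointing at this block
        self.refcount[fp] += 1
        table[lba] = fp
        if old is not None:  # drop the reference to the overwritten block
            self.refcount[old] -= 1
            if self.refcount[old] == 0:
                del self.refcount[old]
                del self.blocks[old]


store = DedupStore()
for lba in range(4):
    store.write("vm-source", lba, bytes([lba]) * 4096)

writes_before = store.data_writes
store.clone("vm-source", "vm-clone")
reads_during_clone = store.data_reads
writes_during_clone = store.data_writes - writes_before
```

Here `reads_during_clone` and `writes_during_clone` are both zero, yet every block of `vm-clone` reads back identical to `vm-source`. Because both LUNs point at the very same stored blocks, a byte-by-byte comparison after the copy would have nothing left to check in this model.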
The table below summarizes the impact of VAAI and deduplication on a clone operation for a VM of size 10 GB:
Scenario | Source LUN | Destination LUN | Remarks |
---|---|---|---|
Without VAAI | 10 GB read | 10 GB written | Utilizes both network and server resources |
With VAAI (no dedupe/post-process dedupe) | 10 GB read | 10 GB written | Network and server resources minimal. Array resources used for read and write operations |
With VAAI (Inline dedupe) | 10 GB read | 0 GB written | Network and server resources minimal. Array resources spent in reads and deduplication |
With VAAI (QUADStor) | 0 GB read | 0 GB written | Network and server resources minimal. Array resources spent only in deduplication |