Clustering was introduced to provide scalability and high availability. The clustering described in this article is shared storage clustering, in which the nodes belonging to the cluster have access to common storage.
Stand-alone disadvantages
CPU Utilization
Deduplication is a CPU- and memory-intensive process. With many clients accessing the system, performance can be limited by high CPU usage, primarily due to computing hash fingerprints of data blocks and/or compressing data blocks.
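To make the CPU cost concrete, here is a minimal sketch of per-block work in a deduplicating write path. It assumes fixed-size 4096-byte blocks, SHA-256 fingerprints and zlib compression. These are illustrative choices, not QUADStor's documented block size, hash, or compressor. Every block pays for a hash, and every new (unique) block also pays for compression.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def fingerprint_and_compress(data: bytes, block_size: int = BLOCK_SIZE):
    """Hash every block; compress only blocks whose fingerprint has not been seen yet."""
    fingerprints = []
    compressed_unique = {}  # fingerprint -> compressed block
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fp = hashlib.sha256(block).hexdigest()  # CPU cost paid for every block
        fingerprints.append(fp)
        if fp not in compressed_unique:
            compressed_unique[fp] = zlib.compress(block)  # extra CPU cost per unique block
    return fingerprints, compressed_unique


if __name__ == "__main__":
    # Two identical blocks followed by one distinct block.
    payload = b"A" * BLOCK_SIZE * 2 + b"B" * BLOCK_SIZE
    fps, unique = fingerprint_and_compress(payload)
    print(len(fps), len(unique))  # prints: 3 2
```

On a stand-alone system all of this hashing and compression runs on one machine for every client, which is the bottleneck that spreading the work across client nodes relieves.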
Single Point of Failure
Prior to the clustering feature, a QUADStor system was a single point of failure: if it failed, clients had no access to the data in the VDisk(s) until the QUADStor system was back online.
QUADStor Clustering Architecture
In a QUADStor cluster there are three kinds of nodes:
Controller Node
The controller node is the most important node in the cluster. There can be only one controller node per cluster. The controller node is in charge of VDisk metadata and block allocation policies; without a controller node (or a master node that has taken over its role, as described below), the cluster cannot operate.
Client Node
Host systems can access VDisks configured on the controller node either through a client node or through the controller node itself. As with the controller node, VDisks can be accessed through the iSCSI, FC and local interfaces of the client node.
Master Node
A client node can also act as a master node. As with the controller node, there can be only one master node in a cluster. The master node can take over responsibility for VDisk metadata and block allocation policies from the controller node. It does so when the controller node either fails or voluntarily relinquishes control.
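The role rules above can be expressed as a small configuration check. This is a hypothetical sketch: the `Role` enum, the `Node` class with its `master` flag, and `validate_cluster` are illustrative names, not part of QUADStor. It encodes three rules from the text: exactly one controller node, at most one master node, and only a client node may act as the master.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CONTROLLER = "controller"
    CLIENT = "client"


@dataclass
class Node:
    name: str
    role: Role
    master: bool = False  # hypothetical flag: this client node may also act as the master


def validate_cluster(nodes):
    """Raise ValueError unless there is exactly one controller and at most one master."""
    controllers = [n for n in nodes if n.role is Role.CONTROLLER]
    masters = [n for n in nodes if n.master]
    if len(controllers) != 1:
        raise ValueError(f"expected exactly one controller node, found {len(controllers)}")
    if len(masters) > 1:
        raise ValueError(f"expected at most one master node, found {len(masters)}")
    if any(n.role is not Role.CLIENT for n in masters):
        raise ValueError("only a client node can act as the master node")


if __name__ == "__main__":
    validate_cluster([
        Node("ctrl", Role.CONTROLLER),
        Node("client-1", Role.CLIENT),
        Node("client-2", Role.CLIENT, master=True),
    ])
    print("cluster layout is valid")
```

Modelling "master" as a flag on a client node, rather than as a third role, mirrors the text: a master node is still a client node that additionally stands by to take over from the controller.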
IO request processing
When a client node receives a write command, it computes the hash fingerprints of the newly received data blocks and sends this information to the controller node. The controller node validates the write request and, based on the fingerprints, tells the client node which blocks are unique and where to write them, which blocks are duplicates, and so on.
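The write exchange can be sketched as follows, assuming a hypothetical in-memory fingerprint index on the controller and a naive bump allocator standing in for the real block allocation policy. `Decision`, `Controller.plan_write` and `client_write` are illustrative names, and a Python dict stands in for the shared storage. Request validation is omitted. The client hashes the blocks, the controller decides per block, and the client writes only the unique blocks.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Decision:
    index: int    # position of the block within the write request
    unique: bool  # True: the client must write this block; False: it duplicates stored data
    address: int  # physical address to write to (unique) or of the existing copy (duplicate)


@dataclass
class Controller:
    # Hypothetical in-memory fingerprint index: fingerprint -> physical block address.
    fingerprint_index: dict = field(default_factory=dict)
    next_free: int = 0  # naive bump allocator standing in for the real allocation policy

    def plan_write(self, fingerprints):
        """Decide, per block, whether it is unique (and where to write it) or a duplicate."""
        decisions = []
        for i, fp in enumerate(fingerprints):
            if fp in self.fingerprint_index:
                decisions.append(Decision(i, False, self.fingerprint_index[fp]))
            else:
                address = self.next_free
                self.next_free += 1
                self.fingerprint_index[fp] = address
                decisions.append(Decision(i, True, address))
        return decisions


def client_write(controller, blocks, storage):
    """Client side: fingerprint the blocks, ask the controller, write only unique blocks."""
    fingerprints = [hashlib.sha256(block).hexdigest() for block in blocks]
    decisions = controller.plan_write(fingerprints)
    for d in decisions:
        if d.unique:
            storage[d.address] = blocks[d.index]  # dict stands in for the shared disks
    return decisions


if __name__ == "__main__":
    shared_storage = {}
    ctrl = Controller()
    result = client_write(ctrl, [b"x" * 16, b"y" * 16, b"x" * 16], shared_storage)
    print([(d.unique, d.address) for d in result])  # prints: [(True, 0), (True, 1), (False, 0)]
```

Note the split of work: the expensive hashing happens on the client node, while the controller only does index lookups and allocation. A production design would also need to confirm that a unique block was actually written before other writers can reference it. This sketch does not handle that.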
Similarly, when a client node receives a read command, it forwards the command information to the controller node. The controller node validates the read request and tells the client node which data blocks to read, and from where, to satisfy the request. Once the data blocks are read, the response is sent to the host that originated the read request.
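A matching sketch of the read path follows, again under stated assumptions. The controller keeps a hypothetical logical-to-physical block map per VDisk. Validation is reduced to a bounds check against the VDisk size. Never-written blocks are assumed to read back as zeros. A tiny 16-byte block size keeps the example readable. None of these names or choices come from QUADStor itself.

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_SIZE = 16  # tiny hypothetical block size to keep the example readable


@dataclass
class ReadInstruction:
    lba: int                # logical block of the VDisk requested by the host
    address: Optional[int]  # physical block on shared storage, or None if never written


class Controller:
    def __init__(self, vdisk_blocks, block_map):
        self.vdisk_blocks = vdisk_blocks  # VDisk size in blocks
        self.block_map = block_map        # hypothetical map: logical block -> physical block

    def plan_read(self, start_lba, count):
        """Validate the request, then tell the client where each block lives."""
        if start_lba < 0 or count < 0 or start_lba + count > self.vdisk_blocks:
            raise ValueError("read request is outside the VDisk")
        return [ReadInstruction(lba, self.block_map.get(lba))
                for lba in range(start_lba, start_lba + count)]


def client_read(controller, storage, start_lba, count):
    """Client side: read the blocks the controller points to and assemble the response."""
    data = bytearray()
    for instr in controller.plan_read(start_lba, count):
        if instr.address is None:
            data += bytes(BLOCK_SIZE)  # assumption: unwritten blocks read back as zeros
        else:
            data += storage[instr.address]
    return bytes(data)


if __name__ == "__main__":
    storage = {0: b"a" * BLOCK_SIZE, 1: b"b" * BLOCK_SIZE}
    # Logical block 2 is a deduplicated copy of logical block 0.
    ctrl = Controller(vdisk_blocks=4, block_map={0: 0, 1: 1, 2: 0})
    print(client_read(ctrl, storage, 0, 3))
```

Because duplicates share a physical block, two logical blocks can map to the same address. The client does not need to know that; it just follows the controller's instructions.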
The following figure illustrates IO request processing.
High Availability
As mentioned earlier, one of the client nodes in the cluster can also act as a master node. A master node must have access to all disks configured as physical storage on the controller node. In the absence of the controller node, the master node serves client node requests. Once the controller node is back online, the master node relinquishes control.
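The takeover rule can be reduced to a small decision function. This is a hypothetical sketch: `MasterNode`, `eligible` and `node_in_charge` are illustrative names, and disks are plain strings. It encodes the rules stated above. The controller is in charge whenever it is online. The master takes over only if it can see every disk configured as physical storage on the controller, and it relinquishes control as soon as the controller returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasterNode:
    name: str
    controller_disks: frozenset  # disks configured as physical storage on the controller
    visible_disks: frozenset     # disks this client node can actually reach

    def eligible(self) -> bool:
        # A master must have access to every disk the controller uses.
        return self.controller_disks <= self.visible_disks


def node_in_charge(controller_online: bool, master: Optional[MasterNode]) -> Optional[str]:
    """Return which node serves metadata/allocation decisions, or None if no node can."""
    if controller_online:
        return "controller"  # controller is (back) online: the master relinquishes control
    if master is not None and master.eligible():
        return master.name
    return None


if __name__ == "__main__":
    m = MasterNode("client-2", frozenset({"sda", "sdb"}), frozenset({"sda", "sdb", "sdc"}))
    for online in (True, False, True):
        print(online, node_in_charge(online, m))
```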
Clustering Examples
In the following figure, the host can access the VDisk only through the interfaces of the client node. The client node consults the controller node for command-related decisions. If the controller node fails and a master node is available, the client node switches over to the master node and continues processing.
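On the client node, the switchover can be sketched as "ask the controller first, fall back to the master". The transport is modelled as plain callables, and `NodeUnavailable` is a hypothetical exception a real transport would raise on a failed connection. Trying the controller first on every request means the client returns to the controller automatically once it is back online, consistent with the master relinquishing control.

```python
from typing import Callable, Optional

Handler = Callable[[str], str]  # hypothetical: sends a request to a node, returns its decision


class NodeUnavailable(Exception):
    """Raised by a (hypothetical) transport when a node cannot be reached."""


def send_decision_request(request: str, controller: Handler, master: Optional[Handler]) -> str:
    """Ask the controller for a decision; switch over to the master if the controller is down."""
    try:
        return controller(request)
    except NodeUnavailable:
        if master is None:
            raise  # no master available: the request cannot be served
        return master(request)


if __name__ == "__main__":
    def controller_down(req: str) -> str:
        raise NodeUnavailable("controller unreachable")

    def master(req: str) -> str:
        return f"master decided: {req}"

    print(send_decision_request("write lba 0-7", controller_down, master))
```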
In the following figure, the host accesses the VDisks through both the controller node and the master node. The host sees the controller node and the master node as different paths to the same disk. Both Active-Passive and Active-Active configurations are supported.
If the controller node fails, the host continues IO to and from the master node.
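From the host's side, the two configurations differ only in how paths are selected. The sketch below is a generic illustration of host multipath path selection. It is not QUADStor code and not the implementation of any particular multipath tool, and `MultipathDevice` is a hypothetical name. In Active-Passive mode, IO uses the first healthy path in preference order. In Active-Active mode, IO is spread round-robin across all healthy paths. In both modes a failed controller path leaves IO flowing through the master node.

```python
class MultipathDevice:
    """Hypothetical host-side view of one VDisk reachable over several paths."""

    def __init__(self, paths, mode):
        if mode not in ("active-passive", "active-active"):
            raise ValueError(f"unknown mode: {mode}")
        self.paths = list(paths)  # ordered by preference, e.g. controller first
        self.mode = mode
        self.healthy = set(self.paths)
        self._rr = 0  # round-robin counter for active-active mode

    def mark_failed(self, path):
        self.healthy.discard(path)

    def mark_restored(self, path):
        self.healthy.add(path)

    def next_path(self):
        """Pick the path for the next IO."""
        live = [p for p in self.paths if p in self.healthy]
        if not live:
            raise RuntimeError("no healthy path to the VDisk")
        if self.mode == "active-passive":
            return live[0]  # always the most preferred healthy path
        path = live[self._rr % len(live)]
        self._rr += 1
        return path


if __name__ == "__main__":
    dev = MultipathDevice(["controller", "master"], "active-active")
    print([dev.next_path() for _ in range(4)])  # prints: ['controller', 'master', 'controller', 'master']
    dev.mark_failed("controller")
    print(dev.next_path())  # prints: master
```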