QUADStor clusters have the advantage of distributing load across multiple nodes and providing high availability
A QUADStor Cluster consists of the following nodes
1. Controller Node
2. Client Node
For a controller node the following packages are installed
quadstor-core
quadstor-itf
For a client node the following packages are installed
quadstor-client
quadstor-itf
The procedure for installing the packages is described at http://www.quadstor.com/support/123-installation-on-rhel-centos-sles-debian.html and http://www.quadstor.com/support/61-installation-on-freebsd-8-2.html
However, on a client node install the quadstor-client package instead of the quadstor-core package
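For example, on an RHEL/CentOS client node the installation might look like the following. This is only a sketch; the actual package file names depend on the release downloaded:
rpm -ivh quadstor-client-<version>.rpm
rpm -ivh quadstor-itf-<version>.rpm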
On the controller node ensure that the following ports are allowed for TCP traffic in your firewall configuration
9950
9951
9952
9954
9956
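For example, with iptables the rules on the controller might look like the following. This is only a sketch; adapt it to the firewall tool actually in use:
iptables -A INPUT -p tcp --dport 9950:9952 -j ACCEPT
iptables -A INPUT -p tcp --dport 9954 -j ACCEPT
iptables -A INPUT -p tcp --dport 9956 -j ACCEPT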
On the controller node create a file /quadstor/etc/ndcontroller.conf and add the following lines
Controller=<Controller IP Address>
Node=<Node IP Address>
In the above, Controller IP Address is the IP address to which the controller binds. The Node IP Address is the client's IP address. For each client in the cluster there will be a Node=<Node IP Address> line. For example
Controller=10.0.13.4
Node=10.0.13.5
Node=10.0.13.6
In the above example, the controller will bind to 10.0.13.4 for cluster traffic. 10.0.13.5 and 10.0.13.6 are the clients which are allowed in the cluster. It is a good idea to build the cluster network as a private network so that the cluster traffic does not interfere with the data path of the VDisk clients
On the client node create a file /quadstor/etc/ndclient.conf and add the following lines
Controller=<Controller IP Address>
Node=<Node IP Address>
For example for 10.0.13.5 the ndclient.conf contents would be
Controller=10.0.13.4
Node=10.0.13.5
Similarly for node 10.0.13.6 the ndclient.conf contents would be
Controller=10.0.13.4
Node=10.0.13.6
In the above example, nodes 10.0.13.4, 10.0.13.5 and 10.0.13.6 form a cluster. Configuration tasks such as adding physical storage, creating VDisks etc. can only be performed on the controller node. Configuration changes are automatically propagated to the client nodes. VDisks are accessible from any of the controller or client nodes
Any changes to ndclient.conf or ndcontroller.conf will require a restart of the quadstor service on that node. The only exception to this is the addition/deletion of "Node=..." lines in ndcontroller.conf
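For example, on a Linux node the service can be restarted as follows (assuming the init script installed by the quadstor packages is named quadstor; the exact path may differ between distributions):
/etc/rc.d/init.d/quadstor restart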
Shared storage access
In order to read/write to a VDisk, a client node needs to have access to the physical disks configured on the controller node. Physical disks are identified by their serial number and/or SCSI device identifiers. However, it is not mandatory for a client node to have access to all (or any) of the configured physical disks. If a disk to which data is to be written or from which data is to be read is not accessible by the client node, the data is read/written through the controller node. This however leads to a drop in IO performance.
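For example, on a Linux client node the physical disks visible to the node and their persistent identifiers can be listed with the following command and compared against the disks configured on the controller:
ls -l /dev/disk/by-id/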
High Availability
In a cluster configuration, VDisks are available only as long as the controller node is active. If the controller node becomes inaccessible to the client nodes, the entire cluster is unavailable.
High availability is achieved by configuring any one of the client nodes as a master node. To configure a client node as a master node add the following line to /quadstor/etc/ndclient.conf
Type=Master
For example, to make 10.0.13.5 a master node, the contents of ndclient.conf would be
Controller=10.0.13.4
Node=10.0.13.5
Type=Master
Fence=/usr/sbin/fence_apc --ssh -l ...
Master Node Requirements
- For a client node to perform as a master the node must have access to all the physical disks configured (or will be configured) on the controller node.
- The available memory (RAM) of the master must be equal to or greater than the available memory of the controller node
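Both requirements can be checked quickly on the candidate master node, for example with the following Linux commands (compare the output against the controller node):
ls -l /dev/disk/by-id/
free -m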
Once a master node is configured, metadata state is synced between the controller node and the master node. The metadata traffic can be limited to a private network between the controller node and the master node. This is achieved by adding the following lines to ndclient.conf and ndcontroller.conf
HABind=<Bind IP Address>
HAPeer=<Peer IP Address>
For example the contents of /quadstor/etc/ndcontroller.conf could be
Controller=10.0.13.4
Node=10.0.13.5
Node=10.0.13.6
HABind=192.168.1.2
HAPeer=192.168.1.3
Fence=/usr/sbin/fence_apc --ssh -l ...
And the contents of /quadstor/etc/ndclient.conf could be
Controller=10.0.13.4
Node=10.0.13.5
Type=Master
HABind=192.168.1.3
HAPeer=192.168.1.2
Fence=/usr/sbin/fence_apc --ssh -l ...
In the above example the controller would bind to 192.168.1.2 and sync metadata state to and from 192.168.1.3. Similarly, the master would bind to 192.168.1.3 and sync metadata state to and from 192.168.1.2
NOTE: HABind and HAPeer are optional. If they are missing, the Controller and Node values are used instead
Once a master has been set up and the metadata state has been synced (the initial metadata sync takes between 1 and 5 minutes), the other client nodes can continue to read/write to VDisks even if the controller goes down, as long as the master node is up.
Once the controller node is back online, it will sync the metadata state back from the master node and take over as the cluster owner
Node Fencing
Installing Fence Agents
On RHEL/CentOS 6.x
yum install fence-agents
On Debian 7.x
apt-get install fence-agents
For a list of possible fence agents for your hardware please refer to https://access.redhat.com/site/articles/28603
In order to fence the controller during a takeover add the following to ndclient.conf
Fence=<fence cmd>
For example
Fence=/usr/sbin/fence_apc --ssh -l userid -p password --plug=1 ...
Note that everything after Fence= is treated as the fence command to execute. It is a good idea to test the fence command manually on the command line
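Most of the standard fence agents support a status action which can be used for such a manual test, verifying that the agent can reach the fencing device before it is needed during a takeover. The options below mirror the fence_apc example above and should be verified against the agent's man page; <apc address> is a placeholder for the address of the fencing device:
/usr/sbin/fence_apc --ssh -l userid -p password --plug=1 -a <apc address> -o status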
Similarly, add a Fence= command to ndcontroller.conf to fence the client on controller startup. On controller startup, if the client is not reachable, it needs to be fenced before the controller can resume.
With fencing configured, ownership decisions are easier and take less time. After adding the fence command, ensure that the quadstor services are restarted.
Clustering status of Controller, Master and Client Nodes
A new utility 'ndconfig' is available to check the current status of a cluster node. In order to use the tool, run the following command as root
/quadstor/bin/ndconfig
The following is an example output when ndconfig is run on a controller node
[root@quadstor]# /quadstor/bin/ndconfig
Node Type: Controller
Controller: 10.0.13.7
HA Peer: 10.0.13.6
HA Bind: 10.0.13.7
Node Status: Controller Inited
Node Role: Master
Sync Status: Sync Done
Nodes: 10.0.13.6 10.0.13.5 10.0.13.4
Node Type: Mirror Recv
Recv Address: 10.0.13.7
Node Status: Recv Inited
The following is an example output when ndconfig is run on a client node
[root@quadstor]# /quadstor/bin/ndconfig
Node Type: Client
Controller: 10.0.13.4
Node: 10.0.13.7
Node Status: Client Inited
In the above output "Node Role" indicates the current role of the node. Node Role can be Master, Standby or Unknown. If the role is Standby, the node will take over as Master when the peer node goes down
Sync Status is important for correct HA operation. If Sync Status is 'Sync Done', High Availability of the cluster is possible. Other possible statuses are 'Sync Error' and 'Sync InProgress'.
HA Limitations
In order to effectively perform a switchover from a Master node to a Standby node, both nodes should have a Sync Status of 'Sync Done'. There are certain conditions which can prevent this status from being attained.
1. The status is still 'Sync InProgress' when the master node crashed or restarted.
The solution is to first start/restart the quadstor service on the controller node and then start/restart the quadstor service on the client node (a command sketch is shown after this list)
2. The status is 'Sync Error' and the current master is the client node and not the controller.
The solution is to start/restart the quadstor service on the controller node. If the problem still persists, stop the quadstor service on the client node, start the service on the controller node and then on the client node.
3. The status is 'Sync Error' and the current master is the controller node.
Usually the client node will try to restart the sync process. However, if the 'Sync Error' state persists, restarting the quadstor service will fix the problem
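A minimal sketch of the restart sequence referenced above, using the example addresses from earlier in this article and assuming the init script installed by the packages is named quadstor:
On the controller node (10.0.13.4)
/etc/rc.d/init.d/quadstor restart
Then on the client node (10.0.13.5)
/etc/rc.d/init.d/quadstor restart
/quadstor/bin/ndconfig
After the restart, running ndconfig on both nodes should eventually report a Sync Status of 'Sync Done'.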