Synchronous Mirroring

With synchronous mirroring, data sent to a VDisk is also mirrored to another VDisk on another QUADStor machine. Mirroring is synchronous, which means that a write operation completes only when both VDisks have acknowledged the write. Unlike traditional synchronous mirroring deployments, QUADStor's implementation is dedupe-aware. For example, if a block is received by node 'A' and needs to be sent to node 'B', the data block travels the network from 'A' to 'B' only if it is not a duplicate of a block that already exists on 'B'.
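To make the dedupe-aware behavior concrete, the shell sketch below shows the per-block decision node 'A' makes. It is only an illustration under hypothetical assumptions: blocks are fingerprinted with SHA-256, and 'A' has a local copy of 'B''s fingerprint index as a text file with one hex digest per line. QUADStor's actual fingerprinting and index exchange are internal and are not described in this document. The function name blocks_to_send is illustrative.

```shell
#!/bin/sh
# Hypothetical sketch of dedupe-aware mirroring: print only the blocks whose
# data would actually have to travel from node 'A' to node 'B'.
# Assumptions (not QUADStor internals): SHA-256 fingerprints, and B's index
# available locally as a file with one hex digest per line.

# blocks_to_send INDEX_FILE BLOCK_FILE...
blocks_to_send() {
    index=$1
    shift
    for blk in "$@"; do
        # Fingerprint the block contents.
        digest=$(sha256sum "$blk" | cut -d ' ' -f 1)
        if ! grep -qxF "$digest" "$index"; then
            # Not a duplicate on 'B': the block data must cross the network.
            printf '%s\n' "$blk"
        fi
        # Otherwise the block already exists on 'B' and its data is not resent.
    done
}
```

For example, blocks_to_send b_index.txt blk1 blk2 prints only those of blk1 and blk2 whose digests are missing from b_index.txt.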

Benefits of Synchronous Mirroring

1. Unlike high availability built on shared storage, the underlying physical storage is no longer a single point of failure

2. High Availability

Drawbacks of Synchronous Mirroring

1. Increased latency during write operations

2. Twice the storage requirements

Mirroring Setup

Synchronous mirroring isn't supported in a clustered environment; it is supported only between two standalone QUADStor systems.

The rest of this documentation uses the following scenario:

Node 'A' and Node 'B' are the two machines between which synchronous mirroring is to be set up.
Node 'A' has a VDisk 'M' for which synchronous mirroring is to be set up.

Setting up Node 'A' and Node 'B'

Install the QUADStor Core and ITF packages on both Node 'A' and Node 'B'.

Only IPv4 addresses are supported. Host names and IPv6 addresses are not allowed.

Create a file called /quadstor/etc/ndrecv.conf and add the following line:

RecvAddr=<recv listening ipaddr>

For example

RecvAddr=10.0.13.6

In the above example, the system binds to 10.0.13.6 to receive mirror data. This file has to be created on both Node 'A' and Node 'B', each with its own address. Node 'A' binds to its own RecvAddr and connects to the RecvAddr of Node 'B', and vice versa.
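Because host names and IPv6 addresses are rejected, a small helper can catch those mistakes before the file is written. This is a sketch only: the function names and the optional second argument (which overrides the default /quadstor/etc/ndrecv.conf path) are hypothetical and not part of QUADStor.

```shell
#!/bin/sh
# Sketch: validate a dotted-quad IPv4 address and write ndrecv.conf.
# is_ipv4 and write_ndrecv_conf are hypothetical helper names.

is_ipv4() {
    # Reject anything other than digits and dots, and empty octets.
    case $1 in
        *[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    # Split on dots and require exactly four octets, each 0-255.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# write_ndrecv_conf ADDR [CONF_FILE]
write_ndrecv_conf() {
    addr=$1
    conf=${2:-/quadstor/etc/ndrecv.conf}
    if ! is_ipv4 "$addr"; then
        echo "RecvAddr must be an IPv4 address: $addr" >&2
        return 1
    fi
    printf 'RecvAddr=%s\n' "$addr" > "$conf"
}
```

Run as root on each node with that node's own address, for example write_ndrecv_conf 10.0.13.6 on the node that should listen on 10.0.13.6.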

Ensure that your firewall configuration allows TCP traffic on the following ports:

9953
9955
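As one way to open these ports, the sketch below prints iptables rules for both ports. Restricting the source to the peer's RecvAddr is a suggestion made here, not a stated QUADStor requirement, and the function name is hypothetical. The rules are only printed so they can be reviewed before being applied (for example by piping the output to sh as root). On systems using firewalld, the equivalent is firewall-cmd --permanent --add-port=9953/tcp (and likewise for 9955), followed by firewall-cmd --reload.

```shell
#!/bin/sh
# Sketch: print iptables rules accepting TCP on the QUADStor mirroring ports
# from the peer node only. mirror_fw_rules is a hypothetical helper name.

MIRROR_PORTS="9953 9955"

# mirror_fw_rules PEER_RECVADDR
mirror_fw_rules() {
    peer=$1
    for port in $MIRROR_PORTS; do
        printf 'iptables -A INPUT -p tcp -s %s --dport %s -j ACCEPT\n' "$peer" "$port"
    done
}
```

On Node 'A', pass the RecvAddr of Node 'B', and vice versa.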

Restart the quadstor service.

Ensure that physical storage is configured on both nodes.

Configuring a VDisk for mirroring

1. Click on the "Virtual Disks" menu

2. Click the "Modify" link for the VDisk

3. Under the "Mirroring Configuration" section, enter the destination VDisk name, the destination pool for the VDisk, and the remote node's IP address. When configuring on Node 'A', the remote node's IP address is the RecvAddr of Node 'B'; when configuring on Node 'B', it is the RecvAddr of Node 'A'.

4. Click Submit

Once configuration succeeds, the current mirroring state of the VDisk is shown. From this point on, the two VDisks are attached for mirroring: data received by VDisk 'M' on Node 'A' is also sent to VDisk 'M' on Node 'B', and vice versa.

One of the VDisks assumes the master role and the other the slave (standby) role. This is not to be confused with active-passive mirroring: QUADStor's synchronous mirroring supports an active-active configuration, which is described later. The role of a VDisk determines which VDisk controls metadata-related decisions. If the master VDisk fails and the slave VDisk cannot take over as master, the slave VDisk can no longer write data to its node. However, if the slave VDisk fails, the master VDisk can still write data to its node. Once the slave VDisk is back online, data is synced from the master VDisk to the slave VDisk.

Node fencing

On FreeBSD and SLES, fencing can be done using stonith (from the heartbeat package); on RHEL/CentOS and Debian, use the fence-agents package.

Installing Fence Agents

On RHEL/CentOS 6.x

yum install fence-agents

On Debian 7.x

apt-get install fence-agents

For a list of fence agents available for your hardware, refer to https://access.redhat.com/site/articles/28603. Refer to the package documentation for configuring a fence agent and for manually fencing a node.

For a slave VDisk to take over when the master VDisk has failed, the node containing the master VDisk must first be fenced. Fencing is configured with the qmirrorcheck tool:

/quadstor/bin/qmirrorcheck -t fence -r <mirror IPaddress> -v 'fence command'

-r is the RecvAddr of the peer node. On Node 'A', this is the RecvAddr of Node 'B'.

-v 'fence command' is the fence command to run. The command must be enclosed in single or double quotes so that it is passed as a single argument.
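As an illustration, the sketch below builds a qmirrorcheck fence rule, in the form shown above, whose fence command reboots the peer through its IPMI BMC using fence_ipmilan from the fence-agents package. The BMC address, login and password are placeholders, the helper name is hypothetical, and the fence_ipmilan options your hardware needs may differ (check its man page). Since the fence command is wrapped in single quotes, the sketch rejects values containing a single quote.

```shell
#!/bin/sh
# Sketch: print a qmirrorcheck fence rule that fences the peer via IPMI.
# fence_rule is a hypothetical helper; BMC details are placeholders.

# fence_rule PEER_RECVADDR BMC_ADDR BMC_USER BMC_PASS
fence_rule() {
    peer_recvaddr=$1   # RecvAddr of the peer node
    bmc_addr=$2        # peer's IPMI/BMC address
    bmc_user=$3
    bmc_pass=$4
    fence_cmd="fence_ipmilan -a $bmc_addr -l $bmc_user -p $bmc_pass -o reboot"
    # The fence command is single-quoted below, so it must not contain one.
    case $fence_cmd in
        *\'*) echo "fence command must not contain single quotes" >&2; return 1 ;;
    esac
    printf "/quadstor/bin/qmirrorcheck -t fence -r %s -v '%s'\n" "$peer_recvaddr" "$fence_cmd"
}
```

For example, on Node 'B' with Node 'A' at RecvAddr 10.0.13.3, fence_rule 10.0.13.3 <BMC address of A> <login> <password> prints the command to run.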

Let's suppose VDisk 'M' has the slave role on Node 'B' and the master role on Node 'A'. Node 'A' has crashed and data needs to be written to VDisk 'M'. Node 'B' detects that Node 'A' is unresponsive and invokes the fence command configured for the RecvAddr of Node 'A'. If the fence command succeeds, VDisk 'M' on Node 'B' takes over the master role and write operations for VDisk 'M' continue uninterrupted. If the fence command fails, then, to avoid a split-brain condition, write operations to VDisk 'M' on Node 'B' fail until VDisk 'M' on Node 'A' is back online.

qmirrorcheck also supports a -t ignore -r <mirror IPaddress> option, which means that no fencing is performed when the peer fails. This should be used only when the nodes are part of a cluster and the failed node will in any case have been fenced.

Manual Switching of Roles

The role of a VDisk can be switched manually using the qsync program (/quadstor/bin/qsync). A few useful options that can be passed to the program are:

/quadstor/bin/qsync -l lists all VDisks configured for synchronous mirroring, along with the current role of each VDisk (master or slave), its current status, and so on.

To switch the role of a VDisk, run the following command:

/quadstor/bin/qsync -s <VDisk Name> -m <Mirror Role>

Where:

VDisk Name is the name of the VDisk for which the role needs to be changed

Mirror Role is the new role of the VDisk. It can either be "master" or "slave" (without the quotes)

For example

/quadstor/bin/qsync -s FOO -m master

In the above example, VDisk FOO switches to the master role if the command succeeds.

To switch the roles of all VDisks, run the qsync command without the -s option and with the -f option.

For example, to switch all VDisks to the master role:

/quadstor/bin/qsync -f -m master
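Because an invalid role name is easy to mistype, a thin wrapper can validate the role before building the qsync command. This is a sketch with a hypothetical helper name; it uses only the -s, -f and -m options described above and prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build a qsync role-switch command, accepting only "master" or
# "slave". Without a VDisk name, the -f form (all VDisks) is produced.
# qsync_cmd is a hypothetical helper name.

QSYNC=/quadstor/bin/qsync

# qsync_cmd ROLE [VDISK]
qsync_cmd() {
    role=$1
    vdisk=$2
    case $role in
        master|slave) ;;
        *) echo "role must be master or slave, got: $role" >&2; return 1 ;;
    esac
    if [ -n "$vdisk" ]; then
        printf '%s -s %s -m %s\n' "$QSYNC" "$vdisk" "$role"
    else
        printf '%s -f -m %s\n' "$QSYNC" "$role"
    fi
}
```

For example, qsync_cmd master FOO | sh switches VDisk FOO to master, while qsync_cmd master | sh switches all VDisks.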

Manual Failover

The qsync program is useful for implementing manual failover. To configure manual failover, do the following on both nodes:

Clear any previous qmirrorcheck rules for the peer's IP address:

/quadstor/bin/qmirrorcheck -x -r <mirror ipaddress>

where <mirror ipaddress> is the IP address of the peer.

Then add a -t manual rule for the same peer address:

/quadstor/bin/qmirrorcheck -a -t manual -r <mirror ipaddress>

For example, if mirroring is to be configured between 10.0.13.3 and 10.0.13.4:

On 10.0.13.3, run:

/quadstor/bin/qmirrorcheck -x -r 10.0.13.4
/quadstor/bin/qmirrorcheck -a -t manual -r 10.0.13.4

On 10.0.13.4, run:

/quadstor/bin/qmirrorcheck -x -r 10.0.13.3
/quadstor/bin/qmirrorcheck -a -t manual -r 10.0.13.3
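Since both nodes run the same pair of commands with only the peer address changing, the pair can be generated from a single parameter. A sketch with a hypothetical helper name that only prints the commands shown above:

```shell
#!/bin/sh
# Sketch: print the qmirrorcheck commands that set up manual failover toward
# one peer: clear existing rules, then add a "manual" rule.
# manual_failover_cmds is a hypothetical helper name.

# manual_failover_cmds PEER_ADDR
manual_failover_cmds() {
    peer=$1
    printf '/quadstor/bin/qmirrorcheck -x -r %s\n' "$peer"
    printf '/quadstor/bin/qmirrorcheck -a -t manual -r %s\n' "$peer"
}
```

On 10.0.13.3, manual_failover_cmds 10.0.13.4 | sh would run the two commands listed for that node.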

Let us suppose that VDisk FOO has been configured for mirroring between 10.0.13.3 and 10.0.13.4, and that FOO currently has the master role on 10.0.13.3 and the slave role on 10.0.13.4.

If node 10.0.13.4 fails and 10.0.13.3 then receives a write for FOO, the write completes successfully, since -t manual has been configured and FOO is master on 10.0.13.3.

However, if node 10.0.13.3 fails and 10.0.13.4 then receives a write for FOO, the write is terminated and an error response is sent back to the client, since the current role of FOO on 10.0.13.4 is slave. For 10.0.13.4 to continue accepting writes to FOO, run the qsync command on 10.0.13.4 to manually switch the role of FOO to master.

With manual failover, care must be taken to ensure data integrity. Continuing the example above, a client writes to FOO on 10.0.13.3, which currently has the master role. The writes succeed, but connectivity between 10.0.13.3 and 10.0.13.4 is lost. 10.0.13.3 assumes that 10.0.13.4 has failed and continues with the writes, since FOO on 10.0.13.3 has the master role. Now 10.0.13.3 fails and the client tries to continue the writes through 10.0.13.4. Switching 10.0.13.4 to master at this point can lead to data corruption, since 10.0.13.4 has no knowledge of the writes that were sent only to 10.0.13.3.

This can be avoided if fencing is not configured at all, as described below. However, that also means that if FOO on 10.0.13.3 is master and 10.0.13.4 fails, all subsequent writes to FOO on 10.0.13.3 will fail until 10.0.13.4 is back online.

Fencing Failures

For writes to a VDisk to continue after a peer failure, fencing of the peer must succeed. Newer versions of the software refuse to complete a write if fencing of a non-responsive peer fails or if fencing has not been configured. The following is a summary of how fencing failures are handled:

1. A VDisk with the slave role receives a write command; the peer node may have failed, and no command to fence the peer node has been configured

The write command is terminated and an error response is sent to the client

2. A VDisk with the slave role receives a write command; the peer node may have failed, and 'qmirrorcheck -t ignore' has been specified for the peer address

The write command succeeds and the VDisk takes over the master role

3. A VDisk with the slave role receives a write command; the peer node may have failed, and 'qmirrorcheck -t manual' has been specified for the peer address

Write commands are terminated until the role is manually switched to master or the peer is back online

4. A VDisk with the master role receives a write command; the peer node may have failed, and no command to fence the peer node has been configured

The write command is terminated; the VDisk switches to the slave role and must be manually switched back to the master role

5. A VDisk with the master role receives a write command; the peer node may have failed, and 'qmirrorcheck -t ignore' has been specified for the peer address

The write command succeeds and the VDisk keeps the master role

6. A VDisk with the master role receives a write command; the peer node may have failed, and 'qmirrorcheck -t manual' has been specified for the peer address

The write command succeeds and the VDisk keeps the master role
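The six cases above can be condensed into a lookup, which may help when checking what a given node will do. The sketch below only restates this summary: the label "none" (no fence command configured), the outcome strings and the function name are all chosen here for illustration, and the case of a configured fence command is left out because its outcome depends on whether fencing succeeds.

```shell
#!/bin/sh
# Sketch: outcome of a write when the peer may have failed, by VDisk role and
# the qmirrorcheck policy for the peer. Restates the six cases above.
# write_outcome is a hypothetical helper; "none" = no fence command configured.

# write_outcome ROLE POLICY   (ROLE: master|slave, POLICY: none|ignore|manual)
write_outcome() {
    role=$1
    policy=$2
    case "$role:$policy" in
        slave:none)    echo "fail" ;;                                      # case 1
        slave:ignore)  echo "succeed, becomes master" ;;                   # case 2
        slave:manual)  echo "fail until switched to master or peer returns" ;; # case 3
        master:none)   echo "fail, becomes slave" ;;                       # case 4
        master:ignore|master:manual) echo "succeed, stays master" ;;       # cases 5, 6
        *) echo "unknown combination: $role:$policy" >&2; return 1 ;;
    esac
}
```

For example, write_outcome slave manual describes what happens on 10.0.13.4 in the manual failover example above.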

Active-Active configuration

With synchronous mirroring, it seems natural for a client to write to VDisk 'M' through both paths, to Node 'A' and to Node 'B'. If Node 'A' fails, the client continues writing through Node 'B', and vice versa. This is an active-active configuration.

To use a VDisk in an active-active configuration, the following requirements must be met:

1. The clients must use SCSI SPC-3 Persistent Reservations or SCSI COMPARE AND WRITE (Atomic Test and Set) commands. Legacy SCSI RESERVE/RELEASE commands are not supported

2. Fencing must be configured on both the nodes

For a simple example, see http://www.quadstor.com/tech-library/141-high-availability-with-synchronous-mirroring-tutorial.html