KVM+DRBD replicated between two active-passive servers with manual switching [Resolved]

I need to build a 2-node cluster(-like?) solution in active-passive mode; that is, one server is active while the other is passive (standby) and continuously receives data replicated from the active one. KVM-based virtual machines would run on the active node.

If the active node becomes unavailable for any reason, I would like to manually switch to the second node (which becomes active, while the other becomes passive).

I've seen this tutorial:!Cluster_Tutorial_2#Technologies_We_Will_Use

However, I'm not brave enough to build something that complex and trust fully automatic failover to operate correctly. There is too much risk of a split-brain situation, of the complexity failing somehow, of data corruption, etc., while my maximum downtime requirement is not so severe as to require immediate automatic failover.

I'm having trouble finding information on how to build this kind of configuration. If you have done this, please share the info / HOWTO in an answer.

Or maybe it is possible to build highly reliable automatic failover with Linux nodes? The trouble with Linux high-availability is that there seems to have been a surge of interest in the concept like 8 years ago and many tutorials are quite old by now. This suggests that there may have been substantial problems with HA in practice and some/many sysadmins simply dropped it.

If that is possible, please share how to build it and your experiences with clusters running in production.

Question Credit: LetMeSOThat4U
Asked October 10, 2018
Posted Under: Network
3 Answers

I have an installation very similar to the setup you described: a KVM server with a standby replica via DRBD in active/passive mode. To keep the system as simple as possible (and to avoid any automatic split-brain, e.g. due to my customer messing with the cluster network), I also ditched automatic cluster failover.

The system is 5+ years old and never gave me any problem. My volume setup is the following:

  • a dedicated RAID volume for VM storage;
  • a small overlay volume containing QEMU/KVM config files;
  • bigger volumes for virtual disks;
  • a DRBD resource managing the entire dedicated array block device.

I wrote some shell scripts to help me in case of failover. You can find them here

Please note that the system was architected for maximum performance, even at the expense of features such as fast snapshots and file-based (rather than volume-based) virtual disks.

If I were rebuilding a similar active/passive setup now, I would lean heavily toward using ZFS and continuous async replication via send/recv. It is not real-time, block-based replication, but it is more than sufficient for 90%+ of cases. In fact, I tested such a setup plus an automatic Pacemaker switch in my lab with great satisfaction.

If real-time replication is really needed, I would use DRBD on top of a ZVOL + XFS. If using 3rd-party modules (as ZoL is) is not possible, I would use a DRBD resource on top of an lvmthin volume + XFS.
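To illustrate what one round of async send/recv replication looks like, here is a minimal sketch that only builds the command lines (it does not run them). The dataset names, snapshot names and the standby host name are hypothetical; a real setup would also need snapshot rotation, and `-R` on the send side if child datasets or ZVOLs must follow.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, standby_host, target):
    """Build the commands for one incremental ZFS replication round.

    All argument values are placeholders for this sketch.
    Returns (snapshot_command, send_receive_pipeline) as shell strings.
    """
    # Take a new snapshot on the active node.
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # Send only the changes between the previous and the new snapshot.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # Receive on the standby; -F rolls the target back to its latest snapshot first.
    recv = ["ssh", standby_host, shlex.join(["zfs", "recv", "-F", target])]
    return shlex.join(snapshot), f"{shlex.join(send)} | {shlex.join(recv)}"


if __name__ == "__main__":
    snap_cmd, pipeline = replication_commands(
        "tank/vm", "repl-1", "repl-2", "standby", "backup/vm"
    )
    print(snap_cmd)
    print(pipeline)
```

Run from cron or a timer every few minutes, this gives a standby copy that lags the active node by at most one interval, which is the trade-off against DRBD's synchronous replication.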

credit: shodanshok
Answered October 10, 2018

You can totally set up DRBD and use it in a purely manual fashion. The process should not be complex at all. You would simply do what a Pacemaker or Rgmanager cluster does, but by hand. Essentially:

  • Stop the VM on the active node
  • Demote DRBD on the active node
  • Promote DRBD on the peer node
  • Start the VM on the peer node

Naturally, this will require that both nodes have the proper packages installed, and the VM's configurations and definition exist on both nodes.
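The four steps above can be sketched as a small plan generator. This is only an illustration under assumptions: the VM name, DRBD resource name and host names are hypothetical, and the script just prints the commands for an operator to review rather than executing them. When the old active node is unreachable, only the promote and start steps remain.

```python
import shlex


def switch_plan(vm, resource, old_active, new_active, old_active_reachable=True):
    """Return ordered (host, argv) steps for a manual active/passive switch.

    All argument values are placeholders chosen for this sketch.
    """
    steps = []
    if old_active_reachable:
        # 1. Stop the VM on the active node. `virsh shutdown` only requests a
        #    guest shutdown, so wait until `virsh domstate` reports "shut off"
        #    before running the next step.
        steps.append((old_active, ["virsh", "shutdown", vm]))
        # 2. Demote DRBD on the active node.
        steps.append((old_active, ["drbdadm", "secondary", resource]))
    # 3. Promote DRBD on the peer node.
    steps.append((new_active, ["drbdadm", "primary", resource]))
    # 4. Start the VM on the peer node.
    steps.append((new_active, ["virsh", "start", vm]))
    return steps


def render(steps):
    """Format each step as an ssh command line for the operator to review."""
    return [
        f"ssh {shlex.quote(host)} {shlex.quote(shlex.join(argv))}"
        for host, argv in steps
    ]


if __name__ == "__main__":
    for line in render(switch_plan("vm1", "r0", "node-a", "node-b")):
        print(line)
```

In the unreachable case (`old_active_reachable=False`), make sure the old active node is really down before promoting the peer; otherwise both nodes may end up primary and diverge, which is exactly the split-brain scenario a manual procedure is meant to avoid.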

I can assure you that the Linux HA stack (corosync and pacemaker) is still actively developed and supported. Many guides are old because the software has been around for 10 years. When done properly, there are no major problems or issues. It is not abandoned, but it is no longer "new and exciting".

credit: Dok
Answered October 10, 2018

I'm currently working on an extremely similar system: two servers, one active and one backup, each running a few VMs. The database is replicated and the file servers are kept in constant sync with rsync (one way only). In an emergency, the secondary server takes over. There was the idea of using Pacemaker and Corosync, but since this has to be 100% reliable, I didn't have the courage to experiment. My idea is to have Nginx watch over the servers. This works for me because I'm running a web application; I don't know whether you could use it in your case.

DRBD is a mess for me. The previous servers used it, and while it seemingly worked, it felt like I was trying to dissect a human body.

Check this out, it might help you:

It doesn't look hard; in fact, I've already tried it in a small environment and it worked. It is easy to learn, easy to set up, and easy to maintain. Actually, I think this is what you are looking for.

credit: Bert
Answered October 10, 2018
