I’m looking for some sanity checking and hints/pointers for a project I’m creating.
The project is keeping track of stock for a webshop, so it needs high availability. For this reason I settled on using distributed mnesia, so I can have transactions across an mnesia cluster. What I’m currently trying to figure out is dynamically adding nodes to the cluster - I know this is possible directly in mnesia using change_config and I believe the SyncM package was created for this as well. What I’m currently unsure of is whether any node in the cluster can add new nodes, or if it’s only the starting node that can do it.
My worry is handling split-brain scenarios and specifically handling the vm running the mnesia origin node going down. If the origin node is down, can another node take on the responsibility or will the cluster just be set in stone at that point?
Welcome to the elixirforum!
Here are a few pointers with the note that I have not actively worked with mnesia clusters for a while, so I may remember things wrong.
Any node can add another node in the cluster. The cluster schema is also distributed.
Split-brain scenarios must be handled manually. There are a couple of different strategies, mainly to just pick one side of the partition to be the correct one (using the majority option on the tables helps with picking the right side), or using the unsplit library.
In terms of the origin node going down you should be OK, but one caveat to bear in mind (if I remember correctly) is that if all the nodes in the cluster go down, the cluster will not be able to start until the last living node is back up again. You can force start this, but you risk losing data.
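To make the joining and majority points concrete, here is a minimal sketch, not a definitive setup. The module name StockCluster, the table name :stock and its attributes are made up for illustration, and ram_copies is used only to keep the example short. The Mnesia calls themselves (change_config with extra_db_nodes, add_table_copy, and the majority table option) are the standard ones.

```elixir
defmodule StockCluster do
  # Hypothetical module and table names, for illustration only.
  @table :stock

  # Run once, on whichever node creates the table.
  # `majority: true` makes writes fail on the minority side of a partition.
  def create_table do
    :mnesia.create_table(@table,
      attributes: [:sku, :quantity],
      ram_copies: [node()],
      majority: true
    )
  end

  # Run on the NEW node, after :mnesia.start/0. `existing_node` can be any
  # running cluster member, not only the node that created the schema.
  def join(existing_node) do
    case :mnesia.change_config(:extra_db_nodes, [existing_node]) do
      # At least one node was reached: replicate the table locally.
      {:ok, [_ | _]} -> :mnesia.add_table_copy(@table, node(), :ram_copies)
      # An empty list means none of the given nodes could be reached.
      {:ok, []} -> {:error, :no_connection}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

For disc-backed tables you would also need to change the schema copy type on the new node before adding disc copies; that step is left out here.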
Also, mnesia was not designed to dynamically add and remove nodes and it is a bit clunky to do this, especially removing nodes which are dead (the node must be alive to be removed and if it is truly dead you may have to start a new node with the same node name as the old one to remove it).
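For reference, the call usually involved in removing a node is del_table_copy on the schema table. The sketch below wraps it in a hypothetical helper (ClusterPrune is not a real library). The Mnesia documentation states that Mnesia must be stopped on the node being removed, so the helper refuses when the node is still listed as running. It does not settle the truly-dead case described above, which needs testing.

```elixir
defmodule ClusterPrune do
  # Hypothetical helper. Run on a live cluster member, never on the node
  # being removed.
  def remove(node_name) when is_atom(node_name) do
    if node_name in :mnesia.system_info(:running_db_nodes) do
      # Mnesia is still running there; stop it on that node first.
      {:error, :mnesia_still_running}
    else
      # Deleting the schema replica removes the node from the cluster.
      :mnesia.del_table_copy(:schema, node_name)
    end
  end
end
```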
Regardless, mnesia is fun!
Thanks! That helps me on my way. I’ll do some experimenting with this.
Basically I intend for mnesia to be a proxy for a postgres/mysql backend db - as long as I don’t have a split I just read/write to mnesia and write to the backend, but never read from it.
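A rough sketch of that write-through idea, under stated assumptions: StockProxy, the :stock_proxy table and the persist callback are all hypothetical, and persist stands in for whatever actually writes to the postgres/mysql backend. Reads only ever touch mnesia.

```elixir
defmodule StockProxy do
  # Hypothetical names throughout; `persist` is a stand-in for the real
  # backend write (for example an Ecto insert), passed in as a function.
  @table :stock_proxy

  def setup do
    :mnesia.create_table(@table, attributes: [:sku, :quantity], ram_copies: [node()])
  end

  # Write to mnesia first; only hit the backend if the transaction commits.
  def put(sku, quantity, persist) do
    case :mnesia.transaction(fn -> :mnesia.write({@table, sku, quantity}) end) do
      {:atomic, :ok} -> persist.(sku, quantity)
      {:aborted, reason} -> {:error, reason}
    end
  end

  # Reads are served from mnesia only, never from the backend.
  def get(sku) do
    case :mnesia.transaction(fn -> :mnesia.read(@table, sku) end) do
      {:atomic, [{@table, ^sku, quantity}]} -> {:ok, quantity}
      {:atomic, []} -> :not_found
      {:aborted, reason} -> {:error, reason}
    end
  end
end
```

Note that the mnesia commit and the backend write are two separate steps here, so a failed backend write after a successful commit leaves the two out of sync and would need its own handling.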
In case of a split I can then figure out exactly how to handle things, but I’m basically thinking to go round mnesia in that case - either disabling the node if no db access, or just hitting the db directly. When the network heals I can then bring mnesia back up to speed again. Something along those lines - need to test the scenarios.
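One way to notice a split is to subscribe to mnesia system events and watch for inconsistent_database, which mnesia reports when it detects that nodes ran partitioned. The following is only a sketch: SplitWatcher is a hypothetical module, and what to do once the flag is set (disable the node, go straight to the db) is left to the application.

```elixir
defmodule SplitWatcher do
  # Hypothetical GenServer that remembers whether mnesia has reported a
  # partition. Mnesia must already be running when it starts.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def partitioned?(pid), do: GenServer.call(pid, :partitioned?)

  @impl true
  def init(:ok) do
    # System events are delivered to this process as plain messages.
    {:ok, _node} = :mnesia.subscribe(:system)
    {:ok, %{partitioned: false}}
  end

  @impl true
  def handle_call(:partitioned?, _from, state) do
    {:reply, state.partitioned, state}
  end

  @impl true
  def handle_info({:mnesia_system_event, {:inconsistent_database, _context, _node}}, state) do
    # From here on the application can bypass mnesia and use the backend db.
    {:noreply, %{state | partitioned: true}}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```

Once the network heals, one commonly described recovery is to call :mnesia.set_master_nodes/1 on the losing side and restart mnesia there, so its tables are reloaded from the side chosen as correct. Whether that fits the stock use case needs testing.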