Mnesia vs Cassandra (vs CouchDB vs ...) - your thoughts?

Qqwy · February 28, 2019, 1:28pm

Note: We are still busy researching the different options here; I am working hard to get Planga up to speed on one of the distributed databases.

So a quick update of my findings w.r.t. CouchDB, Cassandra and Riak in the meantime:

Cassandra does per-field Last-Write-Wins for conflict resolution.
- This requires server clocks to be synchronized (!)
- Also, it is completely unconfigurable.
- An example: If we have a user structure, and Alice changes user.email and user.phone whereas Bob changes only user.phone, then assuming Alice’s change happens earlier, the end result will contain Alice’s email change and Bob’s phone change. (Regardless of whether they have seen each other’s changes in the meantime!)
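
To make the Alice/Bob example concrete, here is a minimal sketch of per-field Last-Write-Wins in Python. This is not Cassandra's code: `lww_merge` and the data layout (each field mapped to a `(timestamp, value)` pair) are made up for illustration, and tie-breaking on equal timestamps is left out.

```python
def lww_merge(*writes):
    """Merge writes field by field, keeping the value with the newest timestamp.

    Each write is a dict mapping field name -> (timestamp, value).
    Hypothetical helper for illustration only; equal timestamps are not handled.
    """
    merged = {}
    for write in writes:
        for field, (ts, value) in write.items():
            # Keep the incoming value only if the field is new or the write is newer.
            if field not in merged or ts > merged[field][0]:
                merged[field] = (ts, value)
    return {field: value for field, (_ts, value) in merged.items()}


# Alice writes first (timestamp 1), Bob writes later (timestamp 2).
alice = {"email": (1, "alice@new.example"), "phone": (1, "111")}
bob = {"phone": (2, "222")}

# Ends up with Alice's email and Bob's phone, in whatever order the writes arrive.
result = lww_merge(alice, bob)
```

The timestamps are the only thing deciding the outcome, which is why unsynchronized clocks are a problem.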
CouchDB uses ‘revision hashes’: The revision with the longest history chain wins, with ties broken by taking the revision whose hash is lexicographically higher. The hashes are value-dependent, meaning that if two people perform the same change, there is no conflict.
- No clocks are necessary: CouchDB just uses the order in which each node observed the changes.
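- However, this means that by default the picked ‘winner’ is essentially arbitrary from the application’s point of view (although the selection is deterministic, so nodes that have seen the same revisions pick the same winner), so you have to run your own active conflict-resolution logic.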
- Usually this is done at read-time, by checking if there are conflicts for the resource we want to fetch right now.
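
The winner rule described above can be sketched roughly like this in Python. It assumes revision ids of the form `"<generation>-<hash>"` and only compares the leaf revisions of a document; `pick_winner` is a hypothetical helper, and details such as deleted revisions are ignored.

```python
def pick_winner(leaf_revs):
    """Pick the winning revision among conflicting leaf revisions.

    Longest history (highest generation number) wins; ties go to the
    lexicographically higher hash. Simplified sketch, not CouchDB's code.
    """
    def sort_key(rev):
        generation, _, digest = rev.partition("-")
        return (int(generation), digest)

    return max(leaf_revs, key=sort_key)


# Two revisions at generation 2 conflict; the higher hash wins.
winner = pick_winner(["2-aaa", "2-bbb", "1-zzz"])
```

Note that the result depends only on the set of revisions, not on the order in which they arrived. It also says nothing about which change is the one the user actually wanted, hence the need for your own resolution logic.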
Riak uses either plain get/put, or CRDTs. With CRDTs, conflict resolution is automatic, as long as you are able to shoehorn your data into the format Riak’s CRDTs use.
- Riak’s main disadvantage is that it is currently not maintained.
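
To illustrate why CRDTs make conflict resolution automatic, here is a minimal sketch of a grow-only counter in Python. This is a generic textbook CRDT, not Riak's implementation or API; `merge`, `value` and the node names are made up for the example.

```python
def merge(a, b):
    """Merge two replicas of a grow-only counter.

    Each replica maps node name -> number of increments seen from that node.
    Taking the per-node maximum is commutative, associative and idempotent,
    so replicas converge no matter how often or in which order they merge.
    """
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(counter):
    """The counter's value is the sum of all per-node counts."""
    return sum(counter.values())


# Two replicas that diverged while disconnected.
replica_a = {"node_a": 2}
replica_b = {"node_a": 1, "node_b": 3}

merged = merge(replica_a, replica_b)
total = value(merged)
```

The catch is the one mentioned above: this only works if your data can be expressed in terms of such mergeable structures.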

What is a bit unfortunate is that for all three of these systems, the existing database client libraries are all unfinished. It is very likely that we’ll need to write our own Ecto adapter (or, if this turns out to be infeasible, a custom DB wrapper).

Food for thought.