Mnesia vs Cassandra (vs CouchDB vs ...) - your thoughts?

Qqwy · June 7, 2018, 4:09pm

Looking at the stacks that existing large companies have used, WhatsApp internally uses Mnesia to store the messages, while Discord uses Cassandra.

Also, in the past we’ve had quite a long discussion about ‘When to use Mnesia’ (vs. a traditional relational database), but this was about a year ago, and maybe there have been changes to the Mnesia Ecto bindings or similar?

What I already know:

Cassandra runs on Java, Mnesia runs inside your BEAM.
Both are distributed databases.
Mnesia does not do conflict resolution (although you can write your own, and the packages unsplit and [reunion](https://github.com/snar/reunion exist to do this for you). Cassandra always uses last-write-wins for conflict resolution.

idi527 · June 7, 2018, 6:53pm

I think it’s moving to “ForgETS” or something like that which is their reimplementations of mnesia on top of ets.

I’d use scylladb for storing unimportant data like messages.

I particularly liked this talk about scylla https://www.youtube.com/watch?v=4BVdZw9uUMA

But I haven’t used it in production yet though.

cmkarlsson · June 7, 2018, 8:13pm

I don’t think anything in particular has changed over the year in regards to mnesia.

Things you need to know:

Although WhatsApp use mnesia they don’t use it in the “normal” way. They do async dirty transactions and they have their own replication between nodes (they have two nodes per partition). It is also used for temporary type of data.
Keep in mind, that unless you go with one of the new mnesia backends (such as leveldb) you must hold all your data in memory.
mnesia supports transactions (cassandra has light-weight transactions)
They are distributed but they do differ how this is done. Cassandra is more (and easier) scalable than mnesia.

Overall, if your data is more ephemeral in nature mnesia is a good choice, you need transactions. If you have large amount of data and the need to scale a lot cassandra would be better.

mkunikow · June 7, 2018, 8:56pm

You have big company behind Cassandra https://www.datastax.com/ with theirs tools, Apache Sorl integration (search engine)
Also Casandra has good integration with big data eco system like Apache Spark, Apache Kafka.

rvirding · June 9, 2018, 10:33pm

No, mnesia has always had the ability to store data on disk. You just specify on which nodes you want disc copies. The only earlier limitation has been that the disk files must not be larger than 2 Gb but you got around this by partitioning your tables.

cmkarlsson · June 9, 2018, 11:26pm

Yes, you are right. For small datasets it is doable. On the other hand if they are comparing it with cassandra I just assumed they had “a lot” of data.

In my experience the dets backend is not feasible. The small amounts of data it can keep and the overhead of managing table partitioning is not really worth it if you have large amounts of data. Just assume you have a dataset of 1TB, that in itself means 500 partitions and to cater for growth you want more than that. And because it performs best when you add partitions in the power of two the next step is 1024 partitions which you need to manage. The time it takes to change the number of partitions is also not very optimized. The process uses temporary ETS tables of bag or duplicated bag with an O(N) behaviour. If you have more than 10-20K keys it will take forever to add a single fragment. (actually I have not tested this with dets, only the disc_copies backend so if it differs I may be wrong here)

rvirding · June 11, 2018, 12:01pm

Yes, it very much depends on how much data you want manage.

Qqwy · June 11, 2018, 1:18pm

A third option I have came across is to use CouchDB, which also some large parties use. It’s also built on the BEAM, but unfortunately currently does not support running in the same BEAM as your app. The main difference between CouchDB, Mnesia and Cassandra are:

Cassandra

runs on the JVM (There is also a drop-in replacement called ‘Scylla’ that runs on C++).
It stores data in a relational format (tables with specified columns with specified types)
uses SQL(-like; some features are not supported because of the distributed nature) statements to query data.
Updates are per-column, so conflicts are also per-column (although lightweight transactions to ensure all-or-nothing updates are also supported).
For cassandra to work well, your server’s clocks need to be synchronized.

CouchDB

runs on the BEAM. (but including it inside your BEAM instance is not supported)
It stores data in a document format,
javascript map/reduce functions to query data.
conflicts are resolved by you yourself periodically fetching multiple revisions and storing a new conflict-free version; until you run this, last-write-wins.

Mnesia

Runs inside your BEAM.
It stores data in a set/ordered_set/bag of Elixir/Erlang datatypes.
Erlang QLC queries to perform queries.
Does not handle conflicts; although libraries such as unsplit and reunion allow you to do this by writing your own conflict-resolution function.

According to the Mnesia documentation, the 2GB limit only applies to disc_only_copies; ram_copies and disc_copies are only affected by the size of available RAM. For practical purposes this still means that >16GB (or maybe >32GB or >64GB, but then we definitely are at the limit) tables need to be partitioned.

To be honest, I still have no clue which database engine I’d like to pick. I might end up going for the one that restricts my system the least for the future because I see no clear disadvantages from picking one over the other.

tty · June 11, 2018, 1:43pm

I start with Mnesia because its just there. No extra installs needed, no mucking with another language (i.e. SQL) to interact with the dbase. Most of the systems I work on don’t do Updates on a row, only Creates so the whole conflict resolution typically don’t come into play.

For longer term storage, past a week in my cases, I would move the data from Mnesia to a RDMS or another type of database. Typically which ever flavor the Sales/Accounts/Finance people are keen on using.

OvermindDL1 · June 11, 2018, 8:25pm

Going with the last few posts, put it all behind a module interface, start with the most basic, which might just be ETS or so in RAM, maybe mnesia later, maybe move to eleveldb or something later, maybe even migrate to PostgreSQL later, all without changing the interface.

dch · June 14, 2018, 9:50am

I saw your request on the couchdb mailing list, and while it’s “not supported” it’s not very hard to do it either, as this is actually how CouchDB is built internally anyway.

These notes are for 1.7.x which is what I’ve got to hand:

create your elixir application as usual and compile it
unpack the release into couchdb plugins dir (or /usr/local/lib/couchdb/erlang/lib/... ) for whatever your OS uses
add a [daemons] section to your local.ini config with myapp={'Elixir.MyApp', start_link, []} which will ensure it gets started up as part of the main supervision tree of couch
set ERL_FLAGS environment variable to use -config /usr/local/etc/couchdb/sys.config to get your custom application settings loaded up
look through https://github.com/apache/couchdb/tree/master/src/couch_plugins which has some useful comments and source snippets

I’m reasonably sure this is complete, and while the complexities of 2.x series with cluster support make this moire complicated, the guts of that should remain the same.

At some point, somebody will decide to put a nice tidy shim in between the chttpd “clustered httpd” layer that exposes the soft erlang underbelly of couchdb, and then you’d have native erlang terms accessible to Elixir as well. There are a few not-so-handwavey details to deal with - no maps yet as couch uses proplists (no maps at the time in Erlang, and to support some quirks of JSON), things like structs have no way of being converted back to JSON at the http layer, and a bunch of valid BEAM types that would need to be cleaned when turning them into JSON again etc etc, once the shim is available.

The big issue is that releases for a clustered db are a serious thing, and we tend to update and restart our apps far more often than our DBs. If you’re already co-locating the DB and the app on the same server, only the JSON encoding & recoding is the bottleneck.

Qqwy · September 30, 2018, 9:16pm

So… we’re a lot further in development right now.

Currently, we’re using Mnesia, mostly because it allows us to postpone the choice for another datastore for later.
At first we thought Mnesia might be useful to use as a distributed datastore that would scale to millions of users all on its own, but:

The number of nodes in a database cluser is not 1:1 with the number of user-serving application nodes. Separating these into two layers would make the reason to let your database live ‘in’ your application no longer be worth it.
Messages in our chat application are broadcasted to all users connected to a chat channel (== a phoenix channel) in parallel to storing something to the datastore: In effect, the datastore is only used to read back in history when (re-)connecting to a channel at a later time. So: Read speed (the extra bit you get from having the database live inside your appiication) is not thát important here.
Mnesia does not do many of the things that are important for a large distributed datastore system on its own: It would mean building a lot of custom logic to handle the following features that you’d like in a distributed, fault-tolerant datastore:
network partitions resolving. (Mnesia suffers from split-brain, and performing read-time fixes is non-trivial)
proper replication/fault tolerance,
dataset sharding/consistent hashing etc. (Mnesia has limits to table sizes; table fragmentation is supported, but a very leaky abstraction when used directly; and you have to manually manage the fragments (and e.g. resizing them when new nodes are added)).
run-time network topology reshaping (mnesia is completely configurable at runtime, but does not have any strategies on its own).

So, together, these reasons are plentiful enough to keep our eyes open. There’s a high possibility that we’ll switch to either Riak or Cassandra or CouchDB. However:

Riak is cool, has great features and interacting with the Erlang-based system feels relatively natural from Elixir. However, Basho (that made Riak) is no longer functioning. What does/will this mean for the longevity of the database, and support etc?
Cassandra is supported by a large group of companies, and is e.g. what Discord currently uses in their Chat system (blog post). However, it’s based in Java, and its last-write-wins methodology of handling conflicts (which is non-optional, there is no way to manually resolve conflicts or e.g. build CRDTs) is frequently critiqued (including in Discord’s blog post).

I’ll look into CouchDB more before comparing it with these two again. Anyhow, in the meantime I’d like to hear if anyone has used these systems recently and has some more feedback: Almost all critiques you can find on the internet are multiple years old, and especially now that e.g. Basho is no longer a functioning company, the landscape might have changed somewhat.

tty · September 30, 2018, 10:20pm

Riak. Where support is concern: https://www.erlang-solutions.com/blog/riak-commercial-support-now-available-post-basho.html

Mladen · October 1, 2018, 9:05am

Hi there,

Great post, thanks for that. Just to say we here at Erlang Solutions have, together with the other prominent Riak users, taken on the effort of furthering Riak post-Basho. We also provide full Riak support to anyone needing it, and many Riak users have since signed up for the same. As you say, Riak is a great fit, being a natural part of the Beam ecosystem so worth considering in this situation. We will work to keep Riak moving forward and support should not be a concern as we can provide any help you might need in your adoption and use of the datastore.

Qqwy · October 4, 2018, 1:21pm

Hello @Mladen, thank you for your reply! That sounds great.

One thing we are of course still a bit scared about, is that Riak is still running on Erlang R16B2, whereas we’re now at Erlang 21. Is there still someone taking care of e.g. security fixes being applied to the Riak codebase?

dch · October 5, 2018, 10:32am

BTW you’re welcome to ask about CouchDB directly - contact me, or the couchdb user mailing list. There’s commercial support available, hosted solutions too, and a BEAM-friendly community as well on IRC. See the https://couchdb.org/ website for more info.

Qqwy · February 28, 2019, 1:28pm

Note: We are still busy researching the different options here; I am working hard to get Planga up to speed on one of the distributed databases.

So a quick update of my findings w.r.t. CouchDB, Cassandra and Riak in the meantime:

Cassandra does per-field Last-Write-Wins for conflict resolution.
- This requires server clocks to be synchronized (!)
- Also, it is completely unconfigurable.
- An example: If we have a user structure, and Alice changes user.email and user.phone whereas Bob changes user.phone, assuming Alice’s change happens earlier, the end result will have Alice’s email change and Bob’s phone change in there. (Regardless of if they have seen each-other’s changes in the meantime!)
CouchDB uses ‘revision hashes’: The revision with the longest history chain wins, with ties solved by taking the revision whose hash is lexicographically higher. The hashes are value-dependent, meaning that if two people perform the same change, there is no conflict.
- No clocks necessary, CouchDB just uses the observation order of every node separately.
- However, this means that by default the picked ‘winner’ is essentially random (and might differ between nodes?!), so you have to run your own active conflict-resolution logic.
- Usually this is done at read-time, by checking if there are conflicts for the resource we want to fetch right now.
Riak uses either plain get/put, or CRDTs. This means that, as long as you are able to shoehorn your data in the format Riak’s CRDTs use, conflict-resolution is automatic.
- Riak’s main disadvantage is that it is currently not maintained.

What is a bit unfortunate is that for all three of these systems, the existing database client libraries are all unfinished. It is very likely that we’ll need to write our own Ecto adapter (or, if this turns out to be infeasible, a custom DB wrapper).

Food for thought.

tangui · February 28, 2019, 3:10pm

Depends on what you call “unmaintained”. As far as I know Riak IP was acquired by bet365 last year and Riak has since been moving forward. A new 2.9 RC has been released last month: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2019-January/039316.html
You can see (some?) dev effort here: Commits · basho/riak · GitHub

I guess the sustainability of Riak might be questionable (seems to depend on the will of one company, few developers, etc.). However I came to the same conclusions as you (and partly thanks to these articles) that compared with other solutions it has the best guarantees for eventual consistency (at least for what I plan to do with it).

dch · March 1, 2019, 1:18pm

It might not fit for your use case, but https://hexdocs.pm/icouch is the most feature complete couch library out there. I don’t use ecto at all yet so I’ve not done any investigation into couch+ecto comparisons.

Riak is maintained now, and is pretty close to a 2.9 release https://github.com/basho/riak/blob/develop-2.9/doc/Release%202.9%20Series%20-%20Overview.md although I have no idea how you’re supposed to figure that out just by reading the main repo page.

wrt CouchDB’s revision handling yup you are right, however in practice with a front end proxy in front and a sweeper view to handle conflicts I don’t really have issues in practice. It’s possible to get conflicts “inside” a cluster if couchdb can’t meet the requested quorum level (which is more of an admin problem, but …) and accepts a write with 202, and then you manage to write again to the lost node. At least, your data from both writes isn’t lost, and your application can figure out how to merge that as needed.

There’s no particular reason you couldn’t request conflict info on every GET, but personally I prefer 2 other approaches which work for me:

have a conflicts view that triggers a background task every time it detects a conflict. I have had only a handful of these in years of running CouchDB
design applications in a conflict-free way such that each write creates a new document, and these can be merged back via a view to only return the most appropriate data

Conflicts are a key part of CouchDB’s mesh architecture if you have multiple clusters, and in practice the issue comes up infrequently.

Given the same N revision docs, couchdb will always pick the same winner (which may not be the one you want) consistently.

Also you can embed your Elixir app “inside” CouchDB as a further supervisor. This is really cool but probably of little practical use for most people.

finnillson · December 15, 2019, 2:26am

I would like to see Mnesia (or something to take its place) be updated to address peoples concerns with using it.

CouchDB is nice but the JSON documents interface is crap, imo.

BEAM could benefit with having a modern distributed DB embedded natively in it ready to go without the little quirks of Mnesia that make it a no go for a lot of use cases.

I realize this is a HUGE task.

Imagine a modern JIT’d BEAM with a blazing fast, native, distributed DB that uses the native Erlang terms that would scale and work right out of the box.