Why isn’t mnesia the most preferred database for use in Elixir/Phoenix?

jay1 · September 23, 2018, 1:25pm

Why is it that the mnesia database isn’t the most preferred database for use in Elixir/Phoenix?

kokolegorille · September 23, 2018, 2:07pm

Probably because most people using Elixir are not coming from the Erlang world.

Anyway, there is Ecto support for Mnesia.

But I would not compare Mnesia with PostgreSQL or MySQL, because they are not built for the same purpose.

IRLeif · September 23, 2018, 8:17pm

This thought pops up in my mind occasionally as well.

cmkarlsson · September 23, 2018, 10:36pm

The short answer is: because it is not a traditional relational database. Postgres, MySQL or other RDBMS are preferred if you need persistence for general purpose systems.

The main use cases I see (and use) for mnesia:

Configuration data. If I remember correctly this was the initial use case for mnesia
As a (distributed) caching layer or for other ephemeral data (instead of redis/memcache etc)
If your data structure fits nicely with the key/value approach and you don’t need to scale to a massive number of nodes.

I think mnesia is cool and should be used more when the use case fits but it does require you to acquire some specialist knowledge

ConnorRigby · September 24, 2018, 1:13am

I think mnesia has its uses, but storing very relational data (as web servers usually do) is not one of its main uses.

I personally just started migrating a personal project from mnesia to Ecto/Sqlite because of this.

There are also some more nitpicky things about it.

The API follows Erlang conventions, which, while not a deal breaker in itself, is super annoying.
For example, with Elixir modules, functions usually look like:

def my_fun(data, action)

so you could do stuff like:

Map.new()
|> Map.put(:hello, :world)
|> my_fun({:some_action, %{some: :params}})

Erlang modules seem to have the arguments backwards (in comparison) so you would need to do

map = Map.new() |> Map.put(:hello, :world)
my_fun({:some_action, %{some: :params}}, map)

The other annoying thing is that :mnesia works with records, which Elixir supports, but working with them can be a pain.

A record requires a module that looks like:

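defmodule My.Data do
  require Record
  # Defines the record/0,1,2 macros; the tuple tag is the module name.
  Record.defrecord(:record, __MODULE__, [:a, :b, :c])

  # and assuming you want to use your new record with Elixir functions:
  defstruct [:a, :b, :c]

  # Convert between the struct and the tagged tuple that mnesia stores.
  def to_record(%__MODULE__{a: a, b: b, c: c}), do: {__MODULE__, a, b, c}
  def to_struct({__MODULE__, a, b, c}), do: %__MODULE__{a: a, b: b, c: c}
end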

To me this adds just enough work/annoyance to go with other solutions most of the time.

jordiee · September 24, 2018, 4:25am

Honestly, the reason I moved away from it is that it’s an absolute pain to scale horizontally in a dynamic way (auto scaling). Besides that, mnesia is still a NoSQL db, and that is often not an acceptable choice.

keathley · September 24, 2018, 2:08pm

The main reason I tell people to avoid mnesia is that most people coming to elixir aren’t ready to handle mnesia’s lack of consistency. Most people who use mnesia either handle this limitation on their own or just accept that at some point they’re going to lose data. Mnesia has some other technical limitations once you put a decent amount of data in it. These are typically things like slow startup times (since it needs to read everything back into memory) and table limits (iirc certain table types have a 2gig max table size). But consistency is the big reason.

EDIT:

This is a good read on the subject: https://medium.com/@jlouis666/mnesia-and-cap-d2673a92850

OvermindDL1 · September 24, 2018, 4:22pm

“Most” languages that have pipes (all of them that I’ve used anyway, except for Elixir) pipe into the ‘end’, not the beginning, and Erlang does have a pipe parse transform, sooo… If anything, Elixir is backwards. ^.^

However you can make a trivial flip macro to flip the beginning arg to the end to put in a pipe.
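
A minimal sketch of such a flip helper, assuming a hypothetical macro named `pipe_last` (it is not part of Elixir or of any library mentioned in this thread). It appends the piped value as the last argument of the call instead of the first; `my_fun/2` here is the illustrative function from the earlier post, written with the data argument last.

```elixir
defmodule PipeLast do
  # Hypothetical helper: rewrites `value |> pipe_last(f(a, b))` into `f(a, b, value)`.
  defmacro pipe_last(value, {fun, meta, args}) when is_list(args) do
    {fun, meta, args ++ [value]}
  end
end

defmodule PipeLastDemo do
  import PipeLast

  # Erlang-style argument order: the data structure comes last.
  def my_fun(action, data), do: {action, data}

  def run do
    Map.new()
    |> Map.put(:hello, :world)
    |> pipe_last(my_fun({:some_action, %{some: :params}}))
  end
end
```

On Elixir 1.12 or later, `then/2` with a capture such as `then(&my_fun(action, &1))` should give the same result without a custom macro.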

But still, the thing with mnesia is that it was really designed to be a distributed ‘settings’ and state store, not something for holding large amounts of data, and certainly not relational data. It has its use-cases, but traditional web work is not one of them.

PragTob · September 24, 2018, 7:04pm

I just wanted to thank everyone here for their opinion. I’ve often wondered why mnesia isn’t used more, and this helped me understand why (rightfully so).

So my understanding now, as main points:

not designed to hold large amounts of data
Key/Value instead of relational data
lack of consistency
difficult to auto scale

rvirding · September 24, 2018, 8:33pm

I don’t understand what you mean here with “lack of consistency”. If you use mnesia’s transactions then you are guaranteed that when the transaction has completed then all mnesia nodes are consistent. It is only if you use the “dirty” API you don’t get this guaranteed consistency.
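
To make the distinction concrete, here is a small sketch, assuming a single node and an in-memory table. The `:account` table and the `Bank` module are made up for illustration. The transfer either applies both writes or neither, while the `dirty_*` calls bypass the transaction manager entirely.

```elixir
# Assumption: single node, default ram_copies table, illustrative names.
:ok = :mnesia.start()
{:atomic, :ok} = :mnesia.create_table(:account, attributes: [:id, :balance])

defmodule Bank do
  # Both updates commit together, or the whole transaction aborts.
  def transfer(from, to, amount) do
    :mnesia.transaction(fn ->
      [{:account, ^from, from_balance}] = :mnesia.read({:account, from})
      [{:account, ^to, to_balance}] = :mnesia.read({:account, to})
      if from_balance < amount, do: :mnesia.abort(:insufficient_funds)
      :ok = :mnesia.write({:account, from, from_balance - amount})
      :ok = :mnesia.write({:account, to, to_balance + amount})
    end)
  end
end

# Dirty operations skip locking and transaction handling.
:ok = :mnesia.dirty_write({:account, 1, 100})
:ok = :mnesia.dirty_write({:account, 2, 0})

ok_result = Bank.transfer(1, 2, 30)
aborted_result = Bank.transfer(1, 2, 1_000)
balances = {:mnesia.dirty_read({:account, 1}), :mnesia.dirty_read({:account, 2})}
```

The aborted transfer leaves both balances untouched, which is the guarantee the dirty API does not give.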

cmkarlsson · September 24, 2018, 9:08pm

I think he might be talking about CAP theorem consistency. If you have multiple nodes in mnesia netsplits must be dealt with in the application layer. There are various ways of doing it (majority nodes, and https://github.com/uwiger/unsplit are the ones that pop into my mind). But the basic case to solve a netsplit is to pick a node and restart the others. If you picked the wrong one (or even the right one) you will lose data.
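
As a rough sketch of the application-layer part: mnesia reports a detected partition as an `inconsistent_database` system event, which a process can subscribe to. The `SplitWatcher` module below is hypothetical and only records the events; what to do next (pick a master, restart nodes, merge data) is the hard, application-specific part and is not shown.

```elixir
defmodule SplitWatcher do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Returns the partition events seen so far, newest first.
  def seen, do: GenServer.call(__MODULE__, :seen)

  @impl true
  def init(_opts) do
    # Requires mnesia to be running on this node.
    {:ok, _node} = :mnesia.subscribe(:system)
    {:ok, []}
  end

  @impl true
  def handle_call(:seen, _from, events), do: {:reply, events, events}

  @impl true
  def handle_info({:mnesia_system_event, {:inconsistent_database, context, node}}, events) do
    # A real system would trigger its recovery strategy here.
    {:noreply, [{context, node} | events]}
  end

  def handle_info(_other, events), do: {:noreply, events}
end

:ok = :mnesia.start()
{:ok, _pid} = SplitWatcher.start_link()
```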

In addition, there is no actual guarantee that an mnesia transaction persists to disk. The mnesia cluster does a two-phase commit, so it knows that all the nodes have received the transaction, but each node does not force a disk sync. I think RabbitMQ even modified mnesia (or added a function or something) to make the transactions sync to disk.

Exadra37 · April 27, 2020, 7:55am

Where can I find more info regarding this?

cmkarlsson · April 27, 2020, 9:21pm

This is something I have read in erlang questions mailing list and potentially rabbitmq issue and support trackers.

I think this is the relevant thread: http://erlang.2086793.n4.nabble.com/mnesia-sync-transactions-not-fsynced-td4673313.html

Exadra37 · April 28, 2020, 7:52am

So from reading the link you provided it seems we can conclude that Mnesia cannot guarantee that a transaction is really persisted on disk, therefore data loss may occur. This one I was not expecting…

cmkarlsson · April 28, 2020, 9:32pm

Looking through the code it seems like mnesia uses disk_log as a transaction log and each commit in the transactions is written to the transaction log using disk_log:blog or disk_log:balog depending if it is synchronous or asynchronous. Neither fsyncs to disk. There are some fsyncs done by calling disk_log:sync which I believe fsyncs to disk.
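
A hedged sketch of what asking for a sync after a commit could look like, assuming an OTP release that ships `:mnesia.sync_log/0` (I believe recent ones do, but check the documentation for your version). The `DurableWrite` module and the `:kv` table are illustrative names. With a RAM-only table like the one below there is no disk log to sync, so the return value of `sync_log/0` is passed through rather than asserted.

```elixir
:ok = :mnesia.start()
{:atomic, :ok} = :mnesia.create_table(:kv, attributes: [:key, :value])

defmodule DurableWrite do
  # Commit the write, then ask mnesia to sync its local transaction log.
  def write(record) do
    case :mnesia.sync_transaction(fn -> :mnesia.write(record) end) do
      {:atomic, :ok} -> {:ok, :mnesia.sync_log()}
      {:aborted, reason} -> {:error, reason}
    end
  end
end

write_result = DurableWrite.write({:kv, :greeting, "hello"})
stored = :mnesia.dirty_read({:kv, :greeting})
```

Syncing on every commit trades throughput for durability, which is the same trade-off discussed for other databases in this thread.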

Yes, if you have a power outage you may risk losing data. Note that operations like cp, mv and rm do not call fsync either so you are at risk of losing data from any of those operations as well.

fsync has performance implications and is hard to get right. For example, Postgres had some problems that could lead to data loss (https://lwn.net/Articles/752063/) not too long ago, and I know we have deployments where we’ve had to change from fsync per commit to fsync per second (in MySQL) because of too high load, accepting the fact that we can lose a second’s worth of data.

MongoDB doesn’t fsync every commit either if I remember correctly, but then again it is not exactly a poster-child for durable transactions.

In the end for many use-cases it is an OK trade-off to go without, especially if you are running on enterprise hardware with battery backup on disks and with multiple nodes.

Exadra37 · April 28, 2020, 10:22pm

Everything in software development is about trade-offs, and that’s what I am trying to understand about Mnesia.

So it seems the most dangerous issue is indeed netsplits, and only then the fsync one.

Do you know of any other edge cases that Mnesia may present?

cmkarlsson · April 28, 2020, 10:49pm

netsplits are the most annoying one. I personally haven’t had any problems with missing fsync.

Some other things that can cause problems if you don’t know about them.

If the entire cluster shuts down, the last node that shut down must be started before the rest of the cluster can start again. If that node is lost or not recoverable you need to take action.
When a node starts up it must copy all its data from another node. For large tables this can take some time. It would have been nicer if it could compare the tables and only copy what is needed.
Changing the number of fragments of large tables is practically not feasible. There is some O^n code in there, which means it takes forever when the data is re-distributed among fragments. Pick the number of fragments from the start and stick to it, or you will have to back up, re-create the tables, and restore the data with the new fragment count.
Overload must be prevented. mnesia can handle more writes than the underlying disks meaning that eventually things will be bad, and process queues will increase and grind the system to a halt.
Dynamic node adding/deletion is doable but cumbersome. mnesia is easier to use if you have fewer, more static clusters.
Schema upgrades are a known “annoyance”. I.e if you must change the record definition of the underlying table you must have a way of handling it. For example by traversing the tables and updating all the records or by handling multiple versions of the record in your application.
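
For the "traversing the tables and updating all the records" option in the last point, mnesia has `:mnesia.transform_table/3`. A minimal sketch, assuming a single node and a made-up `:user` table that gains an `:email` field:

```elixir
:ok = :mnesia.start()
{:atomic, :ok} = :mnesia.create_table(:user, attributes: [:id, :name])
:ok = :mnesia.dirty_write({:user, 1, "ann"})

# Rewrite every stored record from {:user, id, name} to {:user, id, name, email}.
add_email = fn {:user, id, name} -> {:user, id, name, nil} end
{:atomic, :ok} = :mnesia.transform_table(:user, add_email, [:id, :name, :email])

migrated = :mnesia.dirty_read({:user, 1})
attributes = :mnesia.table_info(:user, :attributes)
```

On a cluster the new definition applies to all replicas at once, so application code that reads the table has to cope with the new record shape as soon as the transform runs.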

There is a good presentation by Ulf Wiger which explains some of these things. Mnesia for the CAPper. (https://vimeo.com/17162381). It is a bit old (2010) but still informative.

There are also some talks from WhatsApp on limitations and optimizations in mnesia, the beam and freebsd which are quite interesting but I don’t know which they are.

You might also want to search the erlang-questions mailing lists for mnesia, as there is some scattered knowledge there about other edge cases.

acrolink · April 28, 2020, 11:43pm

Mnesia? Never heard of it.

Dusty · April 29, 2020, 3:28am

The challenge seems to be that whatever you start with seems more intuitive. My love of Elixir clouds my judgement here, so pipe first just feels right. TC39 is having an epic discussion about how to handle pipe order. This is just one of many threads.

Exadra37 · April 29, 2020, 9:06pm

Thanks for the link to the very informative video.

I took a bunch of notes. Some may not be totally correct, may be missing bits, or may not be well understood by me, but I will leave them here anyway for others to review and point out where I am not getting it.

MNESIA FOR THE CAPper

The good stuff

Runs in the same memory space as Erlang, thus very fast access, not matched by other databases.
Stores data as Erlang terms.
The query language is Erlang list comprehensions.
If a crash occurs and leaves the filesystem with severely corrupted files that Mnesia is not able to repair, then it will refuse to start. If in a cluster, we can delete the files and restart Mnesia, and it will go to the other nodes to grab the necessary files to start and populate the data again.
Mnesia transactions assume that the functions running inside the transaction don’t have side effects, aka they only work with the database API, thus no message passing to other processes or whatever. Also, Mnesia dirty operations cannot be done inside a transaction, otherwise nasty surprises may arise.
For each transaction Mnesia creates a temporary ETS table and writes to it.
Mnesia supports transactions inside transactions, but you can take a performance penalty due to all the necessary copying of data between the temporary ETS tables it creates for each transaction.
Fragmentation of tables uses linear hashing to distribute the data among the fragments, but a callback exists to allow us to implement other types of hashing, like consistent hashing.
We can extend Mnesia functionality by using callback modules, but care needs to be taken.
Using sticky locks to keep data on only one node eliminates the need for that node to communicate with other nodes, thus speeding up operations. Regarding deadlocks, the author of the talk has the lock repo, which is a scalable deadlock resolver.
Incremental backups module.
Installing a fallback is useful in a system upgrade, for example to revert a database to a backup in case any node fails to upgrade.
Mnesia does not have geographic redundancy built in, but since the transaction logic is not time sensitive it can use slow networks. We can therefore place nodes wherever we want geographically to implement it ourselves, provided that each node has a copy of the schema, so that each node receives a copy of any schema update. It’s wacky but possible.
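
The list-comprehension query language in the notes refers to QLC, which is awkward to call from Elixir. The sketch below swaps in a different technique, `:mnesia.select/2` with a match specification, to show a query against Erlang-term records. The `:person` table and its data are made up.

```elixir
:ok = :mnesia.start()
{:atomic, :ok} = :mnesia.create_table(:person, attributes: [:name, :age])

for {name, age} <- [{"ann", 31}, {"bob", 17}, {"cy", 45}] do
  :ok = :mnesia.dirty_write({:person, name, age})
end

# Match spec: bind name to $1 and age to $2, keep rows with age >= 18, return the name.
adults_spec = [{{:person, :"$1", :"$2"}, [{:>=, :"$2", 18}], [:"$1"]}]

{:atomic, adults} = :mnesia.transaction(fn -> :mnesia.select(:person, adults_spec) end)
sorted_adults = Enum.sort(adults)
```

The result is sorted afterwards because a `set` table does not promise any particular order.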

The bad stuff

Using DETS, a table is limited to 2GB, and Mnesia will not tell us that we are reaching or have exceeded the limit, because it doesn’t tell how much memory is being used. Nowadays a better alternative exists, which is to not use DETS at all: instead of using disc_only_copies we may want to use disc_copies for persistence, which uses the more recent disk_log to write the data to disk.
No versioning of tables or metadata, therefore in a system upgrade that requires changing the schema definition and/or data shape we cannot use the strategy of upgrading one node at a time, because once a node updates its schema it will immediately propagate it to all other connected nodes in the Mnesia cluster.
Split brain or network partition. This happens when a network failure occurs between nodes while they are still up, so they still accept writes. When they get connected again they will have an inconsistent state and Mnesia will refuse to merge them, leaving that task to us developers. Any automatic method that we can devise to handle this may incur data loss.
- A function exists to set which nodes are the master nodes, allowing Mnesia to pick one of them and discard the others. This may also incur some data loss, but at least the system will continue to work with a “consistent” database, based on the master node.
- We can listen to the event for the split brain and hook into a function that will allow us to run our code to merge and solve the conflicts.
- The vector clock implementation used by Riak can be added to the table metadata and used to automatically try to resolve the merge of data in a split brain.
- Tables can be locked while we are trying to solve and merge the conflicts.
- The author of the talk has released the unsplit repo to deal with all this.
Mnesia overload can happen in two ways.
- When we have too many fast writes that are replicated to other nodes, a node may be slower and start building up a queue. It’s more likely to happen with dirty writes than with transactions. Either way Mnesia will report that it’s overloaded, but it’s very hard to detect that it’s about to happen in order to prevent it.
- When disc copies are used Mnesia will create transaction commit logs and periodically flush them to disk, and when they start to overlap (aka a new one is created before the other finishes flushing to disk) Mnesia will tell you it’s overloaded.
- Mnesia was not telling us when it is no longer overloaded, thus not allowing us to build a load mechanism that backs off when overloaded and resumes full speed when recovered. After release R14B it seems there will be an API that allows building a load framework, which the presenter of the talk is thinking of building and releasing. The closest I could find on his GitHub was a job scheduler for load regulation in this repo.
No safe replication with dirty writes.
No built in geographic redundancy.
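
Regarding the DETS size limit in the first point: Mnesia does not warn about it, but one way to watch table size from the application side is to poll `:mnesia.table_info/2`. A small sketch with a hypothetical `TableSize` module; per the mnesia documentation, `:memory` is reported in words for ram_copies and disc_copies tables and in bytes for disc_only_copies tables.

```elixir
defmodule TableSize do
  # Approximate size of the local copy of a table, in bytes.
  def bytes(table) do
    size = :mnesia.table_info(table, :memory)

    case :mnesia.table_info(table, :storage_type) do
      :disc_only_copies -> size
      _other -> size * :erlang.system_info(:wordsize)
    end
  end

  # True when the table has grown past the given threshold in bytes.
  def over?(table, threshold_bytes), do: bytes(table) > threshold_bytes
end

:ok = :mnesia.start()
{:atomic, :ok} = :mnesia.create_table(:blob, attributes: [:id, :payload])
:ok = :mnesia.dirty_write({:blob, 1, String.duplicate("x", 1_000)})

size_in_bytes = TableSize.bytes(:blob)
```

A periodic check such as `TableSize.over?(:blob, 1_500_000_000)` could then raise an alarm well before a disc_only_copies table reaches 2GB; the threshold is only an example value.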

Other Backend for Mnesia

He mentions something about looking at Bitcask as a possibly interesting backend…

I think he is talking about the Riak one, which we can find in this repo.