Why isn’t mnesia the most preferred database for use in Elixir/Phoenix?

Thanks for the link to the very informative video.

I took a bunch of notes; some may not be totally correct, may be missing bits, or may not be well understood by me, but I will leave them here anyway for others to review and point out where I am not getting it :wink:

Mnesia for the CAPper

The good stuff

  • Runs in the same memory space as the Erlang VM, thus very fast access, not matched by other databases.
  • Stores data as Erlang terms.
  • The query language is Erlang list comprehensions.
  • If a crash leaves the filesystem with severely corrupted files that Mnesia is not able
    to repair, it will refuse to start. If we are in a cluster we can delete the files and restart Mnesia, and it will go to the other nodes to grab the necessary files to start and repopulate the data.
  • Mnesia transactions assume that the functions running inside the transaction have no side effects, i.e. they only use the database API, so no message passing to other processes or anything similar. Also, Mnesia dirty operations must not be done inside a transaction, otherwise nasty surprises may arise.
  • For each transaction Mnesia creates a temporary ETS table and writes to it.
  • Mnesia supports transactions inside transactions, but you can pay a performance penalty due to all the copying of data between the temporary ETS tables it creates for each transaction level.
  • Fragmentation of tables uses linear hashing to distribute the data among the fragments, but a callback exists that allows us to implement other types of hashing, such as consistent hashing.
  • We can extend Mnesia functionality by using callback Modules, but care needs to be taken.
  • Using sticky locks to keep data on only one node eliminates the need for that node to communicate with the other nodes, thus speeding up operations. Regarding deadlocks, the author of the talk has the locks repo, which is a scalable deadlock resolver.
  • There is an incremental backups module.
  • Install fallback is useful in a system upgrade, for example to revert the database to a backup in case any node fails to upgrade.
  • Mnesia does not have geographic redundancy, but since its transaction logic is not time sensitive it can run over slow networks, and since we can put nodes wherever we want geographically, we can implement redundancy ourselves, provided that each node has a copy of the schema so that every node receives a copy of any schema update. It’s wacky but possible.
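
The transaction notes above can be sketched from Elixir; a minimal single-node sketch, assuming a hypothetical `:person` table with the default `ram_copies`:

```elixir
# Start Mnesia (RAM-only schema on a single node) and create a
# hypothetical :person table.
:mnesia.start()
{:atomic, :ok} = :mnesia.create_table(:person, attributes: [:id, :name])

# Everything inside the fun must be side-effect free and use only the
# Mnesia API; the writes go to a temporary ETS table that is merged in
# only when the transaction commits.
{:atomic, [{:person, 1, "Joe"}]} =
  :mnesia.transaction(fn ->
    :ok = :mnesia.write({:person, 1, "Joe"})
    :mnesia.read(:person, 1)
  end)

# Dirty operations bypass the transaction machinery entirely; never mix
# them into a transaction fun.
[{:person, 1, "Joe"}] = :mnesia.dirty_read(:person, 1)
```

If a dirty operation sneaks inside the transaction fun it bypasses that temporary ETS table, which is where the nasty surprises come from.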
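
A sticky lock is requested per write via the lock-kind argument of `:mnesia.write/3`; a sketch, assuming a hypothetical `:counter` table:

```elixir
:mnesia.start()
{:atomic, :ok} = :mnesia.create_table(:counter, attributes: [:key, :value])

# :sticky_write leaves the write lock on this node after the
# transaction commits, so subsequent writes to the same record from
# this node need no lock negotiation with other replica nodes.
{:atomic, :ok} =
  :mnesia.transaction(fn ->
    :mnesia.write(:counter, {:counter, :hits, 1}, :sticky_write)
  end)
```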
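
Fragmentation is configured per table with `frag_properties`; a sketch, where `MyConsistentHash` is a hypothetical module that would implement the `mnesia_frag_hash` behaviour to swap linear hashing for consistent hashing:

```elixir
:mnesia.start()

# A fragmented table: Mnesia distributes records over the fragments
# with linear hashing by default; the hash_module option is where a
# custom hashing callback would plug in.
{:atomic, :ok} =
  :mnesia.create_table(:event,
    attributes: [:id, :payload],
    frag_properties: [
      n_fragments: 4,
      node_pool: [node()]
      # hash_module: MyConsistentHash
    ]
  )
```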

The bad stuff

  • With DETS a table is limited to 2GB, and Mnesia will not tell us that we are reaching or have exceeded the limit, because it doesn’t report how much of that space is being used. Nowadays a better alternative exists, which is to not use DETS at all: instead of disc_only_copies we may want to use disc_copies for persistence, which uses the more recent disk_log to write the data to disk.
  • No versioning of tables or metadata, therefore in a system upgrade that requires changing the schema definition and/or the data shape we cannot use the strategy of upgrading one node at a time, because once a node updates its schema it will immediately propagate it to all other connected nodes in the Mnesia cluster.
  • Split brain, i.e. network partition. This happens when a network failure occurs between nodes while they are still up, so they keep accepting writes; when they reconnect they will have inconsistent state and Mnesia will refuse to merge them, leaving that task to us developers. Any method we devise to handle this automatically may incur data loss.
    • A function exists to set which nodes are the master nodes, allowing Mnesia to pick one of them and discard the others. This may also incur some data loss, but at least the system will continue to work with a “consistent” database, based on the master node.
    • We can listen for the split-brain event and hook in a function that runs our own code to merge and resolve the conflicts.
    • The vector clock implementation used by Riak can be added to the table metadata and used to automatically try to resolve the merge of data after a split brain.
    • Tables can be locked while we are trying to solve and merge the conflicts.
    • The author of the talk has released the unsplit repo to deal with all of this.
  • Mnesia overload can happen in two ways.
    • When there are too many writes coming in too fast and being replicated to other nodes, a slower node may start building up a queue. This is more probable with dirty writes than with transactions. Either way Mnesia will report that it’s overloaded, but it’s really very hard to detect that this is about to happen in order to prevent it.
    • When disc copies are used Mnesia creates transaction commit logs and periodically flushes them to disk; when the dumps start to overlap (i.e. a new one is started before the previous one has finished flushing to disk) Mnesia will tell you it’s overloaded.
    • Mnesia did not tell us when it was no longer overloaded, thus not allowing us to build a load mechanism that backs off when overloaded and resumes full speed when recovered. After release R14B it seems an API will exist for building a load framework, which the presenter of the talk is thinking of building and releasing. The closest I could find on his GitHub was a job scheduler for load regulation in this repo.
  • No safe replication with dirty writes.
  • No built in geographic redundancy.
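
The split-brain notes above hinge on Mnesia’s system events plus `:mnesia.set_master_nodes/1`; a rough sketch (`SplitBrainWatcher` is a hypothetical name, and real conflict merging is far more involved than picking a master):

```elixir
defmodule SplitBrainWatcher do
  # Subscribe to Mnesia system events and react when the
  # :inconsistent_database event signals a partition was detected.
  def start do
    :mnesia.start()
    :mnesia.subscribe(:system)
    loop()
  end

  defp loop do
    receive do
      {:mnesia_system_event, {:inconsistent_database, context, other_node}} ->
        # Crude last resort: declare this node the master and let the
        # others reload from it, accepting possible data loss.
        :ok = :mnesia.set_master_nodes([node()])
        IO.puts("partition (#{inspect(context)}) with #{inspect(other_node)}")
        loop()

      _other ->
        loop()
    end
  end
end
```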
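
The overload reports also arrive as Mnesia system events; a minimal single-node sketch of listening for them (no overload will actually occur here, so it just times out):

```elixir
:mnesia.start()
{:ok, _node} = :mnesia.subscribe(:system)

# Mnesia emits {:mnesia_overload, details} e.g. when a new dump of the
# transaction log starts before the previous one has finished.
receive do
  {:mnesia_system_event, {:mnesia_overload, details}} ->
    IO.puts("overloaded: #{inspect(details)}")
after
  100 -> IO.puts("no overload observed")
end
```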

Other Backends for Mnesia

He mentions something about looking at Bitcask as a possible interesting backend…

I think he is talking about the Riak one, which we can find in this repo.
