I want to create a simple application (But one that is not tightly coupled with Phoenix). One of the things I am considering, is whether it would be better to use Ecto (which right now does mean that an external database like Postgres or MySQL is necessary), or Mnesia:
I really like Ecto’s query language.
I really like not being dependent on external services; keeping everything inside OTP-land.
What are the considerations I should make? Why should I pick one over the other?
Mnesia is the somewhat odd name for the real-time, distributed database which comes with Erlang.
11.1 What is Mnesia good at?
Locking and Transactions
If you need to keep a database that will be used by multiple processes and/or nodes, using Mnesia means you don’t have to write your own access controls.
Distribution
Tables can be replicated at many nodes, both for efficiency (database lookup is a local activity) and robustness (redundancy means if one node goes down, other nodes still have copies of the data.)
Non-fully-normalised data
Unlike most database systems, records can contain data of arbitrary size and structure.
Monitoring
Monitoring - processes can subscribe to events which are sent when various operations on the data take place (update, delete, etc) The RDBMS package allows even more fine-grained control.
11.2 What is Mnesia not so good at?
Mnesia is primarily intended to be a memory-resident database. Some of its design tradeoffs reflect this.
Really large tables must be stored in a fragmented manner.
Then I think you can go with something like https://github.com/meh/amnesia. It’s querying syntax is pretty nice. But I would consider Ecto + Postgres an essential, well supported and documented dependency.
Ecto gives you some handy stuff, combined with the speed, reliability and robustness of PostgreSQL. In itself, it can drive architecture adoption if the benefits are large, and believe me, they are.
It is not ~that~ coupled with Phoenix. You can still access the database directly, via PSQL or another client connected directed to it or via the Postgrex driver that Phoenix uses under the hood. PostgreSQL also has a lot of languages that can be used inside of it, such as Python, TCL, JavaScript and C.
Mnesia, on the other hand, is shipped with Erlang and can be used without additional installation. That is also tough to beat. I have experience using the Bolt database with Go, and embedded databases have their advantages in deployment and, sometimes, speed.
And what do you think about using PostgreSQL as persistent storage and Mnesia as cache for dictionaries (which are not supposed to be changed in runtime)?
Or it will be too complex?
You have convinced me to use PostgreSQL with Ecto right now, because it is probably easier to start with and maintain, and having Mnesia’s distributiveness or the extra speed of an in-memory database is not of the most importance for this application (at least at this point in time).
What would be cool, though, is if there would be an Ecto<->Mnesia adapter at some point, because being completely independent from external services remains very compelling.
I also recommend just sticking with Postgres, at least for now.
What would be cool, though, is if there would be an Ecto<->Mnesia adapter at some point, because being completely independent from external services remains very compelling.
I can wholeheartedly recommend trying to write your own adapter as a weekend project or something - I learned a lot about Ecto’s internals (and appreciated it’s design) by doing so.
Shameless plug: you can look at my experiments at github_ecto or ets_ecto. A bare minimum adapter that does something could be this: lib/ets_ecto.ex (from an early commit in that project)
I think for It’s better for Erlang/Elixir community to use Mnesia. It has few limitations. I used PostgreSQL in very large and complex projects (Red9.ir) and it has it’s own limitations. Number of connections, parallel processing, slow indexes on very large tables, distribution problems and sometimes strange disconnections due to network problems are few to mention. The main reason not to use Mnesia in lack of good libraries for Elixir. Amnesia is cool but not complete and lacks good documentation.
You need to manually handle split-brain situations. There are a few well known tricks to do this with various trade-offs but at then end your application will have to handle it and decide what to do.
You need to keep all your data in memory. Well, you could use dets but then you have a 2GB restriction on table-size. Also there are new backends for mnesia which may solve this (level db)
You can use fragmented tables to handle above but your application must set this up and handle the fragments.
It is prone to get overloaded and handling it in distributed settings is harder than you think.
Slow start-up times (depending on order nodes are started mnesia may decide to copy entire tables to different nodes, this slows down startup time significantly)
Upgrading table definitions may be hard in a distributed setting and without stopping access to the database.
All this said. Mnesia is a very battle-tested database, it is just that it requires a fair bit of skill to operate and that you know its limitation. There are some pretty big shops running mnesia as a major part of their infrastructure (Whatsapp, Klarna).
Why would you want to use mnesia?:
You can’t get a faster database for reads as it runs in the same memory space as your erlang application. Anything else handles reads under 40 micro-seconds?
No impedance mismatch between erlang code and the database. You read terms straight of memory so no serialization layers and no database traffic
It has transactions. They are of course more limited than a proper database but not many other NoSQL databases have such good history
Distribution. Have its own problems of course but it is a distributed database
Mnesia isn’t a replacement for postgresql though, It is not an RDMS and as with any other NoSQL database you must have a use case where it fits.
I use PostgreSQL for anything and everything that needs to persist.
If postgresql is slow on certain queries that I cannot really make better than I either make a materialized view inside postgresql that updates on occasion, or if I do a ton of tiny little queries per transaction that all are the same (like getting permissions across connections, websockets, etc…) then I use CacheX (with a fallback in it set to querying the database) as it has auto-timeout and removal, janitor process, and far far more, including transactions and shared retrievals (it is built on ETS of course).