Hello everyone
I would like to know: which database do you prefer for chat applications?
I do not know if traditional databases are preferable, or maybe there is some other option that works well with Elixir / Phoenix?
I would appreciate it if you could share your experience.
In this case it will be a fairly complete and large chat app.
postgres for storing user metadata (might replace with cockroach [1] for availability, it's definitely production ready, hit 1.0 a few weeks ago),
elasticsearch for, well, searching through all those messages and users,
prometheus 2.0-alpha [2] for collecting and storing metrics (2.0-alpha because stable is way too stable)
Actually, even though cockroach is very nice, I might still stay with postgres because of PostGIS (might come in handy if you plan to support location sharing, "users nearby", "boys from my hood", etc.).
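To make the "users nearby" idea concrete: this is the kind of radius query PostGIS answers efficiently with a spatial index (e.g. ST_DWithin). Below is a naive in-memory Python sketch of the same logic, just to illustrate what the database would be doing for you; the user names and coordinates are made up.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def users_nearby(users, lat, lon, radius_km):
    """Linear scan over all users; PostGIS would use a spatial index instead."""
    return [name for name, (ulat, ulon) in users.items()
            if haversine_km(lat, lon, ulat, ulon) <= radius_km]

users = {"alice": (52.52, 13.405), "bob": (48.8566, 2.3522)}  # hypothetical users
nearby = users_nearby(users, 52.5, 13.4, 50)  # who is within 50 km of this point?
```

In Postgres you would store the coordinates in a `geography` column and let `ST_DWithin` do this check server-side, which avoids shipping every user's location to the app.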
Note: I've never written anything more complex than a "hello world." So you should really just skip to the next answer.
How much traffic do you expect on it? I would keep it simple and stick to Postgres. It plays well with Ecto, is widely used in the industry, has better tooling, is easy to set up, etc.
Eh, I'd honestly just use PostgreSQL for all of that. It even has very advanced full-text search abilities that can outdo Elasticsearch in many cases, all while living in the same database, so there's no worry about things getting out of sync. Plus a chat app is mostly writes and reads with little to no updating, which is the perfect use case for PostgreSQL (it flies when you mostly just read and write rather than update).
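For reference, Postgres's built-in full-text search is the `to_tsvector`/`tsquery` machinery (e.g. `WHERE to_tsvector('english', body) @@ plainto_tsquery('postgres')`). A self-contained sketch of the same keep-search-next-to-the-data idea, using SQLite's FTS5 module since it ships with Python's standard library (this assumes your SQLite build has FTS5 compiled in, which most do); the messages are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes text as it is inserted -- no separate
# search cluster to keep in sync with the primary store.
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(sender, body)")
conn.executemany(
    "INSERT INTO messages(sender, body) VALUES (?, ?)",
    [
        ("alice", "shall we switch to postgres for the chat history?"),
        ("bob", "elasticsearch feels like overkill for now"),
        ("alice", "agreed, fewer moving parts"),
    ],
)
# MATCH runs a ranked full-text query; the Postgres analogue is
# WHERE to_tsvector('english', body) @@ plainto_tsquery('postgres')
hits = conn.execute(
    "SELECT sender, body FROM messages WHERE messages MATCH 'postgres' ORDER BY rank"
).fetchall()
```

The point is the operational one from the post above: search stays transactional with the data, so a committed message is immediately findable, with no indexing pipeline to babysit.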
Sure, it's just that the master-master story in Postgres is still far from perfect. Postgres can get far without any need to scale or replicate, but if the chat app is international, I would want to spin up a database server on each continent to reduce latency, and that would make Postgres a bit harder to operate.
PostgreSQL now has some pretty great libraries for such replication, in two main styles too (one replicates the raw data, always perfect consistency; the other replicates commands, using much less bandwidth).
As far as I know, WhatsApp doesn't save the messages in their database, just the accounts; the messages are peer to peer (I'm not sure, but it seems to work like this).
No, they save messages in Mnesia as well. You can send instant messages to people who aren't online. Once they sign in, it will pop them from the db. If I remember correctly, I think I saw a talk where they stated the average amount of time a message remained in the db was about 27 minutes.
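The store-until-delivery behaviour described above can be sketched in a few lines. This is not WhatsApp's actual design, just an illustration of the idea that a message only lives in the store until the recipient signs in; the class and method names are made up:

```python
from collections import defaultdict, deque

class OfflineStore:
    """Holds messages for offline users and pops them on sign-in."""

    def __init__(self):
        self._pending = defaultdict(deque)

    def enqueue(self, recipient, message):
        # Called when the recipient has no live connection.
        self._pending[recipient].append(message)

    def pop_all(self, recipient):
        # Called when the recipient signs in: deliver and delete,
        # so nothing lingers in the store after delivery.
        queue = self._pending.pop(recipient, deque())
        return list(queue)

store = OfflineStore()
store.enqueue("dana", "hi")
store.enqueue("dana", "are you there?")
delivered = store.pop_all("dana")  # messages handed over and removed
```

With this shape, the average residency time of a message is just the average time until the recipient next connects, which matches the "about 27 minutes" figure quoted above.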
I think that's not enough
I would at least add Kafka, Spark, Hadoop, Nifi, NATS, K8s, and Terraform
I would also roll my own implementation of viewstamped replication
I obviously forgot a few things but for initial roll out this should be OK
We are getting very good results with Riak and its integrated Solr search functions. There is a small delay while the Solr indexes catch up with the Riak database, but otherwise it is very scalable with multiple writers and readers. It really depends on how complex your data model is and how much traffic you expect. PG will be fine for anything under 1k simultaneous users for sure, and can go higher with effort.
Just a note about using mnesia in 201* https://news.ycombinator.com/item?id=10388488
It might have been an OK decision on WhatsApp's part in 2008, but now there are plenty of much better alternatives, I think. Myself, I would look at what the Discord guys are using, since they are also doing quite well.
Also, if you watch Rick Reed's talks from 2012 or 2014, it would seem that Mnesia was quite a big pain point for them even back then. I honestly think some of their problems could have been avoided if they had used something like Cassandra. Instagram's team used it well for their little chat.
I would think about the use cases you want to implement:
how important is the chat history (if not at all, I would just keep it in a GenServer)
how long do you want to save the history
how will you work with the historical data afterwards (searching, scrolling)
how big will the history get
how many users
how can you partition stuff (maybe there are groups that can be separated for easy sharding)
When you know more about the use cases, the amounts, and the frequencies, you can think about the technology.
If you want to get started quickly, I would just use what you have, like PostgreSQL, and separate the storage from the core app. Then change to something more powerful when needed.
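If you do end up partitioning by group as suggested above, the routing layer can start out as a stable hash from group id to shard. A toy sketch, with made-up shard names and a fixed shard count (a real setup would likely use consistent hashing so resharding moves fewer keys):

```python
import hashlib

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]  # hypothetical shard names

def shard_for(group_id: str) -> str:
    """Map a group id to one shard, deterministically."""
    # Use md5 rather than Python's built-in hash(), which is salted per
    # process -- every node in the cluster must route a given group to
    # the same shard.
    digest = hashlib.md5(group_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARDS)
    return SHARDS[index]

target = shard_for("elixir-lounge")  # same answer on every node, every run
```

Since all messages for a group land on one shard, queries like "scroll this room's history" never have to fan out across databases, which is what makes group chat a natural sharding key.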