Best DB for chat

Hello everyone
I would like to know which database you prefer for chat applications?
I do not know if traditional databases are preferable or maybe there is some other option that works well with Elixir / Phoenix?

I would appreciate it if you can share your experience.
In this case it will be a fairly complete and large chat app.

Thanks

1 Like

I would try

  • scylla [0] for storing messages,
  • postgres for storing user metadata (might replace with cockroach [1] for lolavailability, itā€™s definitely production ready, hit 1.0 a few weeks ago),
  • elasticsearch for, well, searching through all those messages and users,
  • prometheus 2.0-alpha [2] for collecting and storing metrics (2.0-alpha because stable is way too stable)

[0] http://www.scylladb.com
[1] https://www.cockroachlabs.com
[2] https://prometheus.io

Actually, even though cockroach is very nice, I might still stay with postgres because of postgis (might come in handy if you plan to support location sharing, ā€œusers nearbyā€, ā€œboys from my hoodā€ etc.).

Note: Iā€™ve never written anything more complex than a ā€œhello world.ā€ So you should really just skip to the next answer.

1 Like

How much traffic do you expect on it? I would keep it simple and stick to Postgres. It plays well with Ecto, widely used in the industry, better tooling, easy to setup, etc.

Iā€™m sorry for answering your question with somewhat of a non-answer, but it has to be mentioned: A Little Architecture by ā€˜Uncle Bobā€™.

1 Like

Eh, Iā€™d honestly just use PostgreSQL for all that, it even has very advanced searching abilities that can out do elasticsearch in many cases, all while using the same database so no worry about things getting out of sync. Plus a chat app is mostly write and read with little to no ā€˜updatingā€™, which is the perfect use-case for PostgreSQL (it flies the less ā€˜updatingā€™ you do, mostly just reading and writing).

5 Likes

Any example for advances searchings with postgre? Iā€™m interested :o

One example: http://rachbelaid.com/postgres-full-text-search-is-good-enough/

2 Likes

Sure, itā€™s just that master-master story in postgres is still far from perfect, although postgres can make it far without any need to scale/replicate, but then (in case the chat app is international) I would want to spin up a database server on each continent in an attempt to reduce latency, and that would make postgres a bit harder to operate.

Also @rtj, you might want to read this https://brandur.org/cloud-databases.

PostgreSQL has some pretty great libraries now for such replication, two main styles too (one replicates the raw data, always perfect consistency, the other replicates commands, much less bandwidth).

1 Like

I feel this is a rare problem to have and once you do have it, youā€™ll have the resources to address it.

3 Likes

I feel like everyone here has forgotten that Whatsapp uses only Mnesia (or at least used too, Iā€™m not sure about anything after 2014).

2 Likes

Until I know, Whastapp dont save the messages in their database, just the accounts, the messages are peer to peer (Iā€™m not sure, but its seems works like this)

No, they save messages in Mnesia as well. You can send instant messages to people who arenā€™t online. Once they sign in it will pop them from the db. If I remember correctly I think I saw a talk where they stated the average amount of time a message remained in the db was about 27 minutes.

2 Likes

I think thatā€™s not enough
I would at least add Kafka, Spark, Hadoop, Nifi, NATS, K8s, and Terraform
I would also roll my own implementation of viewstamped replication :slight_smile:
I obviously forgot a few things but for initial roll out this should be OK

2 Likes

Dude PG is not web scale. For just 100K a month you can have a Spanner setup that will easily match a single mid size PG box in performance.

2 Likes

We are getting very good results with Risk and its integrated SOLR search functions. There is a small delay while SOLR indexes update to the Risk database, but otherwise very scaleable with multiple writers and readers. It really depends on how complex your data model is, and how much traffic you expect. PG will be fine for anything under 1k simultaneous users for sure, and can go higher with effort.

I am interested to this as well, is using GenServer and persists the data to postgres enough?

Just a note about using mnesia in 201* https://news.ycombinator.com/item?id=10388488
It might have been an ok decision on whatsappā€™s part in 2008, but now there are plenty of much better alternatives, I think. Myself, I would look at what discord guys are using, since they are also doing quite well.

Also, if you watch Rick Reedā€™s talk from 2012 or 2014, it would seem that mnesia was quite a big pain point for them even back then. I honestly think some of their problems could have been avoided if they had used something like cassandra. Instagramā€™s team used it well for their little chat.

I do like me some k8s and terraform from time to time ā€¦

I would think about the use cases you want to implement:

  • how import is the chat history (if not at all I would just keep it in genserver)
  • how long to you want to save the history
  • how do you work the historical data afterwards (searching, scrolling)
  • how big will the history get
  • how many users
  • how can you partition stuff (maybe there are groups that can be separated for easy charding)

When you know more about the use cases, the amount and the frequencies you can think about the technology.

If you want to get started quickly I would just use what you have, like postgresql and separate the storage from the core app. Then change to something more powerful when needed.

Marcus