Ram, an in-memory distributed KV store

Let’s write a database! Well, not really, but I think it’s a little sad that there doesn’t seem to be a simple in-memory distributed KV database on the BEAM. Many times all I need is a consistent distributed ETS table.

The main ones I normally consider are:

  • Riak, which is great: it handles loads of data and is based on DHTs. This means that when there are cluster changes, data needs to be redistributed, and the process must be properly managed, with handoffs and so on. It is really great, but it’s eventually consistent, and on many occasions it may be overkill when all I’m looking for is a simple in-memory ACI (not D) KV solution which can have 100% of its data replicated on every node.
  • mnesia, which could be it, but it unfortunately requires special attention when initializing tables and making them distributed (which is tricky), handles net splits very badly, needs hacks to resolve conflicts, and does not really support dynamic clusters (additions can be kind of ok, but, for instance, you can’t remove nodes unless you stop the app).
  • …other solutions? In general people end up using FoundationDB or Redis (which has master-slave replication), i.e. external to the BEAM. A pity, no?

So… :slight_smile: Well, I don’t plan to write a database (since ETS is awesome); rather, I plan to distribute it in a cluster. I simply want a distributed ETS solution, after all!

I’ve already started the work and released version 0.1.0 of ram:

Docs are here:
https://hexdocs.pm/ram
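To give a feel for the intended “distributed ETS” surface, here is a hypothetical usage sketch — the exact function names are my assumption, so check the hexdocs for the real API:

```elixir
# Hypothetical sketch of a "distributed ETS"-style API; function
# names are illustrative assumptions, see the ram docs for the real ones.
:ok = :ram.put("key", "value")   # write, replicated to every node
"value" = :ram.get("key")        # read from any node in the cluster
:ok = :ram.delete("key")
```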

Please note this is a very early stage. It started as an experiment and it might remain one. So feedback is welcome to decide its future!

Best,
r.

16 Likes

There is one in Elixir :slight_smile:

2 Likes

It looks nice! Thanks for pointing it out, but it doesn’t look distributed, which is my main point. Or did I miss where it says so?

1 Like

I am not sure about the distributed part; it’s probably more for embedded systems.

Maybe @lucaong can confirm.

1 Like

CubDB is great, I have been using it for some of my projects. I love its simplicity.
You can read this FAQ to learn more about it:
https://hexdocs.pm/cubdb/faq.html#content

I don’t think it is distributed.

3 Likes

Yes @kokolegorille and @ostinelli , CubDB is an embedded database, therefore by design not distributed. In principle one could use it as a backend for a distributed system, but its main use case is embedded applications (think Nerves, or a mobile or desktop application), or for cases where one needs persistent and fail safe application-local storage.

It goes beyond key-value: it supports sorted selection of ranges, and atomic transactions. It is stored on disk, and optimized for robustness in case of sudden shutdown and for needing very little memory and CPU (all good qualities when running on small embedded devices).

You could evaluate it as an optional durable backend for your distributed K/V store, although in your case DETS is probably simpler to adapt if you only need K/V :slightly_smiling_face:

6 Likes

Thanks for your input Luca! I’ve no plans on making ram persistent (it’s in the name!), there are many solutions for persistency and they solve other problems. I was just looking for something similar to a distributed ETS table. :slight_smile:

3 Likes

Thank you for your work!

I haven’t tried it yet, but it seems like a reasonable solution could also be put together based on rabbitmq/ra.
Here is a tutorial on how to build a simple distributed KV store based on a map.
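For reference, the core of such a tutorial-style store is tiny: a ra state machine is a module implementing the `:ra_machine` behaviour, with the replicated state being a plain map. A hedged sketch (module and command names are my own; callback shapes follow the ra docs):

```elixir
defmodule KVMachine do
  # Sketch of a map-backed ra state machine (illustrative names).
  # :ra_machine requires init/1 and apply/3; apply/3 returns the new
  # state plus a reply for the caller that issued the command.
  @behaviour :ra_machine

  def init(_config), do: %{}

  def apply(_meta, {:put, key, value}, state) do
    {Map.put(state, key, value), :ok}
  end

  def apply(_meta, {:get, key}, state) do
    {state, Map.get(state, key)}
  end
end
```

ra then replicates each command through the Raft log before `apply/3` runs it on every member.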

Thanks for your input! Do you know whether this example is consistent (not eventually consistent) and whether it automatically supports dynamic clusters?

Do you know whether this example is consistent

The library behind it, ra, implements the Raft consensus algorithm, so it’s consistent.

whether it automatically supports dynamic clusters?

That’s the part that’s not covered in the tutorial :grinning_face_with_smiling_eyes: There are functions in :ra that could help achieve that, though: :ra.add_member/2, :ra.remove_member/2 and :ra.members/1.
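A hedged sketch of what that could look like — server-id format and return shapes are assumptions on my part, so consult the :ra docs before relying on them:

```elixir
# Illustrative only: grow/shrink a running ra cluster one member at a time.
known_member = {:kv_cluster, node()}
new_member = {:kv_cluster, :"node2@host"}

# The new server must already be started and running the same machine;
# the cluster is then asked to admit it:
{:ok, _, _leader} = :ra.add_member(known_member, new_member)

# Only one membership change is allowed at a time; removal is similar:
{:ok, _, _leader} = :ra.remove_member(known_member, new_member)

# Inspect the current membership:
{:ok, members, _leader} = :ra.members(known_member)
```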


Btw, thank you for responding to me on Twitter! I think it would help users if the description of ram mentioned that it uses 2PC to achieve consistency!

1 Like

This part of the README covers dynamic clusters.
The only thing to keep in mind is that only one cluster membership change at a time is allowed; concurrent changes will be rejected by design (this is described in the Raft paper too, if I am not mistaken). This means you have to wait for the cluster to propagate a membership change before changing it again.

About consistency guarantees, Raft guarantees strong consistency.

Have a lot of fun!

3 Likes

https://rabbitmq.github.io/khepri/overview-summary.html

3 Likes

The library behind it, ra, implements the Raft consensus algorithm, so it’s consistent.

The decision is consistent, but the data still needs to propagate to the cluster, if I’m understanding this correctly. If so, wouldn’t this make it eventually consistent?

https://rabbitmq.github.io/khepri/overview-summary.html

It’s an on-disc database if I’m not mistaken.

The decision is consistent, but the data still needs to propagate to the cluster, if I’m understanding this correctly. If so, wouldn’t this make it eventually consistent?

No, because Raft will wait for a majority of nodes to have replied to the append-entries call before acking to the client. If a majority of nodes cannot be reached, it will not handle the write request.
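Concretely, the majority (quorum) a write must reach before the leader acks is ⌊n/2⌋ + 1:

```elixir
# Quorum size for a Raft cluster of n members: more than half must
# append the entry before the leader acknowledges the write.
majority = fn n -> div(n, 2) + 1 end

majority.(3)  # => 2 (a 3-node cluster tolerates 1 failure)
majority.(5)  # => 3 (a 5-node cluster tolerates 2 failures)
```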

majority ≠ all. The question is whether, during a transaction, some nodes might be updated and others not yet when a read operation takes place. If reads can temporarily return different values while waiting for propagation, it would be eventually consistent, not strongly consistent. This has nothing to do with the decision mechanism, which provides consistency, but with the propagation (eventual vs. strong).

I simply don’t know what mechanism Raft implements to avoid such a case, hence my question.

Reads and writes always pass through the leader; Raft ensures the strongest form of consistency (linearizability).
I can only suggest reading the Raft paper and seeing if it fits whatever use-case scenario you have in mind.

Ok then, that’s the answer: Raft routes both writes and reads through the elected leader. It’s a choice. :slight_smile:

1 Like

@ostinelli is it a bad idea to ask you to run the Yahoo! Cloud Serving Benchmark (YCSB) on ram to compare it with other solutions (like Redis, etc.)?

Like this one: Benchmark (YCSB) numbers for Redis, MongoDB, Couchbase2, Yugabyte and BangDB (High Scalability).

1 Like

If it can help for inspiration, other people have invested a lot of time building databases on top of Raft to provide performance, scale and consistency: the CockroachDB folks did a good job of explaining their design. They organize the data in “ranges”, stored within Raft member groups; the idea is to not have a single group for all the data, so that the impact of members joining or leaving is limited to subsets of the data, and the load is spread across many Raft leaders (one leader per range).

You can read more about it in their docs: Replication Layer | CockroachDB Docs (all that range business is described in more detail in the Cockroach University learning modules).

It’s not Elixir/Erlang, but these ideas are language agnostic.

3 Likes