Beam/otp for distributed databases/transactions

Is there something like OTP, but instead of focusing on distributed processes / severs, it focuses on providing building blocks for distributed databases / distributed state / distributed transactions ?

1 Like

Well, what do you think a GenServer / Agent is? Each and every one of them can hold its own state which you can then persist somewhere else (not only in memory).

The building blocks are already there in the BEAM. It just all depends how you want to model your software. You donā€™t have to expose a server in the classical sense. You can make a distributed BEAM app that only responds to particular requests (like a DB).

Maybe Mnesia.

1 Like

Have you ever heard of DynamoDB, Cassandra, Voldemort, Redshirt, HTable, Riak, ā€¦ ? I assure you the core technical challenge they solved was not ā€œuse GenServerā€.

1 Like

In that case it sounds like you made a decision then? Because I truly donā€™t understand your question.

If you are looking for a library thatā€™s the building blocks of said databases, maybe their sources can help?

I have no idea what ā€˜decisionā€™ refers to. I agree that you donā€™t understand the question.

I know how to use git clone. I am asking because I believe someone in the Elixir community might know of a resource for building distributed databases / transactions that is more useful than just ā€˜read the sourceā€™.

1 Like

Iā€™m hesitant to jump in given the tone of this thread, but just to help clarify what youā€™re asking, is Noria (written in Rust) similar to what you are asking about? Are you asking if a similar tool has been written in Elixir already?

Maybe Sage is what youā€™re after:
https://hexdocs.pm/sage/readme.html

7 Likes

FYI, Riak is written in erlang.

2 Likes

Based on a very superficial read of github readme, Noria dramatically speeds up MySQL reads through pre-computation / caching ā€“ which is a very cool idea, but not what I am looking for.

I still donā€™t understand Sage/Sagas does ā€“ so Iā€™m not sure.

To clarify a bit, let us consider compilers/parsing for a moment. At one point in time people wrote parsers by hand. Then people invented Regex, CFG, LL, LR, Peg, Packrat, Parsec ā€“ and now, it is not uncommon to write a parser for a toy language in 1-2 hours.

If we look at databases that run on multiple machines ā€“ these are currently herculean efforts ā€“ and what I am curious here is if Erlang/Elixir has libraries / building blocks that drastically simplify these tasks. The reason why I believe this might be possible is that Elixir/Erlang already (via OTP) greatly simplifies writing distributed / fault tolerant severs; so it seems plausible that there would be some similar library that also helps with distributed databases / distributed transactions.

Another Erlang example that I think fits what youā€™re after:

Actually they depend on Riak, which youā€™ve already mentioned. Nevermind.

2 Likes

I appreciate the effort here, but I should probably clarify a bit more.

3 = stuff built on top of these databases

2 = DynamoDB, Cassandra, Voldemort, BigTable, Riak, Spark, RedShift, ...

1 = ?????

0 = Language / Runtime: C, C++, Rust, Go, Java, Spark, Elixir/Erlang/Beam

If we look at this diagram here, what Iā€™m really interested in are the stuff in ā€œlayer 1ā€. At layer 0, we have the languages / run times. At layer 2, we have these heroic efforts of distributed databases.

What Iā€™m really after are ideas / libraries that exist in ā€œlayer 1ā€ ā€“ stuff that we build on top of existing languages / runtimes, but are also useful in building these distributed databases.

Your question about level 1 is quite broad.

Does level 1 include storage engines? You could make a storage engine based off of existing k/v stores written in Erlang / Elixir, like CubDB and Leveled. There are also tools that have erlang wrappers like eleveldb, which mnesia_eleveldb uses as a backend. That being said, Mnesia itsself exists as a database. So is DETS

Does level 1 include serialization? sext, message pack implementations, term_to_binary, BERT, etc all exist.

Does level 1 include distributed ordering and transactions? evc provides vector clocks. There are monotionically increasing values like ksuid implemented in Elixir.

riak_core is also a ā€˜level 1ā€™ technology, a distributed consistent hash ring. Much like what backs dynamo.

Does level 1 include consensus? Paxos implementations exist, as well as ra, a raft implementation exist.

Does level 1 include distributing state? Is it locating remote processes, horde and swarm, and pg exist. Eventually consistent state with CRDTs like lasp and delta_crdt exist.

Does level 1 include tools for building a query language? leex, yecc, nimble_parsec exist.

Does level 1 include data structures for caching? Building indexes? Query optimizers?

As you can see, level 1 is broad and my answer hardly scratches the surface. Your end goal isnā€™t clear. Narrowing your question will make it easier to get answers. Maybe skimming awesome-elixir or awesome-erlang will give some hints, as theyā€™re well organized resources. But, the GenServer certainly is a ā€˜building blockā€™ for a distributed system.

7 Likes

Anyone willing to help should not have to feel this way. @stevensonmt, thank You for your answer.

Please consider respecting others, even if answers does not fit your need. If this thread continue in this trend, I will have to close it for a momentā€¦

7 Likes

With exception of parsing/query optimization, Iā€™m interested in everything else. (Storage engines, Distributed Clocks, CRDTs). Do you know of any resource that tries to break down the various trade-offs on orthogonal dimensions? This problem domain seems to fit Elixir/Erlang really well.

I feel like there exists here an interesting set of idioms that exists at ā€œone level above OTPā€, but are still low level enough to be useful primitives for building these distributed databases / transactions.

With mpope on this one, distributed systems are an extremely broad topic to cover.

Iā€™m not sure of anyone actually breaking down the tradeoffs per say, but there are various good reads out which you can search for the keywords.

Hope that helps.

3 Likes

Distributed systems is a fascinating topic !

This talk might be interesting to you: ElixirConf 2020 - Thomas Depierre - How we built a distributed datastore in Elixir - YouTube :slight_smile:

Not to discourage you, but to get some feedback from people who did it.

An article that is I thing good to start with: The dangers of the Single Global Process

To give some feedback, on my side, I did some experiment a few years ago.
On pet projects with riak_core, riak_core_lite, jepsen, and other stuff.
I learned a lot about many things topics: network, deployment, docker ā€¦

Then I felt, like I need to know more theorical things, so I dive in some very good ressources:

Books:

Papers: Raft, Paxos, Dynamodb

Then, the question was: How to test it ? To ensure that it do not only ā€œwork on my machineā€ !

That question lead me to https://jepsen.io/ , the Riak test suite and some property base testing tools.

Itā€™s a quite a huge journey to build a distributed database, even with the Erlang/Elixir/OTP stack.

But, even if you build 10% of it, you will have a lot of fun and get very valuable knowledge about:

  • coordinating many microservices
  • network latency, idempotency, retry
  • ordering, duplicate messages
  • ā€¦
8 Likes

Great article. However, start thinking in distributed system too early is another evil. SGP, possibly with ETS to speed up reading can get you a long way.

1 Like

Your original post triggered conversations I have personally had around how Elixir & OTP present a different paradigm in writing distributed and concurrent software. Building an app using Elixir and OTP is kinda like building a monolith with built-in microservices. It is a different paradigm than developing a traditional microservices architecture which will have a different treatment for storing or persisting user or application state.

OTP itself can help with distributed state and distributed transactions.

Distributed databases is a different beast all together. Here is a great article that reflects the more cloud-native distributed databases and persistence technologies.

Personally, I have been working with some success on leveraging Distributed SQL databases and its integrations with Ecto and Phoenix. Examples are Yugabyte (which leverages a Postgres query layer) and CockroachDB. You can check those out if you like.

1 Like

The closest I can think of is foundationdb (https://www.foundationdb.org/) Itā€™s not a database itself because it lacks the query, planning etc for making it an actual database, but it gives an api which you can apply to actually build a database on top of it, tho it handles all the distributed aspect youā€™ve to keep in mind a lot of things to maintain that distributed features, so I consider it a good building block

2 Likes

There are some nice Erlang bindings for RocksDB and riak_core is probably as close as youā€™ll get for an off the shelf library for a distributed DB.

BarrelDB is also an interesting project, barrel-db Ā· GitLab

1 Like