Is there something like OTP, but instead of focusing on distributed processes / severs, it focuses on providing building blocks for distributed databases / distributed state / distributed transactions ?
Well, what do you think a
Agent is? Each and every one of them can hold its own state which you can then persist somewhere else (not only in memory).
The building blocks are already there in the BEAM. It just all depends how you want to model your software. You don’t have to expose a server in the classical sense. You can make a distributed BEAM app that only responds to particular requests (like a DB).
Have you ever heard of DynamoDB, Cassandra, Voldemort, Redshirt, HTable, Riak, … ? I assure you the core technical challenge they solved was not “use GenServer”.
In that case it sounds like you made a decision then? Because I truly don’t understand your question.
If you are looking for a library that’s the building blocks of said databases, maybe their sources can help?
I have no idea what ‘decision’ refers to. I agree that you don’t understand the question.
I know how to use
git clone. I am asking because I believe someone in the Elixir community might know of a resource for building distributed databases / transactions that is more useful than just ‘read the source’.
I’m hesitant to jump in given the tone of this thread, but just to help clarify what you’re asking, is Noria (written in Rust) similar to what you are asking about? Are you asking if a similar tool has been written in Elixir already?
Maybe Sage is what you’re after:
FYI, Riak is written in erlang.
Based on a very superficial read of github readme, Noria dramatically speeds up MySQL reads through pre-computation / caching – which is a very cool idea, but not what I am looking for.
I still don’t understand Sage/Sagas does – so I’m not sure.
To clarify a bit, let us consider compilers/parsing for a moment. At one point in time people wrote parsers by hand. Then people invented Regex, CFG, LL, LR, Peg, Packrat, Parsec – and now, it is not uncommon to write a parser for a toy language in 1-2 hours.
If we look at databases that run on multiple machines – these are currently herculean efforts – and what I am curious here is if Erlang/Elixir has libraries / building blocks that drastically simplify these tasks. The reason why I believe this might be possible is that Elixir/Erlang already (via OTP) greatly simplifies writing distributed / fault tolerant severs; so it seems plausible that there would be some similar library that also helps with distributed databases / distributed transactions.
Another Erlang example that I think fits what you’re after:
Actually they depend on Riak, which you’ve already mentioned. Nevermind.
I appreciate the effort here, but I should probably clarify a bit more.
3 = stuff built on top of these databases 2 = DynamoDB, Cassandra, Voldemort, BigTable, Riak, Spark, RedShift, ... 1 = ????? 0 = Language / Runtime: C, C++, Rust, Go, Java, Spark, Elixir/Erlang/Beam
If we look at this diagram here, what I’m really interested in are the stuff in “layer 1”. At layer 0, we have the languages / run times. At layer 2, we have these heroic efforts of distributed databases.
What I’m really after are ideas / libraries that exist in “layer 1” – stuff that we build on top of existing languages / runtimes, but are also useful in building these distributed databases.
Your question about level 1 is quite broad.
Does level 1 include storage engines? You could make a storage engine based off of existing k/v stores written in Erlang / Elixir, like CubDB and Leveled. There are also tools that have erlang wrappers like eleveldb, which mnesia_eleveldb uses as a backend. That being said, Mnesia itsself exists as a database. So is DETS
riak_core is also a ‘level 1’ technology, a distributed consistent hash ring. Much like what backs dynamo.
Does level 1 include consensus? Paxos implementations exist, as well as ra, a raft implementation exist.
Does level 1 include data structures for caching? Building indexes? Query optimizers?
As you can see, level 1 is broad and my answer hardly scratches the surface. Your end goal isn’t clear. Narrowing your question will make it easier to get answers. Maybe skimming awesome-elixir or awesome-erlang will give some hints, as they’re well organized resources. But, the
GenServer certainly is a ‘building block’ for a distributed system.
Anyone willing to help should not have to feel this way. @stevensonmt, thank You for your answer.
Please consider respecting others, even if answers does not fit your need. If this thread continue in this trend, I will have to close it for a moment…
With exception of parsing/query optimization, I’m interested in everything else. (Storage engines, Distributed Clocks, CRDTs). Do you know of any resource that tries to break down the various trade-offs on orthogonal dimensions? This problem domain seems to fit Elixir/Erlang really well.
I feel like there exists here an interesting set of idioms that exists at “one level above OTP”, but are still low level enough to be useful primitives for building these distributed databases / transactions.
With mpope on this one, distributed systems are an extremely broad topic to cover.
I’m not sure of anyone actually breaking down the tradeoffs per say, but there are various good reads out which you can search for the keywords.
- Serializability vs Linearizability
- Operational transforms vs CRDT’s
- CAP / PACE
- Vector clocks (lamport)
- GitHub - lasp-lang/lasp: Prototype implementation of Lasp in Erlang.
- https://www.foundationdb.org/ built an entire language to ensure correctness over the network.
Hope that helps.
Distributed systems is a fascinating topic !
This talk might be interesting to you: ElixirConf 2020 - Thomas Depierre - How we built a distributed datastore in Elixir - YouTube
Not to discourage you, but to get some feedback from people who did it.
An article that is I thing good to start with: The dangers of the Single Global Process
To give some feedback, on my side, I did some experiment a few years ago.
On pet projects with riak_core, riak_core_lite, jepsen, and other stuff.
I learned a lot about many things topics: network, deployment, docker …
Then I felt, like I need to know more theorical things, so I dive in some very good ressources:
Papers: Raft, Paxos, Dynamodb
Then, the question was: How to test it ? To ensure that it do not only “work on my machine” !
That question lead me to https://jepsen.io/ , the Riak test suite and some property base testing tools.
It’s a quite a huge journey to build a distributed database, even with the Erlang/Elixir/OTP stack.
But, even if you build 10% of it, you will have a lot of fun and get very valuable knowledge about:
- coordinating many microservices
- network latency, idempotency, retry
- ordering, duplicate messages
Great article. However, start thinking in distributed system too early is another evil. SGP, possibly with ETS to speed up reading can get you a long way.
Your original post triggered conversations I have personally had around how Elixir & OTP present a different paradigm in writing distributed and concurrent software. Building an app using Elixir and OTP is kinda like building a monolith with built-in microservices. It is a different paradigm than developing a traditional microservices architecture which will have a different treatment for storing or persisting user or application state.
OTP itself can help with distributed state and distributed transactions.
Distributed databases is a different beast all together. Here is a great article that reflects the more cloud-native distributed databases and persistence technologies.
Personally, I have been working with some success on leveraging Distributed SQL databases and its integrations with Ecto and Phoenix. Examples are Yugabyte (which leverages a Postgres query layer) and CockroachDB. You can check those out if you like.
The closest I can think of is foundationdb (https://www.foundationdb.org/) It’s not a database itself because it lacks the query, planning etc for making it an actual database, but it gives an api which you can apply to actually build a database on top of it, tho it handles all the distributed aspect you’ve to keep in mind a lot of things to maintain that distributed features, so I consider it a good building block
There are some nice Erlang bindings for RocksDB and riak_core is probably as close as you’ll get for an off the shelf library for a distributed DB.
BarrelDB is also an interesting project, barrel-db · GitLab