Erlang/Elixir native Etcd, Zookeeper alternative

A big advantage to Elixir is all the distributed goodness but for many applications running on multiple nodes having integrated Etcd, Zookeeper type system would make writing distributed applications a lot easier. This is fairly sizable undertaking but would put Erlang/Elixir pretty much out of reach of anything else out there as a target platform for distributed applications. If there are people/companies who could take this own I bet we could create a kickstarter to raise the money to actually fund and build this. (Would gladly contribute money). What do you guys think?

4 Likes

Can you elaborate on what Etcd and Zookeeper are and what about them youā€™d want in Erlang/Elixir?

A bit of copy pasta from Etcd:
etcd is a distributed key value store that provides a reliable way to store data across a cluster of machines. Itā€™s open-source and available on GitHub. etcd gracefully handles leader elections during network partitions and will tolerate machine failure, including the leader.

Your applications can read and write data into etcd. A simple use-case is to store database connection details or feature flags in etcd as key value pairs. These values can be watched, allowing your app to reconfigure itself when they change.

Advanced uses take advantage of the consistency guarantees to implement database leader elections or do distributed locking across a cluster of workers.

We could have Kafka like log store thatā€™s native to the ecosystem so you could build event processing pipelines without external dependencies and with nice integrations with GenStage or just as simple distributed consistent fault tolerant Registry etc.

2 Likes

How does it differ from Riak (and itā€™s various parts) and which would you chose for which purposes?

I canā€™t say I am super familiar with Riak various parts :slight_smile: I donā€™t think Riak KV has ability no notify on key changes and in general Riak is more of a Data Store vs etcd/Zookeeper are mostly used as HA distributed consistent config store. So Riak is more of an end product vs etcd/Zookeeper is an infrastructure piece. Plus it used to be that Riak needed a custom build of erlang VM not sure where that stands.

So the avg data processing pipeline now has some ingest service that writes data to kafka and some processing pipeline reading of kafka and doing processing. So on ops side you have to maintain a lot of pieces including Kafka, Zookeeper etc. And if we had integrated solution Elixir would be super compelling story for these types of projects.

2 Likes

Considering they sound like primarily a distributed configuration store with the ability to notify, why not just use the built in mnesia in that case, simple function wrappers can perform the notifications on every node and all. :slight_smile:

/me trying to understand what they do as not heard of them beforeā€¦ ^.^

I donā€™t think Mnesia implements distributed consensus (etcd is using RAFT and Zookeeper ZAB(similar to paxos))

2 Likes

There are layers on top of it though. In addition there are a host of libraries in the erlang ecosystem that definitely do in a variety of ways for whatever would work best for you (Riak-Core is one such consensus library that I think works well standalone unlike the entirety of Riak). :slight_smile:

/me has noticed even more unknown words like RAFT, ZAB, and paxosā€¦ >.>

Riak core at first look looks to be purpose built for Dynamodb style thingys.

Not sure how dynamodb does it, but riak_core works like:

  • You set up an abstract space, these are basically ā€˜partitionsā€™, more tends to be better until they get to hold too little. Your maximum number of nodes that can be joined is the partition count, so setting it to like 128 or so is usually overkill. The partitions form an indexable cyclic ring.
  • There are ā€˜virtual nodesā€™, each holds a single partition. These virtual nodes are separated and distributed evenly around the number of actual nodes in your system (distributed to keep ā€˜nearbyā€™ partitions in the ring to be on as many nodes as possible).
  • The virtual nodes themselves are what do ā€˜workā€™. They receive and (optionally) respond to commands (not necessarily to the sender). Riak_Core I think comes with a default set of virtual nodes that act as a work queue so you can submit work to it like poolboy (the ā€˜commandā€™ is just a function to execute in this case).
  • Riak uses it to store key/value information in it, it uses a distributed hasher to uniquely define where the key belongs in the partition ring (this is all handled internally), it then acquires the virtual node that matches that location and sends the command to store it it in that one as well as in the adjacent few before and after (depending on the quorum configuration you have it set to) and those virtual nodes will do whatever you wanted with the key/value, such as store it, retrieve it, mutate it, etcā€¦
  • You can however do whatever you wish in a virtual node, from being a processing array to store/fetch data to whatever.
  • Riak_core has a riak_core_bucket thing built in that acts as a simple namespace with configurable properties, good for storage of information about the ring itself or small amounts of other data (say configuration data).
  • Actual nodes that host virtual nodes can be taken up and down as wished, the virtual nodes automatically migrate to actual nodes that have the capabilities they need.

The example of use is of course Riak itself, say you set a quorum of 3/2, so 3 nodes have to verify a data write before a write is successful (the write message returns) and 2 nodes have to verify a data read (and they must match, otherwise all 3 will be checked and the 3rd will be temporarily taken offline as it performs a full resync, this will not affect the uptime of anything and is usually pretty close to instant anyway). As storage happens the information replicates around the ring until they are all in synch (but certain virtual nodes basically ā€˜ownā€™ certain keys). This means that processing is distributed around the node while knowing that the value you ask for is written correctly as well as is read up-to-date since they get serialized through the quorum processes.

There are a lot of other aspects of it and this is a high level overview, but it is quite nice. Iā€™ve used Riak in the past for quite a number of things (though nowadays, to be honest, Iā€™d just use PostgreSQL, it even has notification on change functionality and all).

EDIT: The intro post of Riak Core:

4 Likes

One major difference is that Zookeeper is CP while Riak is AP.

Riak has the best tools for dealing with eventual consistency that Iā€™ve ever seen in a product. Things like sibling detection and resolution. But still, your application needs to be consistency tolerant and be aware of the sibling problem. Many arenā€™t and for them a CP solution that is A as much as possible makes sense.

2 Likes

/offtopic Has anyone resurrected Riak from the ashes of Basho? That tech was too good. But I wouldnā€™t bet my job on using it for critical infrastructure components anymore. Not at least without a strong open source community behind it.

1 Like

Bet365 grabbed it, and appears to be doing a good job with it.

2 Likes

I heard that Bet365 bought it and open sourced everything. And that the UK NHS was also contributing. But thatā€™s the last Iā€™ve heard. Thereā€™s no evangelism, no community forum, nobody offering support, no release schedule, no product/release vision. Iā€™d feel better about your average Apache project at this point. If you already have Riak in-house, thatā€™s a different matter. But this makes me nervous. It could be that my info is outdated though.

1 Like

Iā€™m not sure about other things, but support for riak is offered by Erlang Solutions.

https://www.erlang-solutions.com/products/riak.html

2 Likes

That is exactly what DynamoDB is

2 Likes

Thereā€™s several implementations of leader election, but none of them is very clean or easy to use really. At the end of the day Erlang was created for systems where net splits are very unlikely, and I think most users still use it under this assumption, or defer to an external system (typically PostgreSQL) to ensure consistency.

The ā€œtraditionalā€ option has been gen_leader. Itā€™s been around for many years, used in Riak, model checked, etc. so it should be good at this point, but you have to be careful to use the right branch :sweat_smile:

https://twitter.com/potsdamnhacker/status/928483849394196480

Thereā€™s also a Locks library, which I believe was a refinement of the above. Not sure about production usage.

Riak worked on Ensemble too, but folded before it could become a solid library.

More recently the RabbbitMQ folks announced a project to implement Raft. This is still WIP. It could be a great building block though, especially if it gets used in RabbitMQ which would give it much needed battle testing.

Putting aside leader election for a bit - Swarm recently got a quorum strategy which can help ensure a unique instance of a given registered process exists in the cluster. A big limitation though is you canā€™t resize the cluster easily.

6 Likes

Looks promising

I think the killer app is basically gen stage producer that is internally powered by kafka like distributed replicated log but has no dependencies external to Erlang/Elixir ecosystem.

2 Likes

If itā€™s a remote system, what does it matter what itā€™s written in? As it is now, Elixir would benefit from great integrations into these products. Some exist. But they donā€™t seem to be super high quality at this point.

Never mind. I misread the whole concept of the OP.

A number of libs exist that try and expand upon the capabilities of Distributed Erlang. Iā€™ll try and dig some up a bit later. One nice thing that external trackers provide, is that the heavyweight aspects of tracking are isolated to your external tracking cluster. For example, Zookeeper depends on super fast disk IOPS. Do you want this to be a requirement everywhere, or just for the Zookeeper deployments?

2 Likes