Event Sourcing with CQRS is a technique for building applications around an immutable log of events, which makes it a good fit for concurrent, distributed systems.
Though it is gaining popularity, the options for storing these events are limited and require specialized services like Kurrent (aka Greg’s EventStore) or AxonIQ.
One of the strong points of the BEAM is that it comes ‘batteries included’: there are BEAM-native libraries for many common tasks, like storage, pub/sub, caching, logging, telemetry, etc.
ExESDB is an attempt to create a BEAM-native Event Store, building further upon the Khepri library, which in turn builds upon the Ra library.
On the roadmap:
integration with pg2 and Phoenix.PubSub for side effects (i.e. read model projections)
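For a feel of what such a projection could look like once that integration lands, here is a minimal sketch assuming Phoenix.PubSub as the transport; the topic name, message shape and table name are made up for illustration:

```elixir
defmodule MyApp.ReadModelProjection do
  @moduledoc """
  Hypothetical read-model projection: subscribes to an event topic on
  Phoenix.PubSub and folds incoming events into an ETS table.
  """
  use GenServer

  @topic "exesdb:$all"   # made-up topic name
  @pubsub MyApp.PubSub   # assumed PubSub name from the host application

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok) do
    table = :ets.new(:read_model, [:named_table, :set, :public])
    :ok = Phoenix.PubSub.subscribe(@pubsub, @topic)
    {:ok, table}
  end

  @impl true
  def handle_info({:event, %{event_type: type, stream_id: stream, data: data}}, table) do
    # Project the event into the read model (the message shape is assumed).
    :ets.insert(table, {{stream, type}, data})
    {:noreply, table}
  end
end
```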
Interesting idea. I’m curious about plans for Commanded support.
So you plan to create an implementation for the Commanded.EventStore.Adapter behaviour, right? In that case it would be interesting to hear about the key differences between your library and the PostgreSQL-based Elixir EventStore / EventStoreDB.
I guess it would be something like an in-memory event store, but for production. That should give us fewer required environment dependencies, which is great. Are there any other pros, like speed or security?
What kind of serializer would be needed? Would it work with Elixir structs as-is, or would we still need to use a JSON serializer?
Suppose a project already uses another adapter. Would that require some extra steps for migrating the data?
Hi Tomasz,
Thank you for your feedback.
Next to the advantages you already mentioned (speed, security, no need for extra serialization: events are indeed stored as Erlang terms), I’d like to add the capability to deploy event-sourced services as a self-contained, BEAM-native release to the edge. This would allow us to leverage, for instance, the Nerves Project for deployment.
I am also looking into https://bondy.io for scenarios where you could have 10E+N nodes in the network.
Much of my past work revolved around decentralized and autonomous systems (think parking facilities, vehicles, agricultural automation, logistics, etc.), often in a “spotty” environment where nodes aren’t always connected. Such systems benefit little from SaaS solutions if the network is not available. That space could be considered my main motivator for building a BEAM-native event store: as few dependencies on third-party services as possible.
The reason for implementing the Commanded Adapter is simple: it is the de facto event sourcing standard for the BEAM and is, as far as I am concerned, feature complete.
When it comes to migrating data from existing stores, I’d argue that’s quite easy, barely an inconvenience: replay the old store and project into the new.
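As a very rough sketch of that replay-and-project approach (the OldStore and NewStore function names are placeholders for illustration, not real APIs):

```elixir
# Hypothetical one-off migration: read every event from the old store,
# stream by stream, and append it to the new one in the original order.
defmodule MigrationSketch do
  def run(stream_ids) do
    Enum.each(stream_ids, fn stream_id ->
      events = OldStore.read_stream_forward(stream_id)        # placeholder API
      :ok = NewStore.append(stream_id, :any_version, events)  # placeholder API
    end)
  end
end
```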
So, indeed, the highest points on the agenda are:
1. have Khepri triggers publish seen events on pg2 (for projections etc…)
2. Dynamic Clustering via Partisan
3. Commanded Adapter
4. Monitoring/Telemetry
…
This seems cool - love seeing more database projects in Elixir!
A couple of random questions from someone who knows very little about CQRS/ES:
Khepri, like Mnesia, is an in-memory database (which also persists to disk). If you’re storing an immutable log, would you eventually run out of memory? Or is the log truncated at some point?
Khepri, as I understand it, is a K/V store built on top of a Raft log. Since the thing you’re storing is, of course, a log, would it make more sense to use Ra directly?
I see you mentioned a large number of nodes. Is the idea here to have many individual Khepri clusters running independently within a large cluster?
What is Khepri’s throughput like? I would imagine you would get better results with aggressive batching.
Thank you for your input, those are some very valid points and concerns.
The main driver for this project is decentralization, and in such scenarios I’d imagine JIT availability and localized sharding function as a counterweight to throughput. For now, I don’t worry about this too much and focus on getting the store operational.
Most of the dedicated Event Stores (I know of) are centralized at the data center level and there is not much literature about decentralized event sourcing. It probably opens a whole different can of worms, but we need tooling to investigate it. ExESDB should be seen in this context.
As an example, imagine the scenario of a parking facility where vehicles enter and exit, people enter and exit, payments are made, etc. Next, imagine such a facility not being managed by a centralized system, but rather by a mesh of SBCs that perform individual parts of the process. Such a system might consist of a few hundred devices, and indeed, in that mesh there might be a number of individual realms that are responsible for parts of the process.
I will definitely follow your progress on this, sounds interesting.
Question:
If the assumption is to run this “at scale”, whatever that may be, what is the concept/approach for consistency boundaries at that scale?
That is, in EventStoreDB (Kurrent) and Commanded’s own Postgres EventStore, there is the “stream local” boundary with optimistic concurrency for the stream itself (the incrementing, gapless version number between events in a stream), but these events do get projected into some order in the $all stream.
Is the use of khepri and ra to make sure that the “individual stream” is consistent across the distributed cluster nodes?
And that also there will be an $all stream (or other projections) that also will provide some sort of consistent ordering across the cluster? Also using khepri and ra?
Given that Kurrent has an HA cluster solution, I’m assuming there is a known distributed-systems approach to merging all these small, distributed stream events into a single combined $all projection? If so, what is it? Got references for me to learn from? And how would ExESDB accomplish this task?
Thanks for experimenting with this project and for edifying me on this.
Hi Byu,
Thank you for your interest and your comment, and sorry for the belated reply.
To answer some of your questions:
The mechanism used for emitting events from the store relies on Khepri’s built-in triggering capabilities: when a node in a Khepri path is created, a ‘stored function’ can be activated, which emits the corresponding event if the path and payload satisfy certain filter conditions. However, this capability is only available on Ra leader nodes, so a Follow-the-Leader mechanism is implemented to transfer the Emitter subsystem to the new leader upon election.
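For context, this is roughly how Khepri’s stored-procedure/trigger mechanism is used (a simplified sketch based on Khepri’s documented API; the store name, paths, filter and emit logic are made up, and ExESDB’s actual emitter is more involved):

```elixir
store = :my_store

# 1. Store an anonymous function (a "stored procedure") at a Khepri path.
#    Khepri calls it with a map of properties describing the change.
sproc_path = [:procs, :emit_event]
:ok = :khepri.put(store, sproc_path, fn props ->
  # In ExESDB this is where the event would be forwarded to subscribers;
  # here we simply inspect the changed path and action.
  IO.inspect(props, label: "khepri trigger fired")
end)

# 2. Register a trigger: when a tree node matching the filter is created,
#    the stored procedure above is executed (on the Ra leader node).
filter = :khepri_evf.tree([:streams], %{on_actions: [:create]})
:ok = :khepri.register_trigger(store, :emit_on_create, filter, sproc_path)
```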
Since v0.0.15, ExESDB supports:
- transient subscriptions, in 3 flavors: :by_stream ($all or $stream_id), :by_event_type and :by_event_payload
- persistent subscriptions, for which only :by_stream has practical relevance, as far as I know
ExESDB.GatewayAPI
Using the Swarm library, a Gateway is implemented which routes requests to a random available node in an ExESDB cluster, thus achieving some load balancing and high availability.
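Conceptually it boils down to something like this (a simplified sketch; the Swarm group name and request shape are invented, not the actual ExESDB.GatewayAPI):

```elixir
defmodule GatewaySketch do
  # Hypothetical gateway-style routing: pick a random gateway worker that
  # registered itself in a Swarm group and forward the request to it.
  @group :ex_esdb_gateways   # made-up Swarm group name

  def call(request, timeout \\ 5_000) do
    case Swarm.members(@group) do
      [] ->
        {:error, :no_gateway_available}

      pids ->
        pids
        |> Enum.random()
        |> GenServer.call(request, timeout)
    end
  end
end
```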
Tests and experimentation so far show quite good results in terms of consistency and event ordering, even when performing chaos testing on the cluster (I must admit, to my slight surprise), which is a testament to the excellent work done by the giants on whose shoulders we stand.
I did create a little demo clip (sorry for the terrible audio; conditions are sub-optimal for now).
ExESDB v0.1.0 available!
I am proud to announce that the first useful release of ExESDB is now available as v0.1.0! This release comes with an adapter for the fantastic Commanded library AND a Phoenix LiveView demo app. Feel free to check it out!
It has taken me a few months to finish things up (but then, what is ‘finished’, right?), but meanwhile the distributed store is there, an HA proxy is there and…the Commanded adapter is there, too!
It is said that the proof of the pudding is in the eating, so the adapter is being developed and tested in conjunction with a working Phoenix LiveView application: an event-sourced demo app that mimics a dashboard for regulating greenhouses. All data (events) is stored in ExESDB, while Cachex is used for read models, thus creating a true BEAM-only application without the need for any external services like EventStoreDB or PostgreSQL.
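For the curious, the read-model side of such a setup can be as small as a Commanded event handler that writes into Cachex; a sketch along those lines (module and event names are invented, not taken from the demo app):

```elixir
defmodule Greenhouse.TemperatureReadModel do
  # Hypothetical projection: folds measurement events into a Cachex cache
  # that the LiveView dashboard reads from.
  use Commanded.Event.Handler,
    application: Greenhouse.App,   # assumed Commanded application module
    name: __MODULE__

  alias Greenhouse.Events.TemperatureMeasured   # invented event struct

  def handle(%TemperatureMeasured{greenhouse_id: id, celsius: celsius}, _metadata) do
    {:ok, true} = Cachex.put(:read_models, {:temperature, id}, celsius)
    :ok
  end
end
```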
I must confess, it feels a little liberating, being able to just spin up a few containers and not having to worry about connectivity with other backend components.
Also, since we remain in the same ecosystem, there is no need to worry about (or waste processing power on) serialization. Serialization will only become a topic if and when we decide to create APIs for clients in other languages.
wow!!! that is unbelievable news!! i recently started exploring the commanded library, and having a native eventstore now is amazing news! what are your plans for future releases?
dynamic store support: currently there is a one-store-per-cluster (1SPC) limitation, steered by config. In the philosophy of ExESDB, a store supports one business process/behavior (one aggregate type). If you are familiar with EventStoreDB, a ‘store’ in ExESDB corresponds to a ‘category’ in EventStoreDB. Though there is much to be said for 1SPC, N-stores-per-cluster (NSPC) offers more options for scenarios with budget constraints (fewer nodes), resource constraints (smaller hardware) and simplified inter-business-process/behavior communication (less complexity)
An ExESDB Admin API (REST+gRPC) in order to support:
an ExESDB Administration tool: not sure yet in what format…probably a CLI for scripting (must have!), a TUI (I am a terminal jockey) and/or an Admin Web UI/Dashboard with fancy metrics and such
Later:
A REST/gRPC API that is 100% compatible with EventStoreDB, so it could serve as a ‘drop-in’ replacement. Since ExESDB supports dynamic clustering and is resilient against node shutdowns, I figure it has an advantage over EventStoreDB for orchestrated scenarios (Kubernetes, for instance)
All this is of course just a wishlist, and it is a labor of love, so I won’t put a timeline on it
i will definitely test it with my commanded application. do you think it’s now at a level where i can switch from the postgresql store and start using it during development of my app? i’m pretty new to event sourcing, so i’m worried i might run into issues that will slow down my study of commanded/es?
Well, maybe experiment a little in a separate branch and save your current work, first.
Have a look at the ExESDB Commanded Adapter on GitHub; there is an /apps/regulate-greenhouse folder with a demo app that should get you started…in theory it should boil down to adding dependencies on {:ex_esdb, "~> 0.1.0"} and {:ex_esdb_commanded, "~> 0.1.0"} to your mix.exs.
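In mix.exs, that would look roughly like this (assuming both packages are published on Hex under those names; double-check the latest versions):

```elixir
defp deps do
  [
    {:ex_esdb, "~> 0.1.0"},
    {:ex_esdb_commanded, "~> 0.1.0"},
    # assuming your app already depends on Commanded itself
    {:commanded, "~> 1.4"}
  ]
end
```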
I couldn’t get that working with dependencies on {:ex_esdb, "~> 0.1.0"} and {:ex_esdb_commanded, "~> 0.1.0"} in my mix.exs.
The issue is that ex_esdb uses relative paths like ../deps/khepri/include/khepri.hrl in three locations in the source code. While I do have the khepri.hrl file in that path in my deps folder, ex_esdb isn’t finding it during compilation.
This affects anyone trying to use ex_esdb as a standard Mix dependency rather than vendoring it. Note that the sample greenhouse application also uses relative paths for its dependencies.
I’ve fixed the issue by changing these relative paths to standard include_lib format. The fix is minimal, maintains compatibility, and I’ve tested it successfully in development in our production-bound application that’s migrating from PostgreSQL to ex_esdb with Commanded.
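For anyone hitting the same compilation error, the change is along these lines (shown with a placeholder record name; ex_esdb extracts its own records, and Erlang sources would use -include_lib in the same way):

```elixir
# Before: a path relative to ex_esdb's own checkout, which does not resolve
# when ex_esdb is itself pulled in under deps/ of another project:
#
#   Record.extract(:some_khepri_record, from: "../deps/khepri/include/khepri.hrl")
#
# After: resolve the header through the :khepri application on the code path,
# the Elixir counterpart of Erlang's -include_lib:
#
#   Record.extract(:some_khepri_record, from_lib: "khepri/include/khepri.hrl")
```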
I’ve submitted a pull request with the fix to the original repository. In the meantime, you can use my working fork: {:ex_esdb, github: "iffies/ex-esdb", branch: "fix/khepri-include-paths", sparse: "system"}.
Oh wow, thank you so much for finding that. The PR is approved and merged!
Thanks also for testing it in a real-world setting, very important! Though a little word of caution: this project has only been going for a few months now, and claiming it to be “production ready” would be a little ambitious today. For that, it’d need a few more successful real-world scenarios.
The greenhouse sample application sources its dependencies via path right now, indeed, since it is primarily a development sandbox.