DGen - A distributed GenServer

Ah, I see we have QuiCK at home :slight_smile:

I wrote this somewhere before, but the funny thing about QuiCK is that it’s not interesting at all. Apple just implemented the dumbest possible queue on top of a really, really good database.

The hard part of implementing something like a QuiCK queue is keeping the multi-level index in sync. Except it’s not hard at all, because you can literally just write the two completely disparate index rows along with the user data in a single atomic transaction, with perfect consistency and zero effort, because FDB. That is the power of actually taking the time to design a system with useful guarantees.
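To make the atomic-write point concrete, here is a minimal Python sketch with a toy in-memory store standing in for FDB. The KV class, the key layout and enqueue are hypothetical illustrations, not QuiCK’s or FDB’s actual schema; the point is only that the data row and both index rows are staged together and committed all-or-nothing.

```python
from contextlib import contextmanager


class KV:
    """Toy in-memory store standing in for FDB (hypothetical)."""

    def __init__(self):
        self.data = {}

    @contextmanager
    def transaction(self):
        writes = {}               # staged writes, invisible until commit
        yield writes              # if the body raises, we never reach the commit
        self.data.update(writes)  # commit: every staged write lands at once


def enqueue(kv, queue_id, item_id, ready_at, payload):
    """Hypothetical QuiCK-style enqueue: user data plus two index rows."""
    with kv.transaction() as tr:
        tr[("item", queue_id, item_id)] = payload
        # per-queue index: items ordered by the time they become ready
        tr[("queue_index", queue_id, ready_at, item_id)] = b""
        # top-level index: which queues have work, ordered by ready time
        tr[("top_index", ready_at, queue_id)] = b""
```

A dict only makes this trivially atomic inside one process; what FDB adds is that the same all-or-nothing guarantee holds across processes and machines under concurrent writers.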

I mean I don’t speak for anyone else, but I’ll probably generalize versionstamps by just implementing them exactly like FDB because they’re pretty good. For a literally-serialized system like SQLite any counter will do.
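For the serialized case, a sketch of a plain counter standing in for a versionstamp, assuming a single writer. CounterStamp is a hypothetical name, and in a real store the counter would have to be persisted in the same transaction as the write it stamps. Fixed-width big-endian encoding keeps byte order equal to numeric order, which is what lets such stamps work as ordered keys.

```python
import itertools


class CounterStamp:
    """Hypothetical versionstamp substitute for a single-writer store."""

    def __init__(self, start=0):
        self._counter = itertools.count(start)

    def next_stamp(self) -> bytes:
        # Fixed-width big-endian: lexicographic byte order == numeric order.
        return next(self._counter).to_bytes(8, "big")
```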

Watches are bad, though.

Oh, so that’s a replicated state machine, then, and this one is backed strictly by FoundationDB. There were some other projects that built similar replicated state machines on other databases, but I can’t remember any names.

That’s better, but I still don’t understand why input must come from messages. If input were provided to the state machine by a plain function call (not GenServer.call or cast), you could add batches of input atomically, you could perform dirty actions without this returning-the-closure pattern, and you wouldn’t have problems with unexpected casts arriving, messages carrying temporary data, etc. And if users do want input to come from messages, they could write their own wrapper GenServer with very explicit control over what gets added to the replicated state machine’s input queue and what gets ignored.
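A minimal sketch of the function-call alternative, assuming an in-memory list in place of a durable log. InputLog and apply_all are hypothetical names, not part of DGen. The batch is fully built before anything is appended, so a failure while producing it leaves the log untouched, and the state is derived by replaying the log through a pure reducer.

```python
class InputLog:
    """Hypothetical input log for a replicated state machine (in-memory stand-in)."""

    def __init__(self):
        self.entries = []

    def append_batch(self, inputs):
        staged = list(inputs)        # build the whole batch first
        self.entries.extend(staged)  # then append it in one step


def apply_all(reducer, state, log):
    """Replay every logged input through a pure reducer."""
    for entry in log.entries:
        state = reducer(state, entry)
    return state
```

With this shape, a caller who wants message-driven input can still wrap it in a process that decides which messages become log entries.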

I’d suggest something like FDBReplicatedStateMachine, fdb_rsm_server, or FoundationReplicatedServer, because those names describe what your program does. The key words here are “Replicated”, “State machine” and “FoundationDB”.


I am still reading the library code. So far, there is a lot of room for optimization. For example, there is an unnecessary double await in call. There is also case dgen_queue:length(...) of 0 -> code, which could be optimized to just check whether the queue is empty instead of computing its length.

There are also some strange design decisions. For example, if dgen_config:init is not called, config still works, but it silently ignores user-provided values. It also makes it impossible to have two DGenServers with different backends.
Another issue is that the current dgen_backend behaviour simply mirrors the erlfdb interface. I think you should narrow the behaviour and make it more generic, because not all distributed databases support futures, directories, and keyspace operations. Otherwise you won’t see any other backends in the future. For example, the sophisticated state encoding/decoding approach you’re using is specific to the erlfdb implementation: if I were to implement a Postgres backend, I would not need it.
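One possible shape for a narrower behaviour, sketched in Python as a Protocol: a single transact entry point over a transaction that exposes only get, set and clear, with no futures, directories or keyspace operations. All names here are hypothetical, not the current dgen_backend callbacks, and the in-memory implementation has no conflict detection or retries; it only shows how small the required surface could be.

```python
from typing import Callable, Dict, Optional, Protocol, TypeVar

T = TypeVar("T")


class Txn(Protocol):
    def get(self, key: bytes) -> Optional[bytes]: ...
    def set(self, key: bytes, value: bytes) -> None: ...
    def clear(self, key: bytes) -> None: ...


class Backend(Protocol):
    def transact(self, fn: Callable[[Txn], T]) -> T: ...


class MemoryTxn:
    """Buffers writes; reads see this transaction's own writes first."""

    def __init__(self, data: Dict[bytes, bytes]):
        self._data = data
        self.writes: Dict[bytes, Optional[bytes]] = {}  # None marks a clear

    def get(self, key: bytes) -> Optional[bytes]:
        if key in self.writes:
            return self.writes[key]
        return self._data.get(key)

    def set(self, key: bytes, value: bytes) -> None:
        self.writes[key] = value

    def clear(self, key: bytes) -> None:
        self.writes[key] = None


class MemoryBackend:
    """Single-process backend: no conflict detection, no retries."""

    def __init__(self):
        self.data: Dict[bytes, bytes] = {}

    def transact(self, fn: Callable[[Txn], T]) -> T:
        txn = MemoryTxn(self.data)
        result = fn(txn)  # if fn raises, the buffered writes are dropped
        for key, value in txn.writes.items():
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
        return result
```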

Also, I think @Asd is probably right about the name making no sense, but there is a strong counterpoint to be made that calling the library “degen server” is extremely funny.

The original idea of a GenServer was that if there is a bug in the execution state, it crashes and starts over, recovering execution from a blank state.

If you start over with the same state, then you have the same bug. If you need persistence, why not just use a proper database?

I read everything, but I still don’t get the point. Is it a learning experiment?

Thanks for taking the time to review the project.

W.r.t. the double await, I think you’re talking about dgen:call/4. The first wait is on a call to a BEAM process, which puts the message onto the durable kv-queue and returns a sentinel key from which the caller can receive the final result. Given the current design, this is necessary because the caller doesn’t know the details of the queue’s identity. Via regular BEAM message passing, DGen allows anyone to push a message, as long as they have a pid or can look one up. I could have instead chosen to represent the queue details in a struct that the caller must have in order to push, but that choice would violate the premise, which was to mimic the GenServer interface, because I like it and find it useful for composing programs. You may disagree with the premise, which is fine, but this step is not unnecessary.

The second wait in dgen:call/4 receives that final result, which comes from the server-pushed resolution of the watch future. In this case the message comes from the storage backend rather than the DGenServer process. The result of the operation is then retrieved.
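A toy Python model of the two waits, assuming threads in place of BEAM processes and a threading.Event in place of the FDB watch future. DurableQueueServer, push and wait are hypothetical names; in DGen the first step is a message to the server process rather than a direct method call, and the queue is durable rather than in memory.

```python
import queue
import threading
import uuid


class DurableQueueServer:
    """Toy stand-in for a DGenServer (hypothetical names throughout)."""

    def __init__(self, handler):
        self.handler = handler
        self.pending = queue.Queue()  # stands in for the durable kv-queue
        self.results = {}             # sentinel key -> result
        self.watches = {}             # sentinel key -> Event ("watch future")
        self.lock = threading.Lock()
        threading.Thread(target=self._consume, daemon=True).start()

    def push(self, request):
        key = uuid.uuid4().hex
        with self.lock:
            # Register the watch before enqueueing so the consumer always finds it.
            self.watches[key] = threading.Event()
        self.pending.put((key, request))
        return key

    def _consume(self):
        while True:
            key, request = self.pending.get()
            result = self.handler(request)
            with self.lock:
                self.results[key] = result
                event = self.watches[key]
            event.set()  # "resolve the watch future"

    def wait(self, key, timeout=5.0):
        with self.lock:
            event = self.watches[key]
        if not event.wait(timeout):
            raise TimeoutError(key)
        with self.lock:
            del self.watches[key]
            return self.results.pop(key)


def call(server, request, timeout=5.0):
    key = server.push(request)        # first wait: obtain the sentinel key
    return server.wait(key, timeout)  # second wait: the watch delivers the result
```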

This is not correct. The length of :dgen_queue is computed as the difference of two values in the kv store: (number of pushes) - (number of pops). There is no key that represents the “emptiness” of the queue. Adding such a key would force us to add more key conflicts to the push and pop functions, which would likely slow them down. As it stands now, we retrieve the 2 values concurrently.
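A toy model of the counter scheme described above, with a dict standing in for the kv store. CounterQueue and its key names are hypothetical, not dgen_queue’s actual layout. It shows why an emptiness check costs the same as a length check under this design: both need the push and pop counters.

```python
class CounterQueue:
    """Toy queue whose length is derived from two counters (hypothetical layout)."""

    def __init__(self):
        self.kv = {"pushes": 0, "pops": 0}
        self.items = {}  # sequence number -> item

    def push(self, item):
        seq = self.kv["pushes"]
        self.items[seq] = item
        self.kv["pushes"] = seq + 1  # push only writes the push counter

    def pop(self):
        pops = self.kv["pops"]
        if pops == self.kv["pushes"]:
            return None              # empty
        item = self.items.pop(pops)
        self.kv["pops"] = pops + 1   # pop only writes the pop counter
        return item

    def length(self):
        # Two reads; in a real store these could be issued concurrently.
        return self.kv["pushes"] - self.kv["pops"]

    def is_empty(self):
        # No dedicated "empty" key exists, so this needs the same two reads.
        return self.length() == 0
```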

You’re right. It’s awkward and wrong.

I tried to be clear about this in the post: I don’t know how to put another backend in here, but I desperately want to, and the interface of :dgen_backend will definitely have to change a lot.

So why do this at all? I find FDB Layers useful. They can be composed into higher-level abstractions, and the result is the most ergonomic state management I’ve ever worked with. However, they necessarily tie you to FDB. While I happily run FDB, I don’t want to be tied to it forever. Since other great projects inspired by FDB are being developed, I hope DGen becomes a real Layer that can be compatible with them. A Postgres backend is not interesting: the community already has Oban.

At the risk of being overly pedantic, I’m going to challenge this, but only slightly. The original idea of the supervisor is to do this, but gen_server itself is not opinionated about how, when, or why it’s restarted, or whether it is at all.

Of course, the design of gen_server is amenable to being used by the supervisor in a powerful and useful way, just like you describe. I’m a direct beneficiary of the genius design of this simple idea.

DGenServer breaks the rules a little bit. It can still be stopped and restarted by the supervisor, but if a poison message caused the crash, it may well require an operator to intervene, whether by correcting database state, fixing a bug in the code, or changing some upstream service. I agree this is a weakness in the design.

This is dismissing FoundationDB as a proper database. Why?

This is a prime example of why asking a one-line “why” question in a forum is such a bad idea. I am sure both of you have good intentions, but because of the lack of common context, trading a bunch of short “why” questions will only steer the discussion further away from truth-seeking.

Oh, not at all. It can be FoundationDB for sure. I was dismissing gen_server as a proper database.

Now I understand the idea further, thanks a lot for taking the time to explain!