I wrote this somewhere before, but the funny thing about quick is it's not interesting at all. Apple just implemented the dumbest possible queue on top of a really, really good database.
The hard part of implementing something like a quick queue is keeping the multi-level index in sync. Except it's not hard at all, because you can literally just write the two completely disparate index rows along with the user data in a single atomic transaction with perfect consistency, with zero effort, because FDB. That is the power of actually taking the time to design a system with useful guarantees.
I mean I don't speak for anyone else, but I'll probably generalize versionstamps by just implementing them exactly like FDB because they're pretty good. For a literally-serialized system like SQLite any counter will do.
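To make the "single atomic transaction" point above concrete, here is a minimal sketch using erlfdb from Elixir. This is not DGen's or Apple's actual code: the key layouts, subspace names, and the push function are invented for illustration, and the versionstamp encoding is written as I understand the erlfdb tuple layer to work.

```elixir
defmodule QueueLayerSketch do
  @moduledoc """
  Illustrative only: push a message and keep two disparate index rows
  consistent by writing all three rows in one FDB transaction.
  """

  def push(db, topic, payload) do
    :erlfdb.transactional(db, fn tx ->
      # A versionstamped key: FDB fills in a commit-ordered id at commit
      # time, so messages sort in commit order with no client counter.
      vs = {:versionstamp, 0xFFFFFFFFFFFFFFFF, 0xFFFF}
      msg_key = :erlfdb_tuple.pack_vs({"msgs", topic, vs}, "")
      :erlfdb.set_versionstamped_key(tx, msg_key, payload)

      # Two index rows, written in the same transaction. Either all
      # three writes commit or none do - the indexes can never drift
      # from the user data.
      :erlfdb.set(tx, :erlfdb_tuple.pack({"idx", "by_topic", topic}, ""), <<>>)
      :erlfdb.add(tx, :erlfdb_tuple.pack({"stats", "count", topic}, ""), 1)
    end)
  end
end
```

Running this for real requires a FoundationDB cluster and the erlfdb dependency; the point is only that the data write and both index writes share one transaction.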
Oh, so that's a replicated state machine then. And this one is backed strictly by FoundationDB. There were some other projects which did similar replicated state machines on other databases, but I can't remember any names.
That's better, but still, I don't understand why input must come from messages. If input were provided to the state machine by a function call (not GenServer.call or cast), you could atomically add batches of input, you could perform dirty actions without this returning-the-closure pattern, and you wouldn't have the problems of unexpected casts arriving, of these messages carrying temporary data, etc. And even if a user wants input coming from messages, they could write their own wrapper GenServer with very explicit control over what gets added to the replicated state machine's input queue and what gets ignored.
I'd suggest something like FDBReplicatedStateMachine or fdb_rsm_server or FoundationReplicatedServer, because it describes what your program does. Key words here are “Replicated”, “State machine” and “FoundationDB”.
I am still reading the library code. So far, there is a lot of room for optimization. For example, there is an unnecessary double await in call. There is a case dgen_queue:length(...) of 0 -> ... expression, which could be optimized to not compute the length at all and just check whether the queue is empty.
There are also some strange design decisions. For example, if dgen_config:init is not called, config will still work, it will just ignore user-provided values. And it makes it impossible to have two DGenServers with different backends.
The next thing is that the current dgen_backend behaviour simply matches the erlfdb interface. I think you should narrow the behaviour and make it more generic, because not all distributed databases support futures, directory, and keyspace operations. Otherwise you won't see any other backends in the future. For example, the sophisticated state encoding/decoding approach you're using is specific to the erlfdb implementation; if I were to implement a Postgres backend, I would not need that encoding approach.
Also, I think @Asd is probably right about the name making no sense, but there is a strong counterpoint to be made that calling the library “degen server” is extremely funny.
w.r.t. the double await, I think you're talking about dgen:call/4. The first wait is to a BEAM process. This puts the message onto the durable kv-queue and returns a sentinel key from which the caller can receive the final result. Given the current design, this is necessary because the caller doesn't know the details about the queue's identity. Via regular BEAM message passing, DGen allows anyone to push a message, as long as they have a pid, or can look one up. I could have instead chosen to represent the queue details in a struct that the caller must have in order to push. This choice would violate the premise, which was to mimic the GenServer interface, because I like it and find it useful for composing programs. You may disagree with the premise, which is fine, but this is not an unnecessary action.
The second wait in that dgen:call/4 is receiving that final result, which comes from the server-pushed resolution of the watch future. In this case the message comes from the storage backend rather than the DGenServer process. The result of the operation is then retrieved.
This is not correct. The length of :dgen_queue is computed as the difference of two values in the kv store (number of pushes minus number of pops). There is no key that represents the “emptiness” of the queue. Adding such a key would force us to add more key conflicts to the push and pop functions, which would likely slow them down. As it stands now, we retrieve the 2 values concurrently.
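The length computation described above can be sketched roughly like this (a hypothetical reconstruction, not the real dgen_queue code; the counter key names are invented). The point is that both gets return futures immediately, so the two round-trips overlap:

```elixir
defmodule DgenQueueSketch do
  # Illustrative sketch: length = pushes - pops, read concurrently.
  def length(db, queue) do
    :erlfdb.transactional(db, fn tx ->
      # Both gets are issued before either is awaited, so the reads
      # are in flight at the same time.
      pushes = :erlfdb.get(tx, :erlfdb_tuple.pack({queue, "pushes"}, ""))
      pops = :erlfdb.get(tx, :erlfdb_tuple.pack({queue, "pops"}, ""))
      decode(:erlfdb.wait(pushes)) - decode(:erlfdb.wait(pops))
    end)
  end

  # Atomic-add counters in FDB are little-endian 64-bit integers; a
  # missing key means the counter was never incremented.
  defp decode(:not_found), do: 0
  defp decode(<<n::little-unsigned-64>>), do: n
end
```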
You're right. It's awkward and wrong.
I tried to be clear about this in the post: I don't know how to put another backend in here, but I desperately want to, and the interface of :dgen_backend will definitely have to change a lot.
So why do this at all? I find FDB Layers useful. They can be composed into higher-level abstractions, and the result is the most ergonomic state management I've ever worked with. However, Layers necessarily tie you to FDB. While I happily run FDB, I don't want to forever. Since there are other great FDB-inspired projects in development, I hope DGen becomes a real Layer that can be compatible with them. A Postgres backend is not interesting - the community already has Oban.
At risk of being overly pedantic, I'm going to challenge this, but only slightly. The original idea of supervisor is to do this, but gen_server itself is not opinionated about how, when, or why it's restarted, or if it is at all.
Of course, the design of gen_server is amenable to being used by the supervisor in a powerful and useful way, just like you describe. I'm a direct beneficiary of the genius design of this simple idea.
DGenServer breaks the rules a little bit. It can still be stopped and restarted by the supervisor, but if the crash is caused by a poison message, it may very well require an operator to intervene - either by correcting database state, fixing a bug in the code, or changing some upstream service. I agree this is a weakness in the design.
This dismisses FoundationDB as a proper database. Why?
This is a prime example of why asking a one-line “why” question in a forum is such a bad idea. I am sure both of you have good intentions, but because of the lack of common context, trading a bunch of short “why” questions will only steer the discussion further away from truth seeking.
Indeed, any stateful program will be at risk of persisting a bugged state. This leads to the uncomfortable realization that one of Erlang's core ideas is probably wrong, or at least inadequate for large swaths of real-world programs. Aggressive correctness testing (FoundationDB is a good example) is a more fruitful path to ensuring that such states are unreachable.
Another fruitful path is to structure your code in such a way that bugged states are less likely to arise. Programming in a declarative style, where the program rebuilds its state by re-executing itself from the top rather than transitioning between states through piecemeal manipulation, is a helpful strategy. OTP supervisors offer a form of this, but they are fairly primitive. React's engine is a much more sophisticated tool in this area, as it allows for stateful components with incremental execution and has escape hatches to integrate with non-incremental code.
Something that looks less like a state machine and more like a React (function) component is what I would like to see. But all experimentation is valuable, and what's special about the “layers” paradigm is that it enables experimentation. You do not one-shot great tools; they are evolved.
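The smallest possible sketch of "rebuilds its state by re-executing itself from the top": derive the state as a pure fold over the input history, so there is no piecemeal mutation to get wrong and a restart just replays the fold. The event list here is hypothetical; in a real system it would live in durable storage.

```elixir
# Hypothetical event log; in a real system this would be durable.
events = [{:deposit, 100}, {:withdraw, 30}, {:deposit, 5}]

# The state is a pure function of the history. Replaying the same
# events always reproduces the same state.
balance =
  Enum.reduce(events, 0, fn
    {:deposit, n}, acc -> acc + n
    {:withdraw, n}, acc -> acc - n
  end)

# balance is 75 no matter how many times the fold is replayed.
```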
How did you come up with that idea? The point of supervisors in Erlang is that if there is abnormal state, it is better to restart and start again from a known state. Just like restarting a computer with borked state in RAM.
So you do not restart to the previous state, as that indeed would be pointless; you restart to a “clean” state in the hope that it was a one-time error that was out of your control.
The error might be provoked by a wrong message sent to the process, not by the process's inner state in the first place. In such a case it makes total sense to restart preserving the latest state.
The error might be provoked by a wrong state+message combination.
And also, the error might indeed be provoked by the wrong state, but the previous state was all right, and it might make sense to roll back to that “previous” state rather than to the blank initial one.
From my understanding, the OTP framework designs gen_server (called GenServer in Elixir) to act as both a server and a client, similar to the TCP server/client pattern. It maintains state, but developers need to handle persistence themselves if they want to restore the state after a crash. It also receives configuration/options from a supervisor (or via manual start) and initializes its state in the init callback.
I think this can be annoying for developers when handling state in some cases.
In my opinion, a GenServer with a pluggable architecture, similar to Phoenix, would be more flexible and better aligned with the Elixir style.
Another interesting component is gen_statem. It is quite suitable for working with state transitions. I saw that Elixir implemented it in the early days, but it seems to have been discontinued.
For DGen, at first glance, it looks like a remote GenServer running on another node. If the goal is to share or persist state, this could be achieved by adding an adapter layer (similar to Plug). I think this would help users support more use cases, such as storing data in Redis, Postgres, etc.
Of course, but that isn't something that OTP provides for gen_* modules for you. If you need such behaviour, it is up to you to decide what “committed data” is and how to tell the user it was committed at all.
All I'm saying is that “turn it off and on again” is inadequate for maintaining availability in a persistent system because you will either persist the bugs or lose data, neither of which is acceptable. You need correctness testing.
Of course you need correctness testing. However, it is a chicken-and-egg problem in the real world: what do you do before you've reached absolute correctness? Nothing? With OTP you can at least limp on and monitor the log file, find out what went wrong, add a test case, and fix the bug for good. It gives you a path to correctness, not correctness itself.
Some stuff may be acceptable in some cases. The perfect example here is a telephone switch (what a coincidence):
In the case of a telephone call, if there is a bug in the software, we want to reduce the impact on the overall system. If it was a one-off issue, the callers will call again, “something broke,” and everyone will go back to their lives. But if a bug in a single process (call) can cascade to other calls, that is highly undesirable.
It is a similar thing with HTTP services: if there is some issue on the line, the user will simply hit “refresh”. If the issue isn't common and was a one-off, no one will notice (the browser may even refresh on its own in some cases).
There are a lot of systems (especially network-related ones) where simply restarting the process (often not even automatically) is enough error handling for one-off errors.
I think that you're missing the point. Some time ago I saw your Peeper library, which stores the state of a GenServer in ets and loads it back when the GenServer restarts. It got me thinking, and I decided that this approach is just reinventing the wheel.
You are completely right that restoring the latest correct state after the crash is the best option. However, it is not a safe assumption that this latest correct state is the state the GenServer was in right before the crash, or before it received the message which crashed it. Moreover, there is no generic answer about how to decide which state (that the GenServer is in at some moment) is correct and which is not.
That's why GenServer has callbacks. Namely, init/1 is the callback which executes some code that has to recreate a state which is correct for sure. This approach is generic, because it imposes no expectations and lets the developer decide which state is correct and which is not. If you have a bug in init/1 which returns an incorrect state, then your server will restart with an incorrect state, but that's just one callback, and it's a function, which may return different results. That means the GenServer will recover if init/1 returns a correct state at least once.
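In code, the argument is simply that init/1 is the one place a known-good state gets reconstructed. A minimal sketch, where load_from_storage/0 is a placeholder for whatever durable source you trust:

```elixir
defmodule Counter do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    # Rebuild from a trusted source instead of trusting whatever
    # in-memory state existed at crash time. If this ever returns a
    # correct state, the server has recovered.
    {:ok, load_from_storage()}
  end

  @impl true
  def handle_call(:get, _from, n), do: {:reply, n, n}

  @impl true
  def handle_cast(:incr, n), do: {:noreply, n + 1}

  # Placeholder: always restarts from a known-good baseline.
  defp load_from_storage, do: 0
end
```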
Rolling back to one of the previous states imposes a hard requirement on the developer, who now needs to write the code in such a way that no handle_* callback ever returns an incorrect state. If one returns an incorrect state once, you're forever stuck with it: every crash will restart the server with the incorrect state, indefinitely persisting it without any chance of automatic recovery. That means the GenServer will recover only if all callbacks return a correct state all the time.
I think that not all the people whose opinion differs from yours are missing the point.
If there were a rock-solid solution allowing the developer to properly recover from anything, preserving a proper good state, it'd have been incorporated into OTP, I'm 102% positive. Obviously, there is no such silver bullet.
That does not mean the developer cannot narrow their use cases to some less general surface. For some cases, like the aforementioned “wrong message, correct state,” Peeper just does everything right. If I can ensure that my code does not corrupt the state under any circumstances, Peeper saves me a lot of hassle. Does it work for everyone under any circumstances? Of course not. Small libraries do not usually cover each and every need of every developer across the world; the standard lib (OTP) does.
If I knew that, wouldn't it be simpler to reject the wrong message and keep the GenServer humming?
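That alternative can be sketched like this (module and message shapes are made up): pattern-match the messages you accept, and let a catch-all clause log and drop anything else instead of crashing the server.

```elixir
defmodule Strict do
  use GenServer
  require Logger

  def start_link(_opts), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  @impl true
  def init(items), do: {:ok, items}

  @impl true
  def handle_cast({:add, item}, items) when is_binary(item) do
    # Only well-formed messages reach this clause.
    {:noreply, [item | items]}
  end

  # Catch-all: log and ignore anything unexpected instead of crashing.
  def handle_cast(other, items) do
    Logger.warning("ignoring unexpected cast: #{inspect(other)}")
    {:noreply, items}
  end

  @impl true
  def handle_call(:items, _from, items), do: {:reply, items, items}
end
```

Of course, this only works when you can tell a wrong message from a right one up front, which is exactly the question being asked.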
Your library might be useful for cases like LiveView, where the life span of the process is tied to the health of the socket. However, in the case of LV, the restart of the process is triggered by an async user action, not a supervisor, so couldn't there be race conditions between the serialization and deserialization, thus corrupting the state for good?