Hobbes - a scalable, fault-tolerant transactional record store

Actually it was ExESDB that I was thinking of, but I couldn’t remember the name in the moment so I just used Commanded for the example. ExESDB is a perfect example of something that you would implement with Hobbes.

It probably needs a more memorable name, though…

1 Like

Just lurking because I have minimal-to-zero relevant experience, but I love an ambitious project. But I couldn’t stand by and let unit tests be slandered like this. Inferior?? Come on, they’re a trade-off. You give: moderate extra time specifying inputs and outputs for a small piece of your API. You receive: assistance designing said API and a fairly strong guarantee that future implementation changes don’t break the contract. The refactorer’s best friend. This is especially true with web apps, where true e2e tests are a huge pain.

3 Likes

I think this may be the classic case of people having different ideas of what the “unit” is. The kind of granular tests I’m calling unit tests are the enemy of refactoring.

2 Likes

I had a bug not long ago where one of the fuzzers tripped an assert. Not often, maybe 1-5% of the sim runs failed.

The Distributor writes special meta keys into the sharded WAL to inform the storage servers they need to change their shard map and/or fetch keyspace from another server. These special mutations are logged at a particular version (it’s a multiversion database) because they have to happen at exactly the right time. They are pushed from the Distributor to a CommitBuffer to be batched, and then they are fragmented and shipped to the sharded WAL (TLog servers). The Storage server pulls batches from the TLog servers with a peek call. All of this is distributed and nondeterministic, including how things end up being batched on the CommitBuffer and how those batches are pulled by the Storage server.

When the Storage server applies the batches, it does so one batch at a time. Within the batch it applies one mutation at a time. It can pull a few batches at a time from the TLogs, so this is a nested loop, essentially.

To observe the special mutations, the Storage server has to diff the new value against the previous value, so it has to read the previous value and then run them both through a case statement to discover what to do. For example, if the previous value was nil and the new value was a shard key, it would be a command to fetch that shard. And the inverse, and so on. Some combinations would be impossible because they have no semantic meaning (like nil → nil). I am speedrunning this explanation, obviously.
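The (previous, new) dispatch described above might look roughly like this. A minimal sketch; the module, function, and command names are made up for illustration, not the actual Hobbes code:

```elixir
defmodule MetaDiff do
  # Interpret a meta-key change by diffing the previous and new values.
  # previous nil -> new shard key: a command to fetch that shard.
  def interpret(nil, shard_key) when shard_key != nil, do: {:fetch_shard, shard_key}
  # The inverse: shard key -> nil means the shard is moving away.
  def interpret(shard_key, nil) when shard_key != nil, do: {:drop_shard, shard_key}
  # Some combinations have no semantic meaning and should be unreachable.
  def interpret(nil, nil), do: raise("impossible: nil -> nil")
end
```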

There was a bug in this code. In the second level of the nested loop, I accidentally read the old key at the current version of the Storage server rather than the version being applied currently; i.e. at the wrong level of the nested loop.

This led to a very rare edge case: if the meta mutations were batched just right, there would be two for the same key within the set of batches that happened to be pulled by the Storage server from the TLogs. This is impossibly unlikely in practice because the same meta key would almost never be written multiple times this way. But if the stars aligned, the diff might “miss” a value in between the batches.
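A toy reconstruction of the miss (the versioned history and helper below are illustrative, not the real code): two writes to the same meta key land within the set of pulled batches, and diffing against an already-advanced version skips the value in between:

```elixir
defmodule MissDemo do
  # Toy multiversion history of one meta key: [{version, value}], ascending.
  # nil at v10, a shard key written at v20, cleared again at v30.
  @history [{10, nil}, {20, "shard_a"}, {30, nil}]

  # Newest value at or below version v.
  def value_at(v) do
    @history
    |> Enum.take_while(fn {ver, _} -> ver <= v end)
    |> List.last({0, nil})
    |> elem(1)
  end

  # When applying the mutation at version 30, the previous value should be
  # read relative to the version being applied, which sees the v20 write...
  def correct_prev, do: value_at(29)

  # ...not at the server's already-advanced current version, where the v30
  # mutation is visible and the in-between value is "missed".
  def buggy_prev, do: value_at(30)
end
```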

The fuzzer that caught this aggressively spams shard moves against the cluster. Even then, only a few runs tripped the assert, because the stars had to align to make it happen. This would be very nondeterministic, but thankfully I spent months reimplementing half of OTP deterministically so that I don’t have to deal with flaky simulator runs.

It was a one-line fix.

Can you write a unit test to prevent this bug?

Actually, do you think you could even write a regression test for this bug?

1 Like

Have you considered that perhaps the fact that they’re a pain is the problem we should be looking into?

1 Like

That’s all super interesting. Genuinely! That’s the kind of stuff I’m in this thread for. But unit tests have their place. Maybe not in your project.

Me? Uh, no. I’ve got… other stuff. But sure, I have no doubt they can be improved and will be. LiveView test is a huge improvement over other JS-based frameworks (though that is… not saying much). There’s just no way around the fact that it’s more complex, and unless we want to imagine a scenario where the complexity differential approaches 0, which is certainly not the present day, unit tests provide huge value for the cost.

3 Likes

Hi, it’s a good DB, I’ve read almost all of the storage and transactions code (except the constructor) and I find it much easier to read than other databases! Here are my questions:


From README

Hobbes will be closed to public contribution for the foreseeable future.

Why? Can I participate :backhand_index_pointing_right: :backhand_index_pointing_left: with some PRs please?


Hobbes is currently proprietary as it is not intended for public use.

You’re missing a LICENSE file though. Copyright must declare an owner at least


From architecture/overview.md

Hobbes was designed and built by Corporate to meet our unique needs.

What Corporate? Is it a name of the place you work at? https://corporate.fm ?
“our”? How many people are working on the project? Just curious.


As far as I can see, the storage path works like this:

  1. A mutation gets into the MutationLog
  2. HybridKV pops from the mutation log on flush
  3. That data is inserted into the storage kv during the flush
  4. The storage kv is dumped to disk during commit, during the flush
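Those four steps, as a stripped-down sketch (the module and field names here are mine, not the actual ones):

```elixir
defmodule FlushSketch do
  # state: an in-memory mutation log (newest first) plus the storage kv map.
  def flush(%{log: log, kv: kv}) do
    # 2. pop everything out of the in-memory mutation log
    mutations = Enum.reverse(log)
    # 3. insert the popped mutations into the storage kv
    kv = Enum.into(mutations, kv)
    # 4. the full kv dump to disk would happen here (async in the real thing)
    %{log: [], kv: kv}
  end
end
```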

And I see these problems:

  1. The mutation log is in-memory; if it dies, data is lost
  2. The storage kv keeps everything in memory and never reduces in size
  3. The full kv dump is async
  4. The full kv dump is slow
  5. If something fails, the transaction is lost and the user is not notified about it

I know that this is a work in progress, so I am interested in the approach you’re planning to take for the storage part. Is it going to be a B-tree variation, maybe some LSM, or some new approach?


Looks like it’s not going to support arbitrary terms as keys. Why?

Don’t tell me about term_to_binary ordering; I wrote a nanolsm just to prove that storage can have arbitrary terms as keys, with range reads and everything
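For anyone following along, the term_to_binary ordering pitfall being alluded to is real: Erlang term order and the byte order of the encoded binaries disagree. A quick counterexample:

```elixir
# -1 < 1 in term order, but -1 encodes as INTEGER_EXT (tag byte 98) while
# 1 encodes as SMALL_INTEGER_EXT (tag byte 97), so the encoded binaries
# compare the other way around. Naive term_to_binary keys break range reads.
neg = :erlang.term_to_binary(-1)
pos = :erlang.term_to_binary(1)
true = -1 < 1
true = neg > pos
```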


I wrote a thing similar to Construct, but it’s a WIP, so I haven’t released it yet. It uses a similar API but a slightly different approach: instead of fuzzing, it traverses all possible paths of how things can happen in a distributed system (with or without failures).

The idea behind it is this: we form a tree where each node is a set of actions (like “send this message to some process”, “receive this message in this process”, “unblock this process after send”, etc.) which can happen at some moment, and each edge is the action that was executed, leading us to a new set of actions. The root node is formed when all tracked processes are stuck at checkpoints (like after a send or right after a spawn, etc.). When we execute some action, we move down this tree. Each new run tries to traverse a new path of the tree (starting from the root).

The main problem with this library is the action tree blowing up in size, since even a simple GenServer.call can lead to many outcomes.
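The blow-up is easy to see: even for fully independent actions, the number of distinct paths is factorial in the number of enabled actions. A toy enumerator (illustrative only, not the library’s actual traversal):

```elixir
defmodule Interleavings do
  # Enumerate every ordering in which a set of enabled actions could fire.
  def explore([]), do: [[]]

  def explore(actions) do
    for a <- actions, rest <- explore(actions -- [a]), do: [a | rest]
  end
end
```

Four independent actions already give 24 paths, and real runs keep enabling new actions as they go.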

I can upload the code if you want to see it.


I’ll ask more questions once I finish reading and testing it


Congrats on the initial release, I can see in the git history that it took one and a half years, and that’s a lot of progress in this amount of time. Frankly, I am almost jealous (in a good way) of your productivity :smile: . I can’t wait to see newer iterations of the project.

2 Likes

Who is “we”?

1 Like

An honor!

Many different reasons, but for now this is a “just me” thing. Maybe down the road when I have self-hosted Git infra (really hoping ATProto works out) I would consider accepting contributions from a small number of trusted people, if those people are willing to follow some rules. For the time being I don’t want to worry about it.

This is not true (copyright is automatic), but it won’t matter once we move to alpha because then there won’t be any copyright :slight_smile:

It’s just me for now! “Our” is being used in a professional sense to reflect that I consider this to be the work of an organization, even if that organization is currently just me. There will be commercial products in the future; Hobbes is part of the tooling I’m building out to support those products. I am very interested in sovereignty.

And yes, that is my (“our”) website, which is currently just a blog. I will be posting to it more regularly now, so grab the RSS if you’re interested in that!

2 Likes

This sounds cool and of course I’d love to see it :slight_smile:

Traversing all reachable paths in a breadth-first way like this would probably prevent you from reaching bugs that only occur after some time has elapsed. The fuzzers are essentially traversing depth-first with a lot of noise injected, if you think about it. Good for finding bugs.

1 Like

Your understanding looks correct. This design comes right from FoundationDB, where the in-memory state is multiversion and the on-disk state is not (hence the MVCC window of roughly 5 seconds).

However, I am going to obsolete this entire thing by pushing the MVCC all the way down into the storage engine on disk, at which point this entire path is going to be removed! So enjoy it while you can, I guess.

The WAL data is stored durably on the TLogs and would simply be pulled again after the Storage server reboots. The WAL is not popped from TLogs until after it is persisted by the Storage server.

Because there is no on-disk storage engine; it’s essentially mocked for sim testing. The flat storage would be on-disk. The in-memory storage is pruned by the flush and would not grow unbounded.

I’m not sure what you’re referring to exactly, but if you mean FlatStorageKV this is a mock to test recovery after restarts. It’s not a real storage engine. I was going to write another KV using SQLite (this would be trivial) but I changed my mind and started designing one from scratch.

This could mean a lot of different things. Can you clarify?

A committed transaction would never be lost. Unless fault-tolerance is violated, at which point the database should be considered lost and restored from backup.

Leveled LSM with the versions from the cluster pushed directly into the storage engine in place of the sequence numbers. Deep integration is the goal.

The other big advantage of a custom storage engine is that it will enable more advanced data movement. FDB has in the past performed shard moves “logically” (by reading individual keys) rather than by directly copying the binary pages on disk. They have been moving away from this using RocksDB. I think I can do better because I will control the on-disk format.

I am working on an RFD (request for discussion) for the storage engine. I plan to post the RFDs as forum threads and solicit design feedback, so look out for that.

As we have previously discussed I just think binary/binary is a nice underlying abstraction. There will be binary encodings for both keys and values, but I will design those encodings. The purpose of the value encodings, by the way, is to enable predicate pushdown in the future.

Using lexicographic order and using a custom comparator are essentially equivalent. I know you prefer the latter, but I don’t and I think we have pretty much exhausted the discussion already tbh.
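As a tiny illustration of what a designed, order-preserving key encoding buys (my example, not the actual Hobbes encoding): with fixed-width big-endian integers, plain lexicographic byte order matches numeric order, so no custom comparator is needed:

```elixir
# Fixed-width big-endian encoding: the byte order of the binaries agrees
# with the numeric order of the integers, including across byte boundaries.
encode = fn i -> <<i::unsigned-big-integer-size(64)>> end
true = encode.(255) < encode.(256)
true = encode.(1) < encode.(4_000_000_000)
```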

Thank you for sharing your project! I look forward to following it.

Can you comment on the possibility of a public API and separate package for Construct? The idea of DST of BEAM primitives is interesting. Is there a prototypical example using Construct that demonstrates its capabilities?

2 Likes

Fuzzers are different, since they allow parallel execution, and they just tweak timeouts and inject faults.

My solution allows only a single process to execute at a time, and when it comes to a state where a failure will occur (right now it’s only deadlocks, but I am planning to add invariant logic like in ecosystems akin to Ada’s SPARK), it will write down for you the sequence of how processes must switch and messages must travel in order for this specific failure to occur.

Here’s the code, I’d say it’s like 50% ready and I am still testing it out on a simple CRDT increment-only counter algorithm to improve both libraries: GitHub - hissssst/dist

The reason why you should use both a fuzzer and solutions like this is that fuzzers miss edge cases, but they are many times more efficient.

2 Likes

I have had this under consideration from the beginning (which is why it has a name!), and it will probably happen at some point. Construct and Hobbes had to be written together, though.

It is not trivial to write simulation-tested code. Learning to write good fuzzers is a skill which I am still developing myself. And of course the thing you’re testing still has to be deterministic! Which, actually, has not been all that hard, but it would probably be a nightmare to retrofit to an existing codebase.

Still, at least for me I see a path to writing real applications that can be simtested. And at the very least we can have fully integrated simtests for layers, which I know is also your area of interest :slight_smile:

Well, the prototypical example is Hobbes itself! The entire database is implemented using Construct, so you can kinda just look anywhere. If you want to see the fuzzers you can look in the workloads/ directory, but note that the fuzzers actually work outside of the sim as well for the most part. I do run them outside of the sim occasionally just to make sure the thing actually works, but the simulator is just so much better for testing.

Actually, this is something that might be worth mentioning since I know you’re familiar with DST and FDB. When I started down this path I was pretty confident the simulator was necessary, but I wasn’t sure how much of the FDB testing “story” was just hype and so on.

My impression, having been at this a while now, is that it was all true and then some. I don’t think I could live without the sim at this point. The feedback loop is so tight, and deterministic failures make debugging so much easier. It really is that good.

It has me thinking that, some time long in the future, what I really want is a new BEAM with this stuff built-in.

Construct does not allow parallel execution. It’s a deterministic simulation testing framework, so parallelism is a no-go there. Only one process is allowed to execute at a time.

I think the difference in approach here can be summarized as “breadth-first vs depth-first”, but maybe I am not fully understanding.

Either way, I see the similarities and I’ll keep your approach in mind. Hobbes probably has too large a state space to explore exhaustively this way (not computationally tractable), but maybe smaller parts of it could be exhaustively checked for bugs.

Do you see parallels between your approach and formal methods (e.g. TLA+)?

I appreciate the pointer to the workloads directory for examples. I’m reading the code, and I’m surprised to find that Construct actually does make use of real BEAM processes and message passing. I had expected the simulation to run in a single real BEAM process on a node with exactly 1 online scheduler, and that Construct would itself schedule its simulated “processes” (you could call them green-green threads?).

With the simulation using more than 1 real process, does this not violate determinism?

Even if the BEAM’s scheduling is deterministic (I don’t know if it is or not), it’s not reproducible, so the only option is to treat it as nondeterministic, as far as I can tell.

With 1-process-1-scheduler, BEAM will of course still start and schedule all its other system processes, so care would have to be taken to ensure the simulator process would not rely on any other system process.

Strictly speaking, I suppose a reproducible workload could still be simulated on 1-process-n-schedulers, but that appears to complicate the matter without a benefit.

1 Like

It does, but the scheduler intercepts all of the messages. It intercepts a lot of stuff.

Processes are used because they are the only way I could think of to get resumable execution. If you look at scheduler.ex you will see I essentially built a yield statement out of messages and everything else is layered on top of that mechanism.

Ah, but you haven’t noticed the trick! By adding a yield statement to Elixir I have reduced the BEAM’s glorious preemptive scheduler to a cooperative model. Only one process can execute at a time, and they’re all gated by the central scheduler.
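A stripped-down reconstruction of that mechanism (not the actual scheduler.ex, just the idea): each simulated process blocks in a receive until the central scheduler grants it a turn, and yield hands control back:

```elixir
defmodule MiniSched do
  # Called from inside a simulated process: hand control back to the
  # scheduler and block until it says go again.
  def yield do
    send(:sched, {:yielded, self()})

    receive do
      :go -> :ok
    end
  end

  # Run a list of functions under a deterministic round-robin scheduler;
  # only one process executes between yields.
  def run(funs) do
    Process.register(self(), :sched)

    pids =
      for f <- funs do
        spawn(fn ->
          receive do
            :go -> :ok
          end

          f.()
          send(:sched, {:done, self()})
        end)
      end

    result = loop(pids)
    Process.unregister(:sched)
    result
  end

  defp loop([]), do: :ok

  defp loop([pid | rest]) do
    send(pid, :go)

    receive do
      # Yielded: move to the back of the run queue (round-robin).
      {:yielded, ^pid} -> loop(rest ++ [pid])
      {:done, ^pid} -> loop(rest)
    end
  end
end
```

Because the interleaving is fixed by the scheduler’s own queue, extra work between yields (like log statements) can’t change the execution order. `MiniSched.run([fn -> MiniSched.yield() end, fn -> :ok end])` returns `:ok` once both finish.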

Horrifying, isn’t it?

1 Like

I doubt you could ever rely on the BEAM to be deterministic, but putting that aside another nice property for the simulator to have is stability.

One of the most useful features of the sim is that, when a run fails, you can actually go back and instrument the code with extremely specific log statements and other checks. Because of how Construct is implemented (like a cooperative scheduler), executing those log statements does not throw the run off of its path.

Even if the BEAM scheduler were deterministic, the extra reductions from the log statements would destabilize it. Perhaps this could be made up for with fancy debugging tools, but I do enjoy my print statement debugging :slight_smile:

1 Like

I can’t really contribute to the discussion, as this flies over my head in terms of knowledge. Yes, I did read “Designing Data-Intensive Applications” and had Distributed Systems courses back in uni times, but DB stuff is still pretty much an unknown domain to me. :woozy_face:

Anyway, just wanted to congratulate you for the release! Been digging through the source code, pretty cool stuff. :slight_smile:

Also, the first Database should have been called Calvinex! :sweat_smile:

1 Like