Hi, it’s a good DB! I’ve read almost all of the storage and transactions code (except the constructor), and I find it much easier to read than the code of other databases. Here are my questions:
From README
Hobbes will be closed to public contribution for the foreseeable future.
Why? Can I contribute some PRs, please?
Hobbes is currently proprietary as it is not intended for public use.
You’re missing a LICENSE file, though. At the very least, the copyright notice should declare an owner.
From architecture/overview.md
Hobbes was designed and built by Corporate to meet our unique needs.
Which Corporate? Is it the name of the company you work at? https://corporate.fm?
“our”? How many people are working on the project? Just curious.
As far as I can see, the storage path works like this:
- A mutation is appended to the MutationLog
- On flush, HybridKV pops mutations from the MutationLog
- The popped data is inserted into the storage kv during flush
- The storage kv is dumped to disk in the commit step of the flush
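A toy Python model of the path described above, to make the steps concrete. The class names mirror the list, but the method bodies, the JSON dump format, and the file path are assumptions of this sketch, not Hobbes’ actual code; the dump is also synchronous here for simplicity, even though the real one is async:

```python
import json
import os
import tempfile
from collections import deque


class MutationLog:
    """In-memory queue of pending mutations (assumption: nothing is persisted here)."""

    def __init__(self):
        self._pending = deque()

    def append(self, key, value):
        self._pending.append((key, value))

    def pop_all(self):
        # Drain everything that is pending; the log is empty afterwards.
        items = list(self._pending)
        self._pending.clear()
        return items


class HybridKV:
    """Toy model: flush moves mutations into an in-memory dict, then dumps it."""

    def __init__(self, path):
        self.log = MutationLog()
        self.storage = {}  # the "storage kv": only grows, nothing is evicted
        self.path = path

    def put(self, key, value):
        self.log.append(key, value)

    def flush(self):
        # 1. Pop from the mutation log.
        for key, value in self.log.pop_all():
            # 2. Insert into the storage kv.
            self.storage[key] = value
        # 3. Commit: dump the *whole* storage kv to disk.
        self._commit()

    def _commit(self):
        with open(self.path, "w") as f:
            json.dump(self.storage, f)


# Usage: mutations only reach the file after flush() has run _commit().
path = os.path.join(tempfile.mkdtemp(), "kv.json")
kv = HybridKV(path)
kv.put("a", 1)
kv.put("b", 2)
kv.flush()
with open(path) as f:
    print(json.load(f))  # {'a': 1, 'b': 2}
```

In this model, anything still sitting in `MutationLog` when the process dies never reaches `_commit`, and every commit rewrites the entire map, which is what the first, second, and fourth problems listed next are about.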
And I see these problems:
- The mutation log is in-memory; if the process dies, the data is lost
- The storage kv keeps everything in memory and never shrinks
- The full kv dump is async
- The full kv dump is slow
- If something fails, the transaction is lost and the user is not notified about it
I know this is a work in progress, so I’m interested in the approach you’re planning to take for the storage part. Is it going to be a B-tree variant, an LSM tree, or something new?
It looks like it’s not going to support arbitrary terms as keys. Why?
Don’t tell me about term_to_binary ordering: I wrote nanolsm just to prove that a storage engine can use arbitrary terms as keys, with range reads and everything.
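For illustration, a toy in-memory sketch (not nanolsm’s code) of what ordering arbitrary terms takes: a sort key that maps Python values standing in for Erlang terms onto a total order modeled on Erlang’s term order (number < atom < tuple < list < bitstring, with tuples compared by size first), plus a sorted index that answers range reads with `bisect`. Treating Python `str` as atoms and `bytes` as binaries is an assumption of the sketch; references, funs, ports, pids, and maps are left out, and only in-memory ordering is covered:

```python
import bisect

# Rank of each supported kind, following Erlang's term order:
# number < atom < tuple < list < bitstring.
# Assumption: Python str stands in for atoms, bytes for binaries.


def term_key(t):
    """Map a term to a sort key so that plain Python comparison follows term order."""
    if isinstance(t, bool):
        return (1, "true" if t else "false")  # true/false are atoms in Erlang
    if isinstance(t, (int, float)):
        return (0, t)
    if isinstance(t, str):
        return (1, t)
    if isinstance(t, tuple):
        # Tuples compare by size first, then element by element.
        return (2, len(t), tuple(term_key(x) for x in t))
    if isinstance(t, list):
        # Lists compare element by element; a proper prefix sorts first.
        return (3, tuple(term_key(x) for x in t))
    if isinstance(t, bytes):
        return (4, t)
    raise TypeError(f"unsupported term: {t!r}")


class TermIndex:
    """Sorted in-memory index keyed by arbitrary (supported) terms."""

    def __init__(self):
        self._keys = []   # sort keys, always kept sorted
        self._items = []  # (term, value) pairs, parallel to _keys

    def put(self, term, value):
        k = term_key(term)
        i = bisect.bisect_left(self._keys, k)
        if i < len(self._keys) and self._keys[i] == k:
            self._items[i] = (term, value)  # overwrite existing key
        else:
            self._keys.insert(i, k)
            self._items.insert(i, (term, value))

    def range(self, lo, hi):
        """Return (term, value) pairs with lo <= term < hi, in term order."""
        i = bisect.bisect_left(self._keys, term_key(lo))
        j = bisect.bisect_left(self._keys, term_key(hi))
        return self._items[i:j]


index = TermIndex()
for term in [("user", 2), "atom_key", 42, ("user", 1), [1, 2], b"bin", ("user", 1, "x")]:
    index.put(term, None)
# The 3-tuple sorts after every 2-tuple, so it falls outside this range.
print([t for t, _ in index.range(("user", 0), ("user", 99))])  # [('user', 1), ('user', 2)]
```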
I wrote something similar to Construct. It’s still a WIP, so I haven’t released it yet. It uses a similar API but a slightly different approach: instead of fuzzing, it traverses all possible paths of how things can happen in a distributed system (with or without failures).
The idea behind it is this: we build a tree where each node is the set of actions that can happen at a given moment (like “send this message to some process”, “receive this message in this process”, or “unblock this process after send”, etc.), and each edge is an executed action, which leads to a new set of actions. The root node is formed when all tracked processes are stuck at checkpoints (like right after a send or right after spawn, etc.). Executing an action moves us down the tree, and each new run tries to traverse a new path of the tree, starting from the root.
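A minimal Python sketch of the exploration idea, under heavy simplifying assumptions: each process is a fixed script of hypothetical `("send", target, msg)` and `("recv", msg)` actions, a receive is selective on an exact message, and failures and checkpoints other than “blocked on receive” are not modeled. Each call to `walk` is a node (the set of enabled actions), each loop iteration is an edge, and the depth-first search enumerates every path from the root:

```python
from collections import Counter


def explore(programs):
    """Enumerate every path through the action tree, depth first.

    programs maps a process name to its script: a list of actions, each
    either ("send", target, msg) or ("recv", msg). A path is the list of
    (process, action) edges taken from the root to a leaf.
    """
    names = sorted(programs)
    paths = []

    def enabled_actions(pcs, mailboxes):
        # The node: every action that can fire at this moment.
        acts = []
        for name in names:
            pc = pcs[name]
            if pc == len(programs[name]):
                continue  # this process has finished its script
            action = programs[name][pc]
            if action[0] == "recv" and mailboxes[name][action[1]] == 0:
                continue  # blocked until the message arrives
            acts.append((name, action))
        return acts

    def walk(pcs, mailboxes, path):
        acts = enabled_actions(pcs, mailboxes)
        if not acts:
            paths.append(path)  # leaf: everyone finished or is deadlocked
            return
        for name, action in acts:
            # Edge: execute one action on a copy of the state, then descend.
            next_pcs = dict(pcs)
            next_pcs[name] += 1
            next_mailboxes = {n: Counter(box) for n, box in mailboxes.items()}
            if action[0] == "send":
                next_mailboxes[action[1]][action[2]] += 1
            else:
                next_mailboxes[name][action[1]] -= 1
            walk(next_pcs, next_mailboxes, path + [(name, action)])

    walk({n: 0 for n in names}, {n: Counter() for n in names}, [])
    return paths


# Two senders racing to one selective receiver.
ping = {
    "a": [("send", "s", "x")],
    "b": [("send", "s", "y")],
    "s": [("recv", "x"), ("recv", "y")],
}
print(len(explore(ping)))  # 3
```

Even without messages interacting, three processes that each do two independent sends to a process with an empty script already produce 90 paths (6!/(2!)^3), which shows how quickly the tree grows.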
The main problem with this library is the size of the action tree, which grows quickly, since even a simple GenServer.call can lead to many outcomes.
I can upload the code if you want to see it.
I’ll ask more questions once I finish reading and testing it.
Congrats on the initial release! I can see in the git history that it took a year and a half, and that’s a lot of progress for that amount of time. Frankly, I am almost jealous (in a good way) of your productivity. I can’t wait to see newer iterations of the project.