I understand now, thanks for clarifying. I don’t know the term_to_binary encodings but if it’s easy to pull the raw binary out with a quick pattern match as you suggest then I agree this could be a reasonable approach.
I made a very simple LSM and I will try it out later today with Bedrock. Is there any Bedrock benchmark or end-to-end test where I can check that Bedrock is working correctly with my storage?
Looks very nice! You should make a thread for it. There are questions I would ask but I don’t want to clog up this thread!
Take a look at storage.ex for the basic interface (all read-only) that the transaction builder expects, and you’ll want to look at the existing storage implementations for how to manage log-pulling. Storage servers in Bedrock (as in FDB) only process transactions that have already been version-stamped and conflict checked, so all you need to worry about is applying those transactions and serving versioned reads.
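For orientation, here’s a rough sketch of the kind of surface area that implies; the module and callback names below are illustrative guesses, not the actual storage.ex contract:

# Illustrative only: callback names and result shapes are assumptions,
# not Bedrock's actual storage.ex interface.
defmodule MyLSM.StorageBehaviour do
  @type version :: non_neg_integer()

  # The read-only surface the transaction builder would call.
  @callback fetch(state :: term(), key :: binary(), version()) ::
              {:ok, binary()} | {:error, :not_found | :version_too_new}

  @callback range(state :: term(), start_key :: binary(), end_key :: binary(), version()) ::
              {:ok, [{binary(), binary()}]} | {:error, term()}

  # The write path only sees transactions that were already version-stamped
  # and conflict-checked upstream, so it just applies them in version order.
  @callback apply_transaction(state :: term(), transaction :: term()) ::
              {:ok, new_state :: term()}
end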
I checked out the lib, and it’s a great start! I love building this kind of stuff!
At the moment, I don’t have any official benchmarking tools. With things starting to stabilize in 0.3, that’d be a good area to look into (along with automatic sharding, watches, better/simpler cluster config, documentation, etc.).
Thoughts on the API after reading the (brand) new guide:
I don’t like the tx being passed into the transaction fn. The API is not pure (get() and put() do not return a new transaction), so it feels weird to me. This is especially apparent in the composition example, where the tx is ignored and the called functions then get their own tx (from where?). Why is the explicit tx needed at all if the API is clearly not pure anyway?
You have clear() and clear_range() and then get() and range(). Why not get_range()?
I’m not sure how I feel about Subspaces. I know this comes from the FDB API but it seems like they exist solely to pack a tuple with a prefix. What purpose does a Subspace serve that is not already served by an arbitrary prefix tuple? I might be wrong about this one; I’m curious what @jstimps thinks here.
I think Repo.transaction() should mirror the new Ecto Repo.transact() API where the anonymous function returns :ok/:error tuples to be used for commit/rollback (of course here a rollback is a noop). It composes better, no need to repeat past mistakes there.
Have you given any thought to automatic retries for errors (e.g. conflicts)? One footgun is that some errors may not be retryable (network timeouts and recoveries). FDB had some work on automatic idempotency which was quite interesting. I personally plan to implement idempotency so that I can check linearizability after faults. (See also Tigerbeetle’s end-to-end idempotency, which is clever.)
Feedback! Yay!
Working against this demo/tutorial uncovered a lot of problems, both big and small. Big thanks to @jstimps – his version for :erlfdb was the kick in the butt I needed.
This is a good point. Inside a transaction, there’s no real reason that it has to be passed around. It’s in the process dictionary… and that’s the way that transactions are automatically nested already… so I could easily remove the parameter from both the transaction function as well as the get, put, etc. functions. It’d certainly be less typing. I’ll give that a whirl and see what that looks/feels like.
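Something in this direction, presumably; the module name, process-dictionary key, and placeholder functions here are made up, not Bedrock’s actual internals:

# Sketch only: the real Repo obviously does much more than this.
defmodule ImplicitTx do
  @tx_key :bedrock_current_tx

  def transaction(fun) do
    case Process.get(@tx_key) do
      nil ->
        tx = start_tx()
        Process.put(@tx_key, tx)

        try do
          result = fun.()
          commit(tx)
          result
        after
          Process.delete(@tx_key)
        end

      _existing_tx ->
        # Already inside a transaction: just nest.
        fun.()
    end
  end

  def get(key), do: do_get(current_tx!(), key)

  defp current_tx!, do: Process.get(@tx_key) || raise "not inside a transaction"

  # Placeholders for the real transaction machinery.
  defp start_tx, do: make_ref()
  defp commit(_tx), do: :ok
  defp do_get(_tx, _key), do: nil
end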
Hahaha. Because I’ve been staring at it from 2" away, and it’s hard to see these things sometimes?
get_range has a certain symmetry. That also sounds like a good tweak to make. Consider it done.
Yeah. I straight copied the notion in my drive to duplicate the demo. Now that it works, it’s time to step back and see if some of these ideas can be made more “elixir-ish”. I was thinking of reworking Subspace as a protocol, defimpl’d for binary()… it could maybe use a better name.
Up to 0.3, I was trying to do this very thing with transactions… but I came to think that trying to divine the will of the caller by picking apart the result is just kind of dicey. I can think of all sorts of reasons for wanting to return an error without rolling back. (For example: say I want to try to do something… and fail, but still want to record the attempt.) The demo shows a raise for "No remaining seats", but this could just as easily be a returned result.
Current behavior is to commit on any normal return without the result being interpreted in any way, and to only roll back on an exception or an explicit call to Repo.rollback(reason). My thinking here is that if the user wants to roll back, they should just do that. I think this makes a lot of sense in context, because a rollback or a no-change commit literally costs nothing, whereas in a normal db, resources are tied up by the transaction and network round-trips are required to tear it down.
Yes. This is already in place for certain classes of “retryable” failures, notable examples being transient errors like version_too_new or unavailable for reads, and of course aborted (due to MVCC conflicts). There’s a scaled back-off (0, 2, 4, 8ms… up to 1s) with jitter. It was pretty much a requirement for the ‘concurrent’ portion of the notebook. It’s also how you can just “run” the notebook and the first get(tx, "hello") works without having to explicitly wait for the system to spin up.
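That back-off looks roughly like this; a sketch, not the actual implementation, and the error atoms are just the ones mentioned above:

# Retry loop with the 0, 2, 4, 8ms… (capped at 1s) backoff plus jitter
# described above. Names and error atoms are illustrative only.
defmodule RetrySketch do
  @retryable [:aborted, :version_too_new, :unavailable]
  @max_backoff_ms 1_000

  def with_retries(fun, attempt \\ 0) do
    case fun.() do
      {:error, reason} when reason in @retryable ->
        attempt |> backoff_ms() |> Process.sleep()
        with_retries(fun, attempt + 1)

      other ->
        other
    end
  end

  defp backoff_ms(0), do: 0

  defp backoff_ms(attempt) do
    base = min(Integer.pow(2, attempt), @max_backoff_ms)
    # Full jitter: sleep anywhere between 0 and the scaled delay.
    :rand.uniform(base + 1) - 1
  end
end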
Repo.transaction(fn ->
  # ...
  {:ok, {:error, "No remaining seats."}}
end)
Read as “transaction succeeded, return value is an error”. This is fine, I think.
This is how Ecto worked but we seem to have (mostly) reached community consensus that it was a mistake and that’s why there is a new Repo.transact() API. There are a couple of individuals on here that dissented, though, so maybe they’ll poke their heads in and disagree!
The explicit rollback is essentially using try/catch error-based control flow instead of if/case/with pattern matching control flow. The problem is that in Elixir we generally strongly favor the latter and rarely use the former, so building such a fundamental API (Ecto or Bedrock) on top of errors feels wrong (if we had algebraic effects maybe this would be different). If you use error tuples everywhere you can just use a with pipeline, whereas the rollback-based control flow resulted in the rather byzantine Ecto.Multi situation. This article has a nice visual example.
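To make that concrete against the Repo from the tutorial (the Checkout module and its helpers here are made-up stand-ins, not anything from Bedrock or Ecto):

# Toy stand-ins for real domain functions, each returning ok/error tuples.
defmodule Checkout do
  def fetch_order(id), do: {:ok, %{id: id, total: 100}}
  def reserve_stock(_order), do: {:ok, :reserved}
  def charge_card(order), do: {:ok, %{charged: order.total}}

  # Error-tuple style: the `with` pipeline short-circuits on the first
  # {:error, _}, which a transact-style wrapper then turns into a rollback.
  def place(order_id) do
    Repo.transact(fn ->
      with {:ok, order} <- fetch_order(order_id),
           {:ok, _} <- reserve_stock(order),
           {:ok, receipt} <- charge_card(order) do
        {:ok, receipt}
      end
    end)
  end
end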
Nope, I am quite fine, thank you.
I suppose… if I squint a little.
This is what you get when you use the old Repo.transaction with Ecto and return an error… if you return {:error, "No remaining seats."}, it wraps it in an {:ok, ...}. Repo.transact makes you do the wrapping yourself in order to commit the transaction and still return the error; just returning the error tuple gets you a rollback.
…and here’s how it still works with respect to rollback/1. From within a transact, the returned value for Repo.rollback(:reason) is {:error, :reason}. The old transaction behaved the same way with respect to rollback/1, but annoyingly wrapped whatever you returned from your transaction function in an :ok tuple, so return your_result and you get {:ok, your_result}… (return :ok and you get {:ok, :ok}).
The fact that the new transact function just leaves your result alone is kind of a cause for joy, though it seems that there’s no (documented) way to just return :ok to commit without a whole tuple.
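Spelled out as return shapes, based on the behavior described in this thread rather than on the Ecto docs themselves:

# Old-style transaction: whatever the function returns gets wrapped.
{:ok, :ok} = Repo.transaction(fn -> :ok end)
{:ok, "your result"} = Repo.transaction(fn -> "your result" end)
{:error, :nope} = Repo.transaction(fn -> Repo.rollback(:nope) end)

# transact: the ok/error tuple you return is passed through untouched,
# and {:error, _} (or Repo.rollback/1) means rollback.
{:ok, 42} = Repo.transact(fn -> {:ok, 42} end)
{:error, :no_seats} = Repo.transact(fn -> {:error, :no_seats} end)
{:error, :nope} = Repo.transact(fn -> Repo.rollback(:nope) end)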
Ultimately, I don’t feel super strongly one way or the other, and I’m going to lean heavily in the direction of established patterns. It’s really only a couple of lines of code to do things the way transact does, so I’ve added a commit to this PR.
This is a good point, but instead of removing the tx, I do wonder how it would feel to lean into it and make the RYW tx immutable data. If it can be done with a still-ergonomic API, you could get one of the benefits of nested transactions, the ability to abandon a subset of operations in the tx, without the (IMO) confusing semantics that would come from true nesting. The ergonomics might not be there, though, since get would have to return {value, tx} or similar. I could understand going either way. Immutability has served us well, though.
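A rough sketch of what that pure shape would look like; the module name and internals are made up:

# Sketch of an immutable read-your-writes transaction value. Every read
# must thread the updated tx back, since a get also records a read conflict.
defmodule PureTx do
  defstruct reads: MapSet.new(), writes: %{}

  def new, do: %__MODULE__{}

  def get(%__MODULE__{} = tx, key) do
    value = Map.get(tx.writes, key, storage_get(key))
    {value, %{tx | reads: MapSet.put(tx.reads, key)}}
  end

  def put(%__MODULE__{} = tx, key, value) do
    %{tx | writes: Map.put(tx.writes, key, value)}
  end

  # Placeholder for a real versioned read against storage.
  defp storage_get(_key), do: nil
end

# Usage: every get returns the value *and* the new tx, which must be rebound.
tx = PureTx.new()
{_value, tx} = PureTx.get(tx, "hello")
tx = PureTx.put(tx, "hello", "world")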
I think it’s an important API concept to the end user; it leads them down the path of understanding the power of the ordered keyset. And each use of a subspace is equivalent to an index that doesn’t have to be written. Although, I admit I’ve only used FDB subspace in production code via the Directory layer.
In a new system like Bedrock, I agree that transact is the way. However, {:ok, {:error, _}} as an idiomatic pattern is not ideal.
For the record though, I went a different third way with EctoFDB, since :erlfdb already had an established pattern embracing throws. Instead of transaction or transact, the EctoFDB adapter creates a new Repo.transactional to match :erlfdb.transactional. I did this because I didn’t want EctoFDB to be subjected to the future deprecation and removal of Repo.transaction.
In practice, the transactional throws have served me quite well in production code. But perhaps this is my Judgement of Solomon moment.
Cheers! Always happy to be involved in exciting projects!
Some minor feedback on the API. As a new user, I would be confused by using the Key module to encode a value. An excerpt from the tutorial:
key = Subspace.pack(course, class)
value = Key.pack(@total_seats_available)
Repo.put(tx, key, value)
Yeah. That’s been pinching me, too. It’s awkward… but the system already has a Tuple module as part of the runtime, so there’d be some name clash at least. It’s not a difficult thing to move or rename… but a good module name… anyone got ideas?
It will happily pack binary() | number() | list() | tuple()
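For reference, usage would look something like this, assuming unpack inverts pack the way the tutorial implies; the exact encodings are internal to the Key module:

# Illustrative only; exact encodings are an implementation detail.
value = Key.pack(100)
100 = Key.unpack(value)          # round-trip as shown in the tutorial

Key.pack("hello")                # binary()
Key.pack(3.14)                   # number()
Key.pack([1, 2, 3])              # list()
Key.pack({"course", "intro"})    # tuple()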
I’m not mad at it… there’s a ton of internal task management that would somehow need to be dealt with… I wouldn’t be able to do anything with timers… right now I can expect to have handlers on the process loop for things like that, but if we went pure functions, that wouldn’t be an option. Hrm…
I just sketched transact semantics out over here. I reworked the tutorials, too, on that branch to give an idea of what it’d look like. I’m not opposed to the idea of letting anything but an {:error, reason} commit… which would give you a little more flexibility to use other result shapes. As it was, in my sketch, I couldn’t resist allowing plain :ok.
…and here’s what it would look like if everything were implicit, like Ecto:
defmodule Scheduling do
  @total_seats_available 100

  def signup(attends, course, student, class) do
    Repo.transact(fn ->
      rec = Subspace.pack(attends, {student, class})

      case Repo.get(rec) do
        nil ->
          # Not signed up yet, proceed with signup
          class_key = Subspace.pack(course, class)
          seats_data = Repo.get(class_key)
          seats_left = Key.unpack(seats_data)

          if seats_left == 0 do
            {:error, "No remaining seats"}
          else
            # Decrement seats and record signup
            Repo.put(class_key, Key.pack(seats_left - 1))
            Repo.put(rec, <<>>)
          end

        _existing ->
          # Already signed up
          :ok
      end
    end)
  end

  def drop(attends, course, student, class) do
    Repo.transact(fn ->
      rec = Subspace.pack(attends, {student, class})

      case Repo.get(rec) do
        nil ->
          # Not taking this class
          :ok

        _existing ->
          # Increment seats and remove signup
          class_key = Subspace.pack(course, class)
          seats_data = Repo.get(class_key)
          seats_left = Key.unpack(seats_data)
          Repo.put(class_key, Key.pack(seats_left + 1))
          Repo.clear(rec)
      end
    end)
  end

  def available_classes(course) do
    course_range = Subspace.range(course)

    Repo.transact(fn ->
      classes =
        Repo.get_range(course_range)
        |> Stream.map(fn {packed_class, packed_seats} ->
          class = Subspace.unpack(course, packed_class)
          availability = Key.unpack(packed_seats)
          {class, availability}
        end)
        |> Stream.filter(fn {_class, availability} -> availability > 0 end)
        |> Stream.map(fn {class, _availability} -> class end)
        |> Enum.to_list()

      {:ok, classes}
    end)
  end

  def init(scheduling, course, class_names) do
    scheduling_range = scheduling |> Directory.get_subspace() |> Subspace.range()

    Repo.transact(fn ->
      # Clear the directory
      Repo.clear_range(scheduling_range)

      # Add all classes
      for class_name <- class_names do
        add_class(course, class_name)
      end

      :ok
    end)
  end

  def add_class(course, class) do
    Repo.transact(fn ->
      key = Subspace.pack(course, class)
      value = Key.pack(@total_seats_available)
      Repo.put(key, value)
    end)
  end
end
…and…
def switch(attends, course, student, old_class, new_class) do
  Repo.transact(fn ->
    with :ok <- signup(attends, course, student, new_class),
         :ok <- drop(attends, course, student, old_class) do
      {:ok, :switched}
    end
  end)
end
…other than the Key.pack-for-values naming pinch point that @jstimps called out, that’s all of the feedback integrated. If any of the implicit functions are used outside of a transact, they’ll raise.
FINALLY. Now I understand one of the benefits of the new thing. Thanks!
Finally, an actual, objective, observable tidbit.
It’s not beautiful, no.
I’m beginning to wonder if the underlying problem here is the concept of rolling back. What do we actually use rollbacks for? I suppose they are inherently kinda exception-coded.
Maybe this is a bit avant-garde but I’m wondering if maybe it’s better to think in terms of retries rather than rollbacks. I.e. the same argument as “let it heal” over “let it crash”. Ecto has rollbacks (crashes) but it doesn’t have retries (supervisors).
I have begun to doubt my own argument in favor of transact() here, I think. Gonna have to spend more time on this one.
I have been doing exactly this for my internal transaction API for a while now and I don’t really like it. The problem is that it’s way too easy to accidentally “lose” the transaction struct (even on reads) and violate consistency guarantees.
Fundamentally interacting with the DB is a side-effect and trying to pretend that it’s not seems to make things dangerous. A pure transaction API allows you to observe the side-effect of reading the data while forgetting to write down that you’ve read it.
I feel like I’m walking on eggshells using it and I wrote the entire database. Given that, there is no way I’m exposing it to anyone else.
It’s funny because this is one of very few cases I’ve ever come across where a pure-functional API is less safe than one which mutates global state. Another one is Plug’s get_csrf_token().
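Using the PureTx sketch from earlier in the thread, the failure mode is easy to write and hard to spot:

tx = PureTx.new()

# Bug: the updated tx from get/2 is discarded, so the read conflict it
# recorded is silently dropped even though the value is still used below.
{seats, _tx} = PureTx.get(tx, "seat_count")

# The write below is built from the stale `tx` with an empty read set, so a
# commit could succeed even if another writer changed "seat_count" meanwhile.
PureTx.put(tx, "seat_count", (seats || 0) - 1)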
Hah, I ran into this exact problem and after much consternation landed on Encoding.Keyset. I did consider both Tuple and Key and didn’t like either for the same reasons. I’m still not sure the compromise is any better though (perhaps I am Solomon).
The real problem is that for the value encodings I desperately want to use Record…
Yes, this convinced me. I can see why the immutable approach isn’t right here. The get is not a get, it’s a get+add_read_conflict. However this is an implementation detail, so in a pure API it becomes a leaky abstraction. Very weird! Thanks for the insight!
19 files changed +2160 -616 lines changed
Average commit above. I think that average is about 2k lines of code changed a day. Too bad they didn’t invent vibe reading, because I definitely can’t keep up with reading and understanding all the changes!
*It’s made with Autism, so you know it’s good.
Seriously, though. I love to talk about this stuff, and I’m more than happy to info dump about any aspect of the code, how it works, why I made a particular choice, etc.
FoundationDB’s design is wonderful, but they’re horrible at explaining how their own system works – and my docs aren’t that much better, yet. There’s a lot to take in… and a lot of interesting ideas in this space… so, ask on this thread, start another one, the elixir slack, dm, whatever.
Every time I’ve had a conversation about this beast, I’ve come away with new ideas, better perspectives and things I’ve learned from the person I’m talking to. There are a lot of smart people here that think differently than I do, and it’s fun and interesting to see this project from their point of view. So, if you have a question, don’t hold on to it!