Ecto_foundationdb - An Ecto Adapter for FoundationDB

This was not my original motivation, but I could see it being handy. I haven’t written a higher-level layer yet so I don’t know exactly how often it comes up. Probably the reason you haven’t run into it is that you are specifically pushing responsibility for further filtering/sorting on to the user (as opposed to having a query planner), so you can always take advantage of FDB’s sorting. But I do wonder how often I will need to compare a keyset without having already encoded the underlying binary.

I know erlfdb has an FDB term comparator somewhere.

Well my original thinking was that if you always have a schema with fixed types this doesn’t matter. FDB presumably had the same thinking as they don’t mix floats and ints.

But, actually, I am very interested in providing a type system with union types, and it would be quite elegant if an index on an integer | float field could be built. I suppose you could still get away with coercing the integers into floats in the index.
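To make the coercion idea concrete, here is a minimal sketch. It assumes every number is widened to a 64-bit float and then encoded with the sign-flip transform the FDB tuple layer uses for doubles, so that plain byte comparison agrees with numeric order. `OrderedNumber` is a hypothetical module name, and integers beyond 2^53 would lose precision under this scheme.

```elixir
defmodule OrderedNumber do
  import Bitwise

  @sign_bit 0x8000000000000000
  @all_bits 0xFFFFFFFFFFFFFFFF

  # Integers are coerced to floats first (lossy above 2^53).
  def encode(n) when is_integer(n), do: encode(:erlang.float(n))

  def encode(f) when is_float(f) do
    # Reinterpret the IEEE 754 big-endian bytes as an unsigned integer.
    <<bits::unsigned-big-integer-size(64)>> = <<f::float-big-size(64)>>

    # Negative floats: flip every bit. Non-negative floats: flip only the sign bit.
    flipped =
      if (bits &&& @sign_bit) != 0, do: bxor(bits, @all_bits), else: bxor(bits, @sign_bit)

    <<flipped::unsigned-big-integer-size(64)>>
  end
end

# Mixed integers and floats sort numerically by their encoded bytes.
[-10, -1.5, 0, 0.5, 1, 2.5, 100]
|> Enum.shuffle()
|> Enum.sort_by(&OrderedNumber.encode/1)
|> IO.inspect()
```

The cost of this approach is exactly the one mentioned above: the compact varint encoding for integers is given up in the index.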

I’m torn on this, I’ll have to look into it. The varint encoding from FDB is very elegant and useful so I wouldn’t want to lose that.

The sext library (yeah) seems to handle this by encoding a fraction after the integer part somehow. I’ll stare at it some more until I understand it.

Yeah, this is exactly the reason I think K/V is a bad abstraction (as discussed previously). I think at this point I’m just going to go for “record store” branding and skip the term “key/value store” entirely (even though the underlying internals are still ordered binary/binary). My only worry is people might then expect a schema, but I’ll work around that. “Schemaless record store?” “Semi-structured”? Idk.


I’ve been idly thinking of a name for these types of systems, since “database” feels like too much and “storage engine” feels like too little. Lately I’ve been fond of “State Engine”. The combination of MVCC storage server and serializable client-side compute behaves conceptually as one giant GenServer and many small GenServers at the same time.


The native tenancy feature has met its tragic end in an excruciating 100-commit PR. Shoutout to the one guy at Apple who seems to be single-handedly cleaning up the FDB codebase. Honestly, I think it would be easier to rewrite the thing in a real programming language, but realistically what do I know.

I had long thought Apple actually used the tenant features, but I guess it was actually a Snowflake thing. Presumably they prefer to manage tenancy in the record layer. I do think the idea of having a sanity check for commits that cross tenant boundaries has merit, though the FDB implementation seemed excessive. They kept a tenant id => keyspace map on the CommitProxies and Storage servers and used it to validate the id directly. Perhaps just validating a shared subspace prefix would be enough.
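The lighter-weight check suggested here could look roughly like the following. This is a minimal sketch, assuming each tenant owns one contiguous key prefix and a commit is represented as a plain list of binary keys. `TenantGuard` and its functions are hypothetical names, not part of FDB or erlfdb.

```elixir
defmodule TenantGuard do
  # True when `key` begins with the tenant's subspace prefix.
  def in_subspace?(prefix, key) when is_binary(prefix) and is_binary(key) do
    n = byte_size(prefix)
    byte_size(key) >= n and binary_part(key, 0, n) == prefix
  end

  # Accept a commit only if every written key stays inside one tenant's prefix.
  def validate_commit(prefix, keys) do
    case Enum.reject(keys, &in_subspace?(prefix, &1)) do
      [] -> :ok
      stray -> {:error, {:crosses_tenant_boundary, stray}}
    end
  end
end

IO.inspect(TenantGuard.validate_commit("t/42/", ["t/42/users/1", "t/42/posts/9"]))
IO.inspect(TenantGuard.validate_commit("t/42/", ["t/42/users/1", "t/43/users/1"]))
```

A real check would also have to cover range operations such as clears, whose begin and end keys both need to fall inside the prefix.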

The native tenancy was the closest thing FDB has ever had to actual access control, though, which is an interesting premise. RIP.


Man I knew it was on the chopping block but I’m sad to see it go. I suppose I have some code of my own to delete too.

Luckily a while ago I switched the ecto_foundationdb tenants to simply be implemented by the Directory Layer instead of the doomed managed tenants. So any ecto_fdb databases should be future proof unless they explicitly chose the experimental ManagedTenant.

I would have loved to see Tenant-aware sharding and encryption. It would make for a powerful security/compliance story.

OTOH, I share your appreciation for the project continuing to move forward in the open and clean out some of the cobwebs.


Good call! I would probably have done the wrong thing here as I really did figure it was an Apple feature. In hindsight I’m sure if you tracked down the commits it was all Snowflake engineers (who are now long gone). Prime example of FDB doing exactly zero communication with their community lol. (Obvi I am not blaming the individuals; they are quite literally just doing their jobs.)

It’s not clear to me from your phrasing if you’re aware of this or not, but both of these things actually did exist half-baked in the implementation.

Per-tenant encryption is kinda weird. For one it was at the page level and of course the database is multitenant so there’s no guarantee that pages are cleaved on tenant lines. I got the impression that pages including multiple tenants just… weren’t encrypted that way? It was weird.

Cleaving shards on tenant lines is more interesting because if you have a lot of tenants that are small (like Apple does) then it can probably be done without screwing up the shard size variance too much. The advantage would be to avoid “unlucky” tenants that get split across multiple shards, which I would imagine in practice would increase p99 for random users (persistently, until they get re-sharded) which is probably not great.

Overall the whole access control thing is an interesting angle. FDB’s security model is just like Erlang’s (that is: there isn’t one) and for essentially the same reasons. I have gone back and forth on whether this is a good idea or not in both cases, and honestly I’m not really sure. I’d be curious to hear what Erlang experts think as they’ve probably been debating this for 40 years lol.

If people have deep enough access to your server then you’re f. anyway. Also, security is about minimizing risk, not completely avoiding it, which is impossible. How are you going to stop the NSA from tapping cables?

Hi everyone,

EctoFoundationDB v0.6.0 is published. The focus of this release is a new module called EctoFoundationDB.Sync that represents a milestone in the read-path Sync Engine that I’ve been documenting in Livebooks over the past 9 months or so.

Inspired by what Phoenix.Sync has done for Postgres users, EctoFoundationDB.Sync offers a similar batteries-included syncing experience for those of us who wish to use FoundationDB for our apps ( => me!). The guiding principle is to declare the queries upfront and have the assigns automatically update, without PubSub.

Here’s an example LiveView showing how we can manage various syncing operations on the database as the user navigates on the page (handle_params). The “magic” auto-updating is done via careful FDB watches and LiveView’s attach_hook.

defmodule DemoLive do
  use Phoenix.LiveView

  alias EctoFoundationDB.{Sync, Tenant}
  import Ecto.Query

  @query_catalog from(p in Product, order_by: p.name)
  @query_reviews from(r in Review, order_by: {:desc, r.inserted_at})

  def mount(_params, _session, socket) do
    tenant = Tenant.open!(Repo, "sync-sample")

    # :catalog drives our navigation bar, allowing the user to select a Product
    {:ok, socket
      |> put_private(:tenant, tenant)
      |> Sync.sync_all(Repo, :catalog, @query_catalog)}
  end

  def handle_params(%{"id" => id}, _uri, socket) do
    # When the user selects a Product, it's loaded in :product and its Reviews in :reviews
    {:noreply,
      socket
      |> Sync.sync_one(Repo, :product, Product, id)
      |> Sync.sync_all_by(Repo, :reviews, @query_reviews, product_id: id)}
  end

  def render(assigns) do
    # ...
  end

  # That's it! Really -- nothing more
end

There is yet another Livebook that demonstrates the capabilities end-to-end: Sync Engine III - Batteries Included. (Livebook is awesome btw!)

The Livebook includes the LiveView shown above as well as one that is more sophisticated using LiveComponents.

I’ve really enjoyed the database / data management discussion on the forum lately. And especially the new projects that you all continue to contribute to the community. I do a lot of lurking, and not a lot of responding, so I wanted to thank all of you for sharing your thoughts and your code.


Great work as always (and great docs as always)!

Help me understand the guarantees here. It seems like to sync a collection you add a watch on a special (per-tenant) metadata key that is updated along with any row in the “table” (using atomics presumably). So you are guaranteed to observe that an update to a collection has occurred.

FDB does not pass along any information about what has been updated, though, so you cannot incrementalize this. Therefore you must re-run the query as a whole? When that happens you get strict serializability guarantees so the collection will reflect the update and will be a consistent snapshot.

If you have multiple syncs, you have multiple watches which fire independently. However, if you re-run the query for each particular watch you will observe changes out of order (tearing). If you re-run all queries when any watch is triggered you can maintain a consistent snapshot, but this comes at a further performance cost because there is no incrementalization. But I think you’re taking the former approach based on the docs?

I will warn you that dealing with this tearing behavior from the application side quickly becomes maddening. I did this for a while with LiveView and PubSub and that’s how I ended up with such unreasonable opinions about consistency. It’s actually much worse than I had thought at the time, though.

I don’t think FDB can do better than re-running all queries every time anything changes, but that’s fine for many applications (especially since everything is tenant-scoped). I guess you could write all changes in duplicate to a versionstamped log (event sourcing!), but then you have to clean it up and that’s a whole thing.


Thanks for taking a look, and the kind words!

You nailed it exactly right, but I will add one point about the new watch that is created after the read.

When a key changes (e.g. one of the metadata keys as you mentioned for a collection), all futures corresponding to the watches on that key are resolved. So if we have 5 LiveViews active on a particular collection, we have 5 watches on a single key, and each future is resolved independently. Then it’s up to the client to decide how to retrieve the new data. In the case of EctoFDB.Sync, I chose to have each listening entity do their own new read of the data. There are no guarantees on the read version for this specific transaction.

However, a new watch is created in the same transaction as the read, so if the key were to change again after that read, the same entity would be notified again and would perform yet another read, with yet another watch created in that same transaction. Presumably, the key will eventually stop changing and each entity can cease querying the database. I believe this approach to creating a watch in the transaction does provide a guarantee[^1] that each listener will eventually (scary word!) be consistent. Each page will not get the same GRV, but they should have the same bytes from the key-values once it settles, assuming I implemented this all correctly.

I’m invoking eventual consistency here because in this discussion we’re including the LiveView and the browser itself as a participant in the database. I’m not sure what it would look like to have multiple pages having a truly consistent view of the data and with point-in-time guarantees on their agreement. Perhaps some coordination of a GRV across pages. Have you come up with some other approach here?

On the question of performance, EctoFDB.Sync is indeed relying on the database to be able to perform well having many concurrent transactions performing identical key reads. So far I’ve seen acceptable performance from FDB without having to do some sort of caching layer. FDB is supposed to live very near your application anyway, so caching would be a design mistake, IMO. But my EctoFDB use cases are very light workloads so far. Will this approach scale? Time will tell.

[^1] Note that if a key changes from A → B → A quickly enough, then the watch will not fire, and stays unresolved. In such a case, any listener still has the correct data anyway.


Oh I think I left some ambiguity on the table by mistake. I don’t mean multiple pages, I mean multiple queries on the same page. An example would be better.

Say we have a page with authors and books. We sync all authors and books to the client, with two sync_all() calls, and render them like this:

<div :for={{_id, book} <- @books}>
  <div>Title: {book.title}</div>
  <div>Author: {elem(List.keyfind!(@authors, book.author_id, 0), 1).name}</div>
</div>

(There are some who will suggest that you join in the database here. Ignore them, ngmi)

You have probably already spotted the problem: we have no ordering/frontier guarantees here, so @books can update before @authors and tear. If a new book is added with a new author, we will crash because the author is not known when the book appears.

If you are smart you can imagine some simple hacks to get around this, but if you are wise then you will recognize that this is the road to hell. And I have been there.

What you want is for the changes to arrive in versioned batches so that you can always stitch together a consistent snapshot of a page. This property is not called eventual consistency but internal consistency, and I first learned its name from that article I linked. (Before I knew what it was called I had also made up a name, “externally-consistent snapshot”, which is amusingly an oxymoron.)

However you have neither changes nor batches and you are not incrementalizing anything because FDB won’t let you, so all you have to do is re-run the queries when something changes. But, and this is what I was getting at, you need to re-run all queries on the page within a transaction to get a consistent snapshot.
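The difference between the two refresh strategies can be shown without a database at all. The following is a minimal sketch, assuming two committed versions of a tiny dataset held in plain maps; `SnapshotDemo` and its data are hypothetical and only stand in for what a page would read.

```elixir
defmodule SnapshotDemo do
  # Two committed versions; version 2 adds a book and its new author together.
  def db(1), do: %{authors: [{1, %{name: "Ann"}}], books: [{10, %{title: "First", author_id: 1}}]}

  def db(2) do
    %{
      authors: [{1, %{name: "Ann"}}, {2, %{name: "Bo"}}],
      books: [{10, %{title: "First", author_id: 1}}, {11, %{title: "Second", author_id: 2}}]
    }
  end

  # Rendering fails if any book points at an author the page has not seen yet.
  def render(%{authors: authors, books: books}) do
    Enum.reduce_while(books, {:ok, []}, fn {_id, book}, {:ok, acc} ->
      case List.keyfind(authors, book.author_id, 0) do
        {_, author} -> {:cont, {:ok, acc ++ ["#{book.title} by #{author.name}"]}}
        nil -> {:halt, {:error, {:unknown_author, book.author_id}}}
      end
    end)
  end
end

# Independent refreshes: :books was re-read at version 2, :authors is still at version 1.
torn = %{authors: SnapshotDemo.db(1).authors, books: SnapshotDemo.db(2).books}
IO.inspect(SnapshotDemo.render(torn))

# One transaction re-reads every query on the page at the same version.
IO.inspect(SnapshotDemo.render(SnapshotDemo.db(2)))
```

The torn page fails on the unknown author, while the page built from a single version renders both books.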

This has performance implications but it’s not the end of the world, particularly if your app naturally shards into tenants (which are conveniently disjoint in their queries).


Ah, thanks for clarifying, apologies for my misunderstanding.

You’re right that the entire books+authors query would have to be re-run in a single transaction to get internal consistency. This query would be for all intents and purposes equivalent to a select+join, as far as I can tell, with some watches thrown in.

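def my_query(tenant) do
  # One transaction, so both reads and both watches see the same snapshot.
  Repo.transactional(tenant, fn ->
    [authors, books] =
      Repo.await([
        Repo.async_all(Author),
        Repo.async_all(Book)
      ])

    # Create the watches in the same transaction as the reads.
    aw = SchemaMetadata.watch_changes(Repo, Author)
    bw = SchemaMetadata.watch_changes(Repo, Book)
    {authors, aw, books, bw}
  end)
end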

Then you need more code to handle either watch resolving and to run my_query(tenant) again.

EctoFDB.Sync’s functions don’t currently help in this case. The API hides the transaction behind a nice simple function call. And it promises to create all the correct watches for you, but only for certain query types.

However, I see no reason why the approach itself couldn’t be used with a little extra work from the developer.

Thanks for bringing this up, it’s important for me to recognize and call out the limitations of the sync features.

There is one thought in the back of my mind while I was writing this post: I suppose anytime one tries to create query conventions on these transactions, one risks creating a new query language (please no).


Sync looks cool. Can you comment on the scalability of it? I don’t know anything about FDB watches 🙂

Does this mean it works only with LiveView?

Go on…

Thanks!

Not really, not yet at least. I don’t have experience with heavy watch usage in a production setting. I am using it in a tiny production project called LiveSecret (GitHub | Production App).

I can refer you to the FDB docs that discuss where they have set their default limits (weirdly located in the Python client docs).

By default, each database connection can have no more than 10,000 watches that have not yet reported a change. When this number is exceeded, an attempt to create a watch will raise a too_many_watches exception. This limit can be changed using Database.options.set_max_watches(). Because a watch outlives the transaction that creates it, any watch that is no longer needed should be cancelled by calling Future.cancel() on its returned future.

Sync was written with LiveView in mind, but it can be used without LiveView. It does require that the process state you pass in meets these conditions:

  1. it is a map
  2. the tenant is stored in a :private map under the key :tenant
  3. the assigns are stored in an :assigns map
  4. Sync also stores its own internal data in a key :ecto_fdb_sync_data

If you’re ok with these requirements, you would simply have to call Sync.handle_ready/4 from your process’s handle_info, or equivalent. The documentation is written in a LiveView-focused manner, but there’s nothing LV specific here.
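As a rough illustration of that state shape outside LiveView, here is a minimal sketch. `SyncStateShape` is a hypothetical helper that checks only the first three requirements, and `:fake_tenant` is a placeholder for a real tenant handle. It does not call Sync itself.

```elixir
defmodule SyncStateShape do
  # A map with a :tenant under :private and a map under :assigns.
  def ok?(%{private: %{tenant: _tenant}, assigns: assigns}) when is_map(assigns), do: true
  def ok?(_other), do: false
end

# What a plain GenServer might keep as its state.
state = %{private: %{tenant: :fake_tenant}, assigns: %{}}
IO.inspect(SyncStateShape.ok?(state))

# Missing the :private map, so it would not qualify.
IO.inspect(SyncStateShape.ok?(%{assigns: %{}}))
```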


I was trying very hard not to go on but you know I cannot resist.

Keep in mind we are talking about sync here, so we’re actually talking about incrementalizing the join. To perform an incremental join you have to maintain the join on the client, so from the “database” perspective you are not going to get out of that. Of course you could abstract this away into a library, but it’s still there.

From the application perspective you also want access to the underlying collections (books and authors) because, in real apps, there is probably another widget on the page somewhere with a list of authors and a list of books and a list of authors with books and so on. This is just how real apps work; they often show multiple views of the data at once.

The “traditional” SQL backend approach here is to query the data multiple times up-front (select * from authors), (select * from books), (select * from authors inner join books...). But how do you incrementalize these queries? A change for a particular book comes out of the WAL one time, and you have to do the work to figure out which queries to update for that row.

If you spend enough time on this topic you will eventually come to the conclusion that what you actually want to do is just select the entire database and then query it locally, and we call that “local-first” (which does not necessarily mean offline, btw). The difference between theory and practice here is where you draw the lines of what “entire database” means.

I am just going to make an informed guess here because I am too lazy to read the “docs” (as if), but the watch feature is probably just a hash table of key => watch on each storage server that pushes an update to the client when the key is modified. Changes stream into the storage servers from the WAL so it would be easy to check them against the hash table while applying the WAL to the ptree.

There is an edge case where the read version of the watch is lower than the latest write at the time the server receives it, so you would handle that by checking the ptree and immediately resolving the watch if a write is present with version > read_version. This also implies that watches must be subject to the mvcc window.
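The guessed design can be sketched as a small pure data structure. This models the guess in the two paragraphs above, not FDB's actual implementation. `WatchTable` is a hypothetical name, watchers are plain atoms instead of client connections, and the MVCC window is not modeled.

```elixir
defmodule WatchTable do
  # latest: key => version of the most recent write applied on this server
  # watches: key => list of watcher ids still waiting for a change
  def new, do: %{latest: %{}, watches: %{}}

  # Register a watch taken at `read_version`. If a newer write already landed,
  # resolve at once instead of parking the watcher.
  def register(table, key, read_version, watcher) do
    if Map.get(table.latest, key, 0) > read_version do
      {[watcher], table}
    else
      {[], update_in(table.watches, &Map.update(&1, key, [watcher], fn ws -> [watcher | ws] end))}
    end
  end

  # Apply one write from the log: record its version and fire every watcher on that key.
  def apply_write(table, key, version) do
    {fired, watches} = Map.pop(table.watches, key, [])
    {fired, %{table | latest: Map.put(table.latest, key, version), watches: watches}}
  end
end

table = WatchTable.new()
{[], table} = WatchTable.register(table, "books", 5, :view_a)
{[], table} = WatchTable.register(table, "books", 5, :view_b)

# The write at version 7 fires both parked watchers.
{fired, table} = WatchTable.apply_write(table, "books", 7)
IO.inspect(fired)

# A watch whose read version predates that write resolves immediately.
{late, _table} = WatchTable.register(table, "books", 6, :view_c)
IO.inspect(late)
```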

Assuming the above (which is the obvious design) they should scale horizontally. Idk how many watches a single storage server can take in practice, though. Sockets are not free.

The unfortunate bit is that when a watch resolves you must re-run the entire query from scratch, but Jesse has done the best he can here. FDB just doesn’t have tools to parse the WAL like e.g. Postgres does, so there is no good way to incrementalize queries. This is one of the reasons I started writing a new database, because even though FDB could easily add incremental support I really do not think Apple cares at all.


v0.7.0 is released. This is primarily a large maintenance release and a hopeful step toward v1. There are various changes detailed in the changelog.

From a project stability point of view, the most notable change is an internal refactoring of Query and Future to support this innocuous-sounding bugfix:

  • A :limit in Ecto.Query will now work as expected when encountering objects split across multiple keys.

The keystone to this refactor was embracing the iterator pattern (helped along by @garrison’s Iterators on Iterators). Previously, we had been trying to use Stream. However, we needed partial evaluation with later continuation to control the retrieval of data before crossing a transaction boundary. Once we tried with an iterator, everything clicked into place. And it’s trivial to create a Stream from an iterator, so we still get the beautiful Stream API when it’s safe for us to do so.
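For readers unfamiliar with the pattern, here is a minimal sketch of an iterator with partial evaluation and a continuation, plus the conversion to a Stream. It is not the library's internal code: `Iter` is a hypothetical module, and a list stands in for key-value pairs read from the database.

```elixir
defmodule Iter do
  # An iterator is a zero-arity function returning {:ok, item, next_iterator} or :done.
  def from_list([]), do: fn -> :done end
  def from_list([h | t]), do: fn -> {:ok, h, from_list(t)} end

  # Pull at most `n` items and hand back the continuation, so the caller can
  # stop (for example at a transaction boundary) and resume later.
  def take(iter, n, acc \\ [])
  def take(iter, 0, acc), do: {Enum.reverse(acc), iter}

  def take(iter, n, acc) when n > 0 do
    case iter.() do
      {:ok, item, next} -> take(next, n - 1, [item | acc])
      :done -> {Enum.reverse(acc), iter}
    end
  end

  # Wrapping an iterator in a lazy Stream takes one call to Stream.unfold/2.
  def to_stream(iter) do
    Stream.unfold(iter, fn it ->
      case it.() do
        {:ok, item, next} -> {item, next}
        :done -> nil
      end
    end)
  end
end

iter = Iter.from_list([:a, :b, :c, :d, :e])

# Partial evaluation: take two items now, keep the rest for later.
{first, rest} = Iter.take(iter, 2)
IO.inspect(first)

# Later continuation, this time through the Stream API.
IO.inspect(rest |> Iter.to_stream() |> Enum.to_list())
```

The explicit continuation is what a Stream alone does not give you: once a Stream is partially consumed there is no handle to resume from.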
