Ecto_foundationdb - An Ecto Adapter for FoundationDB

Regarding the vector index, my interest is mostly a hope at this point, and I don’t have the time to fully invest at the moment. :sweat: I’ve captured my initial idea in a new discussion here. I would be interested in hearing your thoughts.

I’d like to continue collaborating on your other ideas too. It’s clear you have experience in this area and a bold vision. I enabled Discussions, which may be a more convenient place to follow up on some of these conversation threads individually.

1 Like

Hi! Your fdb project was very helpful in getting the erlfdb binding tester back up and running in GitHub CI, so thank you!

One thing that’s unique about developing on top of FDB is that many flowers can bloom, so-to-speak. Love seeing others with interest in the tech. :slightly_smiling_face:

1 Like

I don’t use gh but I will reply here if you don’t mind.

The only prior art I’m aware of for vector indexes on FDB is this project, which was posted on the FDB forums but as far as I can tell they never actually ported it to FDB before abandoning it. It’s not clear to me from the README what type of index they used, but the same org is a maintainer of a very popular SIMD vector search implementation (usearch) so it’s possible they were taking the approach I considered above.

SPANN is interesting; I haven’t read the paper.

The main reason I’m tempted by the quantize+brute-force approach is that it’s enticingly simple, and I am (naturally, as a programmer) very lazy :slight_smile: The complexity of HNSW and co. makes me very uncomfortable, especially since they don’t really seem to perform that well. Contrast with a BTree, for instance, which is much easier to understand and yet has excellent performance (for a different use-case, of course).

It could very well turn out that vector search is a problem best solved by brute-force in parallel (like most of OLAP). I don’t think anyone knows for sure yet.

@a3kov A quick update on your question.

erlfdb: The erlfdb 0.2.2 release is in a real production app, working well, with a significant workload. I will consider a 1.0 release, if that is meaningful to anyone.

ecto_foundationdb: EctoFDB is still experimental, but it is now driving a small and low-stakes production app at https://livesecret.link (GitHub)

Cheers!

2 Likes

The inability to rename a field without rewriting the whole DB is a bit worrying… I’m not sure how much of an issue it may become, but it’s exactly the type of thing that may block wider adoption of ecto_fdb.

That’s a fair take. However, I believe it glosses over the difficulty in safely renaming a column in other dbs as well, as detailed in safe-ecto-migrations | Renaming a Column.

Certainly the ecto_fdb project has a long way to go – deleting a field is table stakes, for example, and it’s not there yet. Thanks for your interest!

1 Like

From the linked article:

There is a shortcut: Don’t rename the database column, and instead rename the schema’s field name and configure it to point to the database column.

Ecto_fdb could do the same trick, I believe? But yeah, I totally forgot about this. In 99% of cases we care about the model (schema) field names, and not about the DB column names.
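
For reference, here’s roughly what that trick looks like in a plain Ecto schema (module, table, and column names are made up for the example): the schema field gets the new name, while `:source` keeps pointing at the original column, so nothing on disk is rewritten.

```elixir
defmodule MyApp.Event do
  use Ecto.Schema

  schema "events" do
    # The app sees :content, but reads/writes still target the original
    # :body column in storage -- no table rewrite required.
    field :content, :string, source: :body
  end
end
```

Whether ecto_fdb honors `:source` the same way is exactly the open question here.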

1 Like

Keep in mind the context in which I brought this up: FDB is designed to operate as a large, distributed system with a potentially enormous amount of data. The “database” layer (here, ecto_fdb) is designed to be stateless.

In this context, you have to take special care to use indirection for schema changes without breaking the consistency guarantees of the database and without rewriting the entire table (because, again, the table could be 100TB).

This is not easy, but it’s not impossible or anything. I don’t think it’s substantially more difficult than, for example, building indexes online (which was already implemented).

This “trick” is essentially doing the indirection in the application layer. It’s inelegant because, ideally, the schema should be stored along with the data it represents. But it would work, I suppose.

1 Like

I actually believe the FDB design decisions work quite well for small and moderately sized databases as well. For me, it’s more about the kind of workload and the operations story. I wouldn’t want people to come away thinking that they need 100TB of data in order to use it – LiveSecret is essentially a 0-Byte database running on a couple GB of memory (please don’t try to DOS it though :pray: :laughing:). I’m pretending FDB is an embedded database by using ex_fdbmonitor instead of SQLite, even though I’m a big fan of SQLite. Of course, please check FDB’s system requirements for a production app. Running with low system memory does go against the grain of a recommended FDB deployment (8GB per fdbserver process recommended).

To date, the largest value add for me is never having to write DDL and never having to worry about migrations while I’m doing additive changes (new schemas, new fields, new indexes), which is 99% of development anyway IME.

We wouldn’t know until we tried it, to be honest. Some Ecto features come for free, and some require the adapter to meet specific contractual details. But if it doesn’t work, I imagine it would be possible to add support. Let me know if you find out!

Happy to report there is progress on the main branch, not yet in a release. Specifically, there is a new CLI module, a guide, and a test for deleting or renaming a field safely, in the spirit of Safe Ecto Migrations. Renaming a field in this way actually requires two full table rewrites, which you may find unfortunate, but at least now it’s possible to do it :smile:.

Other progress on the main branch includes refactoring the IndexInventory into a general purpose metadata store. This paves the way for richer metadata such as a schema field map (like an atom table) that could allow for table operations without rewrites, similar to what’s been described in this thread. That may come in future work – I’m still considering the implications.

3 Likes

100% agree. Even for a smaller database you get high availability/durability out of the box and very strong correctness and consistency guarantees (much better than Postgres, for example). Plus if you ever do need to scale, you know it won’t be a problem.

What I was getting at is that any solution for schema operations needs to be able to scale to 100TB+ in order to get full value out of FDB. Rewriting a table is fine for a (very) small DB, but not so much as you grow. Especially since FDB encourages a multitenant approach, and that complicates things further.

I have never actually operated FDB, but I struggle to think of anything in the architecture which would necessitate 8GB per process for a small DB. I would imagine this requirement is for operating at scale.

If the DB is small then everything is small: the shard map/txnstatestore, byte sample, serverdbinfo, mutations in memory on the tlogs and in the storage ptree, etc. will all be small.

Another thing to keep in mind is that for a large DB, rewriting a table, even in the background, is actually physically expensive. FDB is a huge distributed btree, and btrees have a lot of write amplification. Rewriting the rows in order will reduce that amplification considerably (because FDB naturally batches disk updates due to the in-memory MVCC design), but even then rewriting every key is naturally going to wear out your SSDs, which is not free. In the cloud they will charge you for IOPS instead, so it’s the same problem.

After some thought I think this design would work:

  • Store a schema somewhere in the keyspace, perhaps in its own tenant
  • Store the ecto migration “version” under a key in the schema (transactionally) - I assume an Ecto adapter has access to this value?
  • Store a list of fields in the schema, and give each one a unique (monotonic would be fine) integer id

To create a field: add a new field key with a new (monotonic) integer id. It is easy to allocate ids transactionally because migrations are rare so there is no contention. You can store the next_id in the schema.

To rename a field: simply point the new name to the id previously used by the old name, and then delete the old name.

To delete a field: well, delete the field! The id will remain “allocated” because they are monotonic.

So, to run a migration, we perform the above operations and then update the “schema version” to whatever Ecto provides, similar to how it works with Postgres. There should be no problem submitting this as a single transaction to FDB, as it won’t be large.
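
Something like this rough Elixir sketch, maybe (the key layout, module, and operation names are all made up, and it assumes erlfdb’s transactional/get/wait/set/clear API):

```elixir
defmodule MyApp.SchemaMigrator do
  # Hypothetical key layout (illustrative only):
  #   prefix <> "/version"        => Ecto migration version
  #   prefix <> "/next_id"        => next unallocated field id
  #   prefix <> "/field/" <> name => integer id for that field name

  def migrate(db, prefix, version, ops) do
    :erlfdb.transactional(db, fn tx ->
      Enum.each(ops, &apply_op(tx, prefix, &1))
      # Record the Ecto migration version in the same transaction.
      :erlfdb.set(tx, prefix <> "/version", :erlang.term_to_binary(version))
    end)
  end

  defp apply_op(tx, prefix, {:add_field, name}) do
    # Allocating monotonic ids transactionally is cheap: migrations are rare,
    # so there is effectively no contention on next_id.
    next = decode(:erlfdb.wait(:erlfdb.get(tx, prefix <> "/next_id")), 0)
    :erlfdb.set(tx, prefix <> "/field/" <> name, :erlang.term_to_binary(next))
    :erlfdb.set(tx, prefix <> "/next_id", :erlang.term_to_binary(next + 1))
  end

  defp apply_op(tx, prefix, {:rename_field, old, new}) do
    # The new name simply points at the id previously used by the old name.
    id_bin = :erlfdb.wait(:erlfdb.get(tx, prefix <> "/field/" <> old))
    :erlfdb.set(tx, prefix <> "/field/" <> new, id_bin)
    :erlfdb.clear(tx, prefix <> "/field/" <> old)
  end

  defp apply_op(tx, prefix, {:delete_field, name}) do
    # The id stays "allocated" forever; only the name mapping is dropped.
    :erlfdb.clear(tx, prefix <> "/field/" <> name)
  end

  defp decode(:not_found, default), do: default
  defp decode(bin, _default), do: :erlang.binary_to_term(bin)
end
```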

Naturally, you can then store rows with integer keys instead of string keys, where the integers are the above ids. You could even encode them as varints for additional space savings, since there are only as many ids as fields that have ever existed (i.e. not many).
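
A toy version of that encoding, skipping the varint part and just leaning on term_to_binary (a real layer would more likely use FDB’s tuple encoding; `fields` is the name => id map from the sketch above):

```elixir
defmodule MyApp.RowCodec do
  # fields is the name => id mapping, e.g. %{"title" => 0, "body" => 1}
  def encode_row(fields, row) do
    row
    |> Map.new(fn {name, value} -> {Map.fetch!(fields, name), value} end)
    |> :erlang.term_to_binary()
  end

  def decode_row(fields, bin) do
    ids_to_names = Map.new(fields, fn {name, id} -> {id, name} end)

    bin
    |> :erlang.binary_to_term()
    |> Map.new(fn {id, value} -> {Map.fetch!(ids_to_names, id), value} end)
  end
end
```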

Finally, for correctness you would have to read the “schema version” in every transaction to ensure it matches what the client expects. This would create a bottleneck for a large DB, but you could use the \xFF/metadataVersion key for cache invalidation (bump it on every migration). That key was specifically designed to do exactly this.

You can cache the %{field => id} mapping (and its inverse) in an ETS table (or perhaps persistent_term, which might be better) along with the Ecto version and the metadataVersion.
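
Roughly, something like this (hypothetical names again; it assumes the per-field keys from the migration sketch above, and that bumping \xff/metadataVersion on each migration happens elsewhere):

```elixir
defmodule MyApp.SchemaCache do
  @metadata_version_key "\xff/metadataVersion"

  # Returns the field-name => id map, reloading it only when
  # \xff/metadataVersion has changed since the last lookup.
  def fields(tx, prefix) do
    version = :erlfdb.wait(:erlfdb.get(tx, @metadata_version_key))

    case :persistent_term.get({__MODULE__, prefix}, nil) do
      {^version, mapping} ->
        mapping

      _stale_or_missing ->
        mapping = load(tx, prefix)
        # Puts are rare (only after a migration), so persistent_term's
        # global-GC cost on update is acceptable here.
        :persistent_term.put({__MODULE__, prefix}, {version, mapping})
        mapping
    end
  end

  defp load(tx, prefix) do
    # Assumed: get_range_startswith returns the {key, value} pairs
    # stored under prefix <> "/field/".
    tx
    |> :erlfdb.get_range_startswith(prefix <> "/field/")
    |> Map.new(fn {key, id_bin} ->
      name = String.replace_prefix(key, prefix <> "/field/", "")
      {name, :erlang.binary_to_term(id_bin)}
    end)
  end
end
```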

1 Like

The FDB knobs are tuned with the 8GB per process assumption. Changing these knobs is generally reserved for advanced deployments. ref

Without key clears or inserts I don’t believe the btrees will need to rebalance in any significant manner. Unless I am missing something?

Yes, this is very much in line with my thinking, taking inspiration from Erlang’s atom table, but making it a permanent part of the metadata in the DB. However, there is likely a challenge with coordinating changes to Ecto Schemas across a distributed app. As you deploy a new app release, some clients will be on one Ecto.Schema and others on the new Ecto.Schema. It is during these transition periods that I believe something more must be done to prevent wires being crossed. For example, a naive approach would cause a node on the old Ecto.Schema to start getting nils for a field that has been remapped in the symbol table. This is probably all surmountable, but it’s a significant amount of work.

EctoFDB already does what you describe in each transaction. I’m not familiar with \xFF/metadataVersion; I will have to look into it, thanks for the tip!

1 Like

Do they actually statically allocate all that cache memory at startup? If so, that’s a shame if it’s not configurable. There is nothing in FDB that would consume so much RAM for a small DB. Those knobs are probably pretty safe, though, I would think.

TigerBeetle, for example, allocates all of its memory at startup, which is really cool. But it has to be configurable.

When you rewrite a key you are still writing the page to disk. There would be no splits or joins, but you are still wearing out your SSD with the writes. They can only take so many.

Database B+trees are not balanced like a binary tree; they are very wide and just split the root instead. There is a lot of write amplification if you write a single key in a page, because you have to rewrite the whole page: a 100-byte k/v with a 4k page size is 40x write amplification, for example. For a COW tree you then have to rewrite each parent page too, so if it’s 3 levels deep that’s 120x. Redwood uses a really weird versioned page table because they were trying to do MVCC at the storage level (but I get the feeling they gave up), so it’s not so simple. Also, batching obviously saves a lot here.

Btrees do not like random inserts because they fragment the tree. If a node splits at >100% and joins at <50% then over time you would expect the average node to be 75% full, so there is 25% wasted space (“slack”). To reduce the slack you can rewrite the btree with sequential insertions. FDB data movement requests keys, in order, and writes them back into the “destination” btree, which conveniently compacts it. Users were abusing this property to keep their btrees in good shape by constantly excluding and re-adding nodes one at a time to compact the btrees. So they added that as a feature (“perpetual storage wiggle”).

Which brings me to an amusing point: since the storage wiggle rebuilds the btrees from scratch every so often anyway, Redwood does not merge nodes at all!

1 Like

The database schema (which comes from migrations) is stored statefully and transactionally in the database. FDB’s consistency guarantees will ensure the data is never corrupted (assuming my design has no mistakes!), so you don’t need to worry about that.

The Ecto schemas of course would, ideally, match up with the database state - but Ecto is specifically designed so that this is the user’s problem. Your guarantees here are the same guarantees that, say, Postgres provides: there is a schema where “these fields” have “these names”. The user is responsible for ensuring their Ecto schemas are compatible with that. If not, you throw an error! (Or Ecto does.)

There are situations where a user might have to break out those “safe Ecto migration” strategies to handle their own nodes’ inconsistencies. But you don’t have to worry about that - your job is just to ensure migrations execute quickly without rewriting the table!

BTW, my comment had grown long but there are a number of enhancements you can make to the schema:

You can store a type along with each “field” and raise on a mismatch like Postgres would.

You can store a “default” value with each field so that new columns can default to something other than nil. If the field is missing from a row, use the default. Note that you have to be careful never to update the default value once the field is created, or you would corrupt old rows.
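
A tiny sketch of that default-handling (the field_meta shape is invented for illustration; the default is captured once at field-creation time and never changed afterwards):

```elixir
defmodule MyApp.Defaults do
  # field_meta is illustrative, e.g.
  #   %{"title" => %{id: 0, default: nil}, "status" => %{id: 2, default: "draft"}}
  # Rows written before a field existed simply lack that key, so hydration
  # fills in the default recorded when the field was created.
  def hydrate(field_meta, row) do
    Map.new(field_meta, fn {name, meta} ->
      {name, Map.get(row, name, meta.default)}
    end)
  end
end
```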

You could even do foreign keys, which would be pretty cool.