Thanks for the feedback; I’m always glad to have it. You raise some good points. Apologies in advance for the long response.
Non-indexed filtering, and other query features
The benefit of supporting only indexed :where clauses is that the developer knows exactly what they’re getting when they write the query. We have no explain/analyze here, so I believe the role of the adapter must remain very clear to the end user. That role is: for each call to Repo.all, EctoFDB will always execute a single get* operation against FoundationDB (disregarding the IndexInventory) and do minimal processing locally. This way, the mere fact that a query works at all confirms that the developer is getting the expected performance characteristics with respect to their chosen indexes.
Indeed, a downside is that when a new index is created, the developer must rewrite their queries to take advantage of it. There is a risk that they miss a query and their app fails to leverage the index. I consider this an acceptable tradeoff. The target audience is people who want to get “closer” to their data, so to speak.
Moreover, suppose the adapter did support open-ended where clauses. EctoFDB would then need to decide which index to use for each query, which puts us on the path to implementing a real query planner, and that is more than I’m prepared to bite off at this point.
This line of thinking probably seems backwards compared to conventional wisdom around database querying. Most people want their query to do as much work as possible. For a database like Postgres, this is natural because the query computation is done within the database itself, where the sophisticated query planner can make many optimizations to carry out sorting, joining, filtering, column selection, etc., which reduces the total data transmitted over the network. In our case, the compute is detached from the storage, a consequence of the FDB Layer concept. This implies that you must pull a relatively large amount of data into your client and operate on it there (and it encourages your client to be as “close to” the FDB server as possible). The current design limitations of this adapter on Ecto.Query reflect these ideas back to the developer, so hopefully there is little doubt about the behavior.
One last point, tangential to this topic: EctoFDB’s use of tenants already buys one level of space partitioning on the data. While some relational approaches use a foreign key for multitenancy, EctoFDB arranges the keys such that all data for a tenant is partitioned in space from the others, meaning there is, in a way, already an index built in.
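To make the two ideas above a bit more concrete (indexed-only reads and tenant partitioning), here is a rough sketch in plain Elixir. It is not EctoFDB’s code or its real key layout: the IndexedOnlySketch module, the tuple-shaped keys, and the sorted list standing in for FoundationDB are all assumptions made purely for illustration.

```elixir
defmodule IndexedOnlySketch do
  # Hypothetical stand-in for an ordered key-value store: a list of
  # {key, row} pairs kept sorted by key. Not EctoFDB's real layout.
  @indexed_fields [:name]

  # Writes one index entry. The tenant is the first key element, so each
  # tenant's entries sit next to each other in key order.
  def put_index(store, tenant, field, value, id, row) do
    Enum.sort([{{tenant, field, value, id}, row} | store])
  end

  # Answers an equality filter on `field` by reading one contiguous run
  # of keys, or refuses outright when no index exists for that field.
  def all(store, tenant, field, value) do
    if field not in @indexed_fields do
      raise ArgumentError, "no index on #{inspect(field)}"
    end

    in_range? = fn {key, _row} -> match?({^tenant, ^field, ^value, _id}, key) end

    store
    |> Enum.drop_while(&(not in_range?.(&1)))
    |> Enum.take_while(in_range?)
    |> Enum.map(fn {_key, row} -> row end)
  end
end

store =
  []
  |> IndexedOnlySketch.put_index("tenant_a", :name, "John", "u1", [id: "u1", name: "John"])
  |> IndexedOnlySketch.put_index("tenant_a", :name, "Mary", "u2", [id: "u2", name: "Mary"])
  |> IndexedOnlySketch.put_index("tenant_b", :name, "John", "u3", [id: "u3", name: "John"])

# Only tenant_a's John comes back; tenant_b's keys are never touched.
johns = IndexedOnlySketch.all(store, "tenant_a", :name, "John")
```

A filter on a field without an index (say :email) raises instead of silently scanning, which is the “if it works, it’s using your index” property in miniature.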
Value encoding using term_to_binary
This is a design decision that I struggle with, to be honest. There are pros and cons. BTW, you’re right about the terms themselves being Keyword lists.
Things I like about term_to_binary on a Keyword list:
- Fast and simple
- Fairly easy to inspect and debug FDB key-value pairs outside of Ecto
- Adding a new field is trivial. IME adding new fields is the most common schema change.
- All Erlang terms are supported naturally
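As a small illustration of the “simple” and “adding a new field is trivial” points, here is a minimal sketch using only the standard :erlang functions. The row contents and the :email field are made up, and this says nothing about how EctoFDB itself fills in missing fields when loading a struct.

```elixir
# A value written before the (hypothetical) :email field existed.
old_row = [id: "u1", name: "Alice"]
bin = :erlang.term_to_binary(old_row)

# Decoding needs no schema; the field names travel with the value.
decoded = :erlang.binary_to_term(bin)

# Old values simply lack the new field, so a reader can fall back to a
# default (nil here) without any data migration.
email = Keyword.get(decoded, :email)
```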
Things I don’t like:
- Wasteful in space: all field names are stored in each value. We may someday make use of the :compressed option, but the gains will be limited due to the unique field names.
- Renaming fields does require a data migration, as you pointed out. In fact, EctoFDB doesn’t yet support renaming fields at all, a major gap at the moment.
- Some Erlang terms should not be stored permanently, and EctoFDB doesn’t provide any assistance. For example, storing pids would work, but can be risky.
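The space cost in the first point is easy to see for yourself. The sketch below uses a made-up row and compares the encoded size with and without the field names; the exact byte counts depend on the data, so none are quoted here. It also shows that the option is spelled :compressed in term_to_binary/2 and that a compressed value decodes with the ordinary binary_to_term/1.

```elixir
row = [id: "u1", name: "Alice", email: "alice@example.com"]

# Size of the value as a Keyword list, field names included.
with_names = byte_size(:erlang.term_to_binary(row))

# Same data with the field names stripped, for comparison only.
values_only = byte_size(:erlang.term_to_binary(Keyword.values(row)))

IO.puts("field names cost #{with_names - values_only} bytes in this value")

# Compression is opt-in at encode time; decoding is unchanged.
packed = :erlang.term_to_binary(row, [:compressed])
restored = :erlang.binary_to_term(packed)
```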
You’ve noted some good benefits to using Protobuf. Here are some drawbacks as I see them.
- A headache to manage. This is more of a personal opinion on Protobuf.
- Unclear to what extent the Ecto.Schema could be tied to a Protobuf definition. If you’re aware of any work in this space, I’m interested in hearing about it.
That being said, it does seem reasonable for EctoFDB to support both term_to_binary and Protobuf (or something similar) someday. Perhaps there are even use cases where it’s a choice that can be made at the Ecto.Schema level. However, I’m not at a point where I would implement this now. There is significant complexity, and the term_to_binary approach has not failed me yet. If you have a use case for Protobuf, perhaps we can collaborate on some ideas in a GitHub Issue.