What would you like to see from a native Elixir database?

The Elixir ecosystem is one of our biggest strengths, and the BEAM really lends itself to native implementations (e.g. Cachex over Redis, Bandit over Apache). But one thing which I feel is sorely missing is a good database. There are of course some options, perhaps most notably mnesia, but pretty much everyone here is using Ecto/Postgres (myself included).

What sort of features, functionality, and so on would you like to see from an Elixir database? I’ll give my personal list to start, but I’d like to hear some other perspectives as well.

Here’s what I’d like to see:

  • Embedded. The database should run entirely within the BEAM. In any other ecosystem “embedded+distributed” is an oxymoron, but not here!

  • High availability. Fault-tolerance is a core tenet of Erlang/Elixir, so an Elixir database should naturally follow suit. Losing some number of nodes should not cause meaningful downtime.

  • Replication. Similarly, data should be replicated such that losing some number of nodes does not cause data loss or corruption.

  • Horizontal scalability. Scaling out across cores and nodes is another core tenet which should be followed. Doing this properly would require autosharding as well.

  • Multitenancy. Native multitenancy is critical for driving down operational costs. A single cluster should be able to serve many isolated tenants.

  • Relational. The relational model is one of the great inventions of computer science and deserves respect. By relational I do not mean SQL, which has been a corruption of the relational model essentially since inception.

  • OLTP. An OLTP database would cover the vast majority of use cases. Also, it is fairly easy to build an OLAP database on top of an OLTP database, but going the other way is essentially impossible.

  • Transactional. Transactions should be atomic and isolated across nodes. I don’t even think I need to justify this tbh.

  • Interactive transactions. Deterministic databases are just too hard to use. Interactive transactions can scale if the database is designed properly.

  • Strong consistency. It is trivial to weaken strong consistency and nearly impossible to strengthen weak consistency. Therefore strong consistency must be offered by default. I would not accept anything less than strict serializability.

  • SSD storage. In-memory databases were all the rage in the late 2000s, but then SSD prices cratered way faster than DRAM. Essentially all modern databases run on SSDs. (RIP Optane)

  • Free. A huge number of “open source” databases have been rugpulled recently. Nobody is going to trust database startups for a generation IMO. Anything closed or proprietary is DOA.

As you can see, nothing ever written in Erlang or Elixir checks off more than a couple of these.

11 Likes

Ideally FoundationDB + SQL compatibility. Sorry for low-effort response but lately I’ve been fangirling over FoundationDB way too much.

6 Likes

I’ll turn this question on its head: what sort of features, functionality, and so on are you envisioning you can ONLY get with an Elixir database?

6 Likes

As you should!

Indeed, FDB is essentially the perfect architectural donor. It checks off almost everything on my list (though I do have a few more things I want which FDB doesn’t support). But of course FDB cannot be embedded in Elixir!

As for SQL, I think it’s a mistake. One very important thing I left off the list is incrementalization, i.e. live query support. SQL is made up of several decades of bloat and functionality which is probably useful for business processing but not useful for application building (my primary interest). Incrementalizing all of that functionality is essentially impossible (people keep trying, though).

Honestly I’d rather just bite the bullet and throw it away. It’s not like SQL was ever even good to begin with, it survives because it’s the standard (and yet every implementation is nonstandard, funny how that works!).

3 Likes

I don’t think there is any functionality which you could only get with an Elixir database. Even a database written in C could technically satisfy the “embedded” criterion if you were ambitious enough.

I will, though, gladly accept responses that reinterpret the question as “what features, functionality, and so on would you like to see from a new database which is by total and complete coincidence written in Elixir” :slight_smile:

What is the issue with SQL, it’s a language and it’s implementation dependent, so you could use it for anything that works on sets which is most data, relational or not.

There are so many SQLish languages out there, datadog, grafana etc that would have benefited from being a subset of SQL with implementation specific operators.

5 Likes

Really opening pandora’s box here, you could write a book on what’s wrong with SQL (and I bet there are several). Instead of derailing the thread into oblivion (I’m fully aware it’s my own doing) I’ll instead link this article which I enjoyed and need to read again. But there are so many more reasons.

Which is kind-of the point I was making. SQL’s best selling point is that it’s a standard (it’s bad at pretty much everything else) except it’s also not even good at that. You can’t even trivially port an app between MySQL and Postgres!

I am fully aware you are the author of the SQL library btw, which I think is really cool! But I will note that by adding composability to SQL you have, in fact, also managed to create something which is not SQL!

2 Likes

I don’t see how composability breaks anything, it’s still SQL, although with an advance engine that helps you write SQL that fits you.

To me, when people dis SQL, then it comes from a misunderstanding of what it is, regardless of the standard, most database are non standard, and there lies my point. SQL is whatever the query planner decide. Because the standard is implementation specific, which is, for me, a superpower.

Changing database is not a solid argument, because you rarely do it, and if you need to, then you can afford it too.

3 Likes

I am not trying to suggest that it “breaks” anything. I think your library is really cool, I’ve never seen anything quite like it before. It’s like a hybrid of an ORM DSL and actual SQL syntax, which makes up for one of the biggest shortcomings of SQL (hard to compose).

But the reason you have to make up for shortcomings of SQL with DSLs (or very clever macro strings that look like SQL but are not, actually) is that SQL itself is, uh, bad.

Well no, the execution of the query plan is what the query planner decides. SQL is a language. It does actually exist, like, as a language. There were better languages before it (QUEL), but it won because, you know, IBM.

1 Like

I’m sure it comes as no surprise that I agree with your list. :grinning_face_with_smiling_eyes:

I would be eager to use, and contribute to, a BEAM native DB with a low level API compatible with FDB’s core feature set (ordered keyset, get range, clear range, transactions, mvcc). A high level relational API is where many people would live, but the direct storage engine access is so powerful.

The FDB team never got the chance to implement all the Data Layers on top of their storage engine because of the acquisition. It would be exciting to have an ecosystem where those ideas could carry on.

Here’s a few things I would add to a wishlist

  • Stellar documentation
  • Public roadmap, including list of what features are considered “experimental” and whether or not there is an expected path/timeline toward production safety
  • No query planner, or a very limited one. Permissive planners have caused me a lot of headaches.
2 Likes

Exactly , it’s a language, but how the query planner decides to interpret and execute that, is sole on the query planner. And that is exactly why you would not need any other languages regardless of your data is
relational or not.

To take it a step further, there is nothing stopping us from querying mnesia, ets or dets with SQL. In fact this has already been done with Ecto.Query.

1 Like

It’s so funny reading a post from someone who actually knows FDB well lol. You know exactly where the pain points are. At the end of the day Apple clearly has no interest in maintaining it as an open project and the codebase is just way too onerous to fork. But full credit to them for releasing it at all, they didn’t have to.

Indeed, Elixir has some of the best docs out there so an Elixir database should continue that trend.

Unfortunately distributed databases with good architecture docs are quite rare (I have read like all of them at this point). CockroachDB has an okay design doc and some decent RFCs if you go looking. The best I’ve seen are probably Tigerbeetle’s, there is a lot to learn from their architecture WRT safety. FDB’s docs are a fragmented disaster (but still certainly not the worst).

One really gets the impression that vendors see architecture docs as a moat and therefore have little incentive to maintain them. One of the most egregious I saw was Neon (disaggregated Postgres), which boasted about being “open source” but had no documentation at all that I could find for how to self-host it. And now they’ve been acquired, so do you think those docs will ever be written? Yeah right.

1 Like

The API is very good but they made a couple of mistakes. A big one is that they defined a standard encoding for keys (tuple layer) but not for values. Binary keys/values are a bad abstraction, which they clearly recognized in the key case but not the value case. As a result Apple’s own record layer uses Protobufs.

There are two problems with this. First it hurts interop between potential layers, which is unfortunate. But second it makes it very hard to do predicate pushdown, which is very important for a distributed database. You can’t push a filter down to the storage layer if the storage layer has no idea what it stores!

And the tuple encodings are not suitable for values because they have to be parsed in full to be decoded. Which is necessary for sorting and fine for keys (which are small), but would be very bad for row values as you have to decode the entire thing just to read one column out of the middle. What you really want is a header with the value offsets up-front.

The solution, I think, is to define a record encoding to go along with the key encodings. I think it should (logically) be an ordered map integer => value where values are typed. Then you define a header up front with a format like type, id, offset, length?. Similar to protobuf etc.

Integer field ids are sufficient, schemas and so on can be managed by layers, either hard-coded or dynamic (you’re familiar with this!) schema management depending on the use-case.

I know where you’re coming from here, but the relational model necessitates some amount of query planning to satisfy data independence. We have discussed this before, though at the time I don’t think I realized this was spelled out so explicitly in the original Codd paper.

I think the planner can be kept fairly simple, though. I would argue even your ecto_fdb index selection logic constitutes a degenerate query planner (though it does not achieve independence).

Even more important is ensuring some sort of escape hatch. I don’t mind the Postgres planner 90% of the time, but then every so often it just refuses to use the indexes that I know are better and that just drives me insane.

After some thought I’m going to add a few more points which I had elided from the OP. These are a little more specific.

  • Sum types. Sometimes it can be useful to define fields (columns) which can be one of several types. This is something which needs to be used sparingly but which would be very useful in certain situations.

  • Polymorphic relations. This comes up a lot on here with Ecto (there is literally a new thread today lol). SQL makes this really hard. But if you implement foreign relations as types then you get polymorphism from sum types. This could be a nice DX improvement.

  • Incrementalization (live queries). Realtime functionality is a selling point for Elixir (particularly LiveView). There are ways to hack this together with PubSub or WAL parsing (with Postgres), but by designing this into the database from the start you can ensure a consistent realtime snapshot across queries. This would, essentially, be the “holy grail”.

I think the third is obviously useful but I would be interested to hear dissent on the first two as I’m less confident there.

Also, I have moved this thread out of the ecosystem category as I don’t see any reason for it to be hidden.

Strong consistency
Partition tolerance
High availability

These three are incompatible. Pick two

Free

Yeah, right, you want to have a database program with first-in-class features and you want to have it for free. Well, I mean, you can want that, sure, but no matter how much you want that, it sounds like a several years of work for like 5-ish top level engineers with very broad experience.

1 Like

I am well aware of the cap theorem lol. Was this a non-sequitur?

Free as in freedom.

I meant that one more as a success criterion than a “want”. Proprietary databases are essentially worthless to me, and VC funded “open source” databases have a habit of lying to their users/customers and turning into proprietary databases as of late. I think in order to be successful the developers of such a database will need to strongly signal their genuine belief in free software instead of using it as a carrot to trap customers.

I guess I should have been more clear.

Well you know what they say, mo engineers mo problems.

2 Likes

a writeup for Any way to embed riak_kv? · OpenRiak · Discussion #23 · GitHub would be enough for me

2 Likes

I didn’t realize Riak couldn’t be embedded. Seems like wasted potential; it would be pretty cool to have in-BEAM even just for unit tests/development.

I guess back in the day they were targeting the wider market so they never bothered?

1 Like

Most importantly, stability/reliability and performance/speed.

I experienced no end of problems with MySQL, and, touch wood, to date, zero problems with Postgres.

+1

I think irrespective of which language a new DB was built on there really would need to be a compelling case to get people to switch from something like Postgres - what usp could an Elixir DB have?

3 Likes

The ability to “rewind” and see the state in the past.

This is what Datomic allows.

From the homepage:

Critical insights come from knowing the full story of your data, not just the most recent state. Datomic stores a record of immutable facts, which gives your applications strong consistency combined with horizontal read scalability , plus built-in caching . Since facts are never update-in-place and all data is retained by default, you get built-in auditing and the ability to query history . All of this with fully ACID-compliant transactions .

And their docs:

  • Auditability: the total ordering of transactions and the fact that nothing is ever changed or removed means that you know and can demonstrate every step in the history of your data. The fact that transactions are themselves entities also offers an ideal place to store domain-specific provenance metadata.
2 Likes