Okay, so for an “analytics database” it’s a little more complicated.
First of all, you want to store analytics data in a columnar format. Hobbes is a pure binary/binary KV store underneath, so you can technically store data in whatever format you want. There have been attempts to store columnar data directly in FoundationDB, though I’m not sure if there are any production examples.
However, there are a few things to note:
- The access patterns for OLAP are generally to write data once, sequentially (updates are rare), and then run large aggregations over the dataset, which, if you think about it, sounds an awful lot like the blob case
- You have to design a columnar format for the data, which is a lot of work
- You have to write an advanced (probably SQL) analytics query engine, which is on the order of a few million lines of tightly optimized low-level code (certainly not Elixir)
So, due to the first point, you might want to structure your analytics database like a blob store. That is, there is a metadata store tracking which blobs of columnar analytics data are “live” and which ones have been superseded or removed. Snowflake built a hundred-billion-dollar business out of doing this on top of FDB, storing the data itself in (I believe) S3 and specializing in the query engine stuff.
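To make the metadata side concrete, here’s a minimal sketch in Elixir. The `Hobbes.transact/2`, `put/3`, `delete/2`, and `range/2` calls are hypothetical stand-ins; I’m just assuming some transactional KV API, and the real interface may look different:

```elixir
defmodule AnalyticsMeta do
  @moduledoc """
  Sketch of the metadata side of a blob-structured analytics store.
  Every Hobbes call below is hypothetical; adjust to the real API.
  """

  # Atomically mark a new blob as live and retire the one it supersedes,
  # so readers never observe a half-finished swap.
  def swap_live_blob(db, table, old_blob_id, new_blob_id) do
    Hobbes.transact(db, fn tx ->
      Hobbes.delete(tx, {:live, table, old_blob_id})
      Hobbes.put(tx, {:live, table, new_blob_id}, %{created_at: DateTime.utc_now()})
    end)
  end

  # A prefix range read lists every live blob for a table.
  def live_blobs(db, table) do
    Hobbes.transact(db, fn tx ->
      tx
      |> Hobbes.range({:live, table})
      |> Enum.map(fn {{:live, ^table, blob_id}, _meta} -> blob_id end)
    end)
  end
end
```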
Due to market forces, a number of standards for columnar storage developed. The most notable of these is probably Parquet. The idea here is that a customer can keep their data in Parquet files and then move their “data warehouse” to another provider, which I’m sure companies like Snowflake just loved.
More recently, standards for the query engines themselves have developed. Embedded query engines like Apache DataFusion started to pop up. An emerging winner here is DuckDB, which has distinguished itself with cute branding and a user-friendly standalone CLI. As it turns out, those things are actually very important. You have probably never even heard of DataFusion, have you?
And now, standards for how to maintain that metadata mapping have also arrived. There are a lot of marketing buzzwords here (try “data lake” and “lakehouse”), but it’s just more standardization at play underneath. DuckDB now actually has its own standard “lakehouse thing” with DuckLake, which indeed uses a metadata store (they call it a “catalog store”) like Postgres to keep track of a bunch of Parquet files.
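For a flavor of what that looks like, here’s a hedged sketch of attaching a Postgres-backed DuckLake catalog from Elixir through the Duckdbex bindings, following the syntax in the DuckLake docs. The connection string, bucket, and paths are all made up:

```elixir
# Attach a DuckLake catalog whose metadata lives in Postgres and whose
# Parquet files live under DATA_PATH. All identifiers are placeholders.
{:ok, db} = Duckdbex.open()
{:ok, conn} = Duckdbex.connection(db)

{:ok, _} = Duckdbex.query(conn, "INSTALL ducklake;")
{:ok, _} = Duckdbex.query(conn, "LOAD ducklake;")

{:ok, _} =
  Duckdbex.query(conn, """
  ATTACH 'ducklake:postgres:dbname=lake_catalog host=localhost' AS lake
    (DATA_PATH 's3://my-bucket/lake/');
  """)

{:ok, _} = Duckdbex.query(conn, "USE lake;")
```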
So, if you were so inclined, you could build an analytics database in Elixir, using Hobbes, like this (a sketch of the whole thing follows the list):
- Write Parquet files to some sort of blob store (perhaps also built with Hobbes)
- Keep track of which files are currently “live” using Hobbes as an index
- Use DuckDB to perform fast analytical queries on the Parquet files
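Here’s what the whole pipeline could look like, reusing the hypothetical Hobbes-backed index from earlier plus the Duckdbex bindings for DuckDB. `read_parquet` is a real DuckDB table function; everything Hobbes-side is assumed:

```elixir
defmodule TinyWarehouse do
  # Run an analytical query over every live Parquet file for a table.
  # AnalyticsMeta/Hobbes are the hypothetical pieces sketched above;
  # Duckdbex is the Elixir binding for DuckDB.
  def query(db, table, select_sql) do
    files = AnalyticsMeta.live_blobs(db, table)   # e.g. ["blobs/t/0001.parquet", ...]

    {:ok, duck} = Duckdbex.open()                 # in-memory DuckDB instance
    {:ok, conn} = Duckdbex.connection(duck)

    # read_parquet accepts a list of files; remote (s3://) paths would
    # additionally need DuckDB's httpfs extension loaded.
    file_list = Enum.map_join(files, ", ", &"'#{&1}'")

    {:ok, result} =
      Duckdbex.query(conn, "#{select_sql} FROM read_parquet([#{file_list}])")

    Duckdbex.fetch_all(result)
  end
end

# Usage (hypothetical):
#   TinyWarehouse.query(db, "events", "SELECT count(*)")
```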
And, again, you would get strong correctness, durability, and availability guarantees while solving zero hard distributed systems problems.
That’s the idea.