Once - Ecto type for globally unique 64-bit IDs generated by multiple Elixir nodes

jsm · January 8, 2025, 12:26am

Once is an Ecto type for locally unique 64-bit IDs generated by multiple Elixir nodes. Locally unique IDs make it easier to keep things separated, simplify caching, and simplify inserting related records (because you don’t have to wait for the database to return the ID). A Once can be generated in multiple ways:

counter (default): really fast to generate, predictable, works well with b-tree indexes
encrypted: unique and unpredictable, like a UUIDv4
sortable: time-sortable like a Snowflake ID

A Once can look however you want, and can be stored in multiple ways as well. By default, in Elixir it’s a url64-encoded 11-char string, and in the database it’s a signed bigint. By using the :ex_format and :db_format options, you can choose both the Elixir and storage format out of t:format/0. You can pick any combination and use to_format/2 to transform them as you wish!
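To illustrate how these representations relate, here is a minimal sketch in Python. It is not Once's code, and the function names are made up for the example. It only shows that the same 8 bytes can be viewed as an unpadded url64 string of 11 characters or as a signed 64-bit integer, and converted back without loss.

```python
import base64
import struct

def to_url64(raw: bytes) -> str:
    """Encode 8 raw bytes as an unpadded, URL-safe base64 string (11 chars)."""
    assert len(raw) == 8
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def from_url64(text: str) -> bytes:
    """Decode an 11-char url64 string back to 8 raw bytes."""
    assert len(text) == 11
    return base64.urlsafe_b64decode(text + "=")  # restore the stripped padding

def to_signed(raw: bytes) -> int:
    """Interpret 8 raw bytes as a big-endian signed 64-bit integer (SQL bigint range)."""
    return struct.unpack(">q", raw)[0]

def from_signed(value: int) -> bytes:
    """Pack a signed 64-bit integer back into 8 raw bytes."""
    return struct.pack(">q", value)

raw = (2**64 - 1).to_bytes(8, "big")  # largest unsigned 64-bit value
print(to_url64(raw), to_signed(raw))
```

The largest unsigned value maps to -1 as a signed bigint, which is why a full 64-bit ID still fits in a signed column.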

Because a Once fits into an SQL bigint, it uses little space and keeps indexes small and fast. Because of their structure, counter IDs have counter-like data locality, which helps your indexes perform well, unlike UUIDv4s. If you don’t care about that and want unpredictable IDs, you can use encrypted IDs that seem random and are still unique.

The actual values are generated by NoNoncense, which performs incredibly well, hitting rates of tens of millions of nonces per second, and it also helps you to safeguard the uniqueness guarantees.

The library has only Ecto and its sibling NoNoncense as dependencies.

Package: once | Hex
Source: GitHub - juulSme/Once: An Ecto type for locally unique 64-bits IDs generated by multiple Elixir nodes
Docs: Once v0.0.4 — Documentation

sbuttgereit · January 8, 2025, 12:46am

This looks like an interesting project, and I think it’s good to have a library producing 64-bit keys rather than the 128 bits of UUIDs.

Historically this has been true, but recently the UUID specification has been updated with new UUID versions (specifically v7 being relevant here) to address RDBMS use of UUID as record IDs. I would suggest reading: RFC 9562: Universally Unique IDentifiers (UUIDs), which gets into the details of their thinking on this problem and their approach to solving it in the framework of the UUID standard.

The PostgreSQL project has recently committed its implementation of UUIDv7 (PostgreSQL: Re: UUID v7) which should be in the next release of PostgreSQL (18). True, it’s not out yet, but there are extensions which have been providing this capability in the meantime. For other database vendors… well I don’t follow them so much right now so can’t comment on their status.

From this perspective, it seems to me that your larger selling point would be the smaller bit length, which can absolutely be a plus, rather than contrasting against the historic issues with UUID use in relational databases.

jsm · January 8, 2025, 8:41am

Thanks for the feedback and for taking an interest! Maybe I should say “compared to random UUIDs”. The bit size is indeed the bigger (irony trophy!) thing, along with generating them fast enough to be able to use them for everything in an application.

Also, after reading the RFC for a bit, I’m thinking that maybe I should adjust the wording to “locally unique” instead of globally. I went for globally in the context of Elixir with its “global” registry etc., but that does not seem to be entirely accurate in the context of the RFC.

garrison · January 8, 2025, 9:39pm

I just want to point out for anyone who’s curious that this isn’t quite so simple. There are two reasons I’m aware of that indexes perform better with “locality preserving” keys:

One has to do with the postgres visibility map (which is what that article is about), which is of course entirely postgres-specific. There is no reason it has to be this way, postgres just has an ancient and frankly terrible backend, and they probably designed it to work that way because back in the 80s or 90s they would have never thought to take the performance characteristics of random primary keys into account because nobody was doing that.

The second issue is a little more insidious: btrees benefit from sequential insertions with locality because those insertions hit the same pages in the tree, meaning the pages are far more likely to still be in cache when the next insertion comes. Similarly, if you batch up the insertions and group commit them you only need to write each page once for many insertions (database btrees are very wide, remember) so that’s a huge performance gain.
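A toy model in Python can make the page-locality argument concrete. This is only a sketch under loud assumptions: the fan-out, the index size and the batch size are invented, and real btrees split and rebalance pages in ways this ignores. It simply counts how many distinct leaf pages one batch of inserts would dirty.

```python
import random

KEYS_PER_PAGE = 256    # assumed leaf fan-out, purely illustrative
LEAF_PAGES = 100_000   # assumed number of leaf pages already in the index
BATCH = 1_000          # inserts grouped into one commit

def pages_touched_sequential(start: int, batch: int) -> int:
    """Counter-like keys append at the right edge, so neighbours share a leaf page."""
    return len({key // KEYS_PER_PAGE for key in range(start, start + batch)})

def pages_touched_random(batch: int, rng: random.Random) -> int:
    """Uniformly random keys each land on a uniformly random existing leaf page."""
    return len({rng.randrange(LEAF_PAGES) for _ in range(batch)})

rng = random.Random(42)  # fixed seed so repeated runs agree
sequential = pages_touched_sequential(KEYS_PER_PAGE * LEAF_PAGES, BATCH)
scattered = pages_touched_random(BATCH, rng)
print(f"sequential keys: {sequential} pages, random keys: {scattered} pages")
```

With these assumed numbers, the sequential batch dirties only a handful of pages, while the random batch dirties nearly one page per insert. That difference is the cache and group-commit advantage described above.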

However, it is very important to remember that this only applies to btrees. If you had an LSMtree storage engine, for example, then the above is not relevant at all. (Philosophically speaking, the fact that LSMs don’t exhibit this problem is essentially why they exist at all, but I digress).

For example, CockroachDB (which, remember, is a distributed DB) specifically recommends using UUIDv4 primary keys by default, because they want to spread out inserts across their range-partitioned servers, and because their storage engine is RocksDB (ish), an LSM.

Anyway, this seems like a cool library! And the performance improvements from having shorter keys do exist too, though they are not quite so drastic for “normal” use.

jsm · January 8, 2025, 10:14pm

Thanks for the insight and for taking an interest! I suppose we may be done with the past, but the past ain’t done with us. Out of curiosity, why do you think the PG backend is awful? I know nothing about it, only that from personal experience, I’m always glad to encounter Pg and mildly apprehensive when I come across MySQL. Or is that even worse?

garrison · January 9, 2025, 6:27pm

Here is a good article on the topic, written by someone who definitely knows what he’s talking about:

But what it boils down to, really, is that Postgres is just very old - much older than other open source RDBMSs. MySQL is from the mid 90s, and its original storage engine was complete garbage (it didn’t even have transactions!), but they replaced it with InnoDB in 2009. Likewise with MongoDB, which was well known for incinerating user data, but was popular enough that they had the (VC) money to go and buy WiredTiger, which is very good.

I’m sure the decisions made around Postgres’s storage engine didn’t seem so bad at the time - it’s just that “the time” was literally the 80s and computers have changed a lot since then. Superscalar, multi-core, vector instructions, SSDs… you get the idea. Postgres doesn’t even use threads, it uses a process-per-connection model which famously prevents it from maintaining large numbers of connections, hence pgbouncer etc.

https://lwn.net/Articles/934940/

There are a couple ongoing attempts to fix this by rewriting the backend since Postgres is, mercifully, quite modular in its design. OrioleDB is one, now being funded by Supabase I believe. Neon postgres is another which is quite focused on being a cloud offering but is open source. Neon in particular is essentially an open source version of Aurora Postgres (designed to replicate across disks in case of failures).

To be clear, Postgres is a great database and I’m not suggesting you shouldn’t use it. At the end of the day the permissive license and long, stable history of open source are more than enough to make up for its other shortcomings. See for example CockroachDB which recently rugpulled its open source guarantees entirely.

jsm · January 13, 2025, 6:07pm

v0.0.4 is out!

add support for sortable nonces (Snowflake IDs)
add :type option
soft deprecate :encrypt? option

AlanMcCann · January 13, 2025, 6:45pm

Could this be incorporated with a prefix to end up with stripe like ids (e.g. acct_123123123123)?

jsm · January 13, 2025, 10:07pm

Hi, thanks for taking an interest! That’s not immediately possible, but I also don’t think there are large obstacles to implementing such a thing. The main problem is that the current code relies on the length of input binaries to convert them from one format to another and assumes the basic value to be exactly 64 bits. I think it would not be too hard to add prefix support, although I don’t have time to do it right now.

However, if you don’t need the format conversion stuff, then it would be really easy to create your own Ecto type on top of NoNoncense that implements this right now. In the end, Once relies on NoNoncense too. In the meantime, I’ve added your suggestion as a feature request!
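As a rough idea of what such a prefixed ID could look like, here is a Python sketch. It uses neither Once's nor NoNoncense's API: `next_nonce`, `to_prefixed` and `from_prefixed` are hypothetical names, and a simple in-process counter stands in for a real unique-nonce generator. Because the encoded part is always exactly 11 characters, the prefix can be split off without guessing, even though url64 output may itself contain underscores.

```python
import base64
import itertools

_counter = itertools.count(1)  # stand-in for a real unique 64-bit nonce source

def next_nonce() -> bytes:
    """Return the next counter value as 8 big-endian bytes (hypothetical helper)."""
    return next(_counter).to_bytes(8, "big")

def to_prefixed(prefix: str, raw: bytes) -> str:
    """Build a Stripe-like ID such as 'acct_AAAAAAAAAAE' from a prefix and 8 raw bytes."""
    if "_" in prefix or len(raw) != 8:
        raise ValueError("prefix must not contain '_' and raw must be 8 bytes")
    encoded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{prefix}_{encoded}"

def from_prefixed(prefix: str, text: str) -> bytes:
    """Check the prefix and decode the 11-char url64 tail back to 8 raw bytes."""
    head, sep, encoded = text.partition("_")  # split at the first underscore only
    if head != prefix or sep != "_" or len(encoded) != 11:
        raise ValueError(f"not a valid {prefix!r} ID: {text!r}")
    return base64.urlsafe_b64decode(encoded + "=")

account_id = to_prefixed("acct", next_nonce())
print(account_id)
```

The database could still store only the 8-byte (or bigint) value; the prefix would be added and checked at the application boundary.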

mpope · January 14, 2025, 12:16am

Very cool, reminds me of GitHub - boundary/flake: A decentralized, k-ordered id generation service in Erlang!