Once is an Ecto type for locally unique (unique within your domain or application) 64-bit IDs generated by multiple Elixir nodes. Locally unique IDs make it easier to keep things separated, simplify caching and simplify inserting related things (because you don’t have to wait for the database to return the ID).
Because Once IDs fit into an SQL bigint, they use little space and keep indexes small and fast. Because of their structure they have counter-like data locality, which helps your indexes perform well, unlike UUIDv4s.
Once IDs are based on counter, time-sortable or encrypted nonces. These underlying values can then be encoded in several formats, can be prefixed and can be masked. And you can combine all of these options, providing great flexibility.
The library has only Ecto and its sibling NoNoncense as dependencies. NoNoncense generates the actual values and performs incredibly well, hitting rates of tens of millions of nonces per second, and it also helps you to safeguard the uniqueness guarantees.
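To make the "time-sortable 64-bit ID" idea concrete, here is a minimal Python sketch of one way such a value could be assembled. The 42/9/13 bit split, the `SortableIdGenerator` class and the `to_signed` helper are illustrative assumptions, not the actual layout or API of Once or NoNoncense.

```python
import time

# Assumed layout (illustrative only): timestamp | machine id | counter = 64 bits.
TS_BITS, MACHINE_BITS, COUNTER_BITS = 42, 9, 13


class SortableIdGenerator:
    """Hypothetical generator of time-sortable 64-bit IDs."""

    def __init__(self, machine_id, clock_ms=lambda: time.time_ns() // 1_000_000):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id does not fit in MACHINE_BITS")
        self._machine_id = machine_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._counter = 0

    def next(self):
        now = self._clock_ms()
        if now > self._last_ms:
            # New millisecond: restart the per-millisecond counter.
            self._last_ms, self._counter = now, 0
        else:
            # Same millisecond (or the clock stepped back): bump the counter.
            self._counter += 1
            if self._counter >= (1 << COUNTER_BITS):
                # Counter exhausted: borrow the next millisecond.
                self._last_ms, self._counter = self._last_ms + 1, 0
        ts = self._last_ms & ((1 << TS_BITS) - 1)
        return (
            (ts << (MACHINE_BITS + COUNTER_BITS))
            | (self._machine_id << COUNTER_BITS)
            | self._counter
        )


def to_signed(unsigned):
    """Map an unsigned 64-bit value onto the signed range of an SQL bigint."""
    return unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned
```

Because the timestamp occupies the high bits, IDs from one generator compare in creation order, and the machine ID keeps values from different nodes apart as long as each node has its own ID.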
This looks like an interesting project and I think it’s good to have a library producing 64-bit keys rather than UUID’s 128 bits.
Historically this has been true, but recently the UUID specification has been updated with new UUID versions (specifically v7 being relevant here) to address RDBMS use of UUID as record IDs. I would suggest reading: RFC 9562: Universally Unique IDentifiers (UUIDs), which gets into the details of their thinking on this problem and their approach to solving it in the framework of the UUID standard.
The PostgreSQL project has recently committed its implementation of UUIDv7 (PostgreSQL: Re: UUID v7) which should be in the next release of PostgreSQL (18). True, it’s not out yet, but there are extensions which have been providing this capability in the meantime. For other database vendors… well I don’t follow them so much right now so can’t comment on their status.
From this perspective, it seems to me that your larger selling point would be the smaller bit length, which can absolutely be a plus, rather than contrasting against the historic issues with UUID use in relational databases.
Thanks for the feedback and for taking an interest! Maybe I should say “compared to random UUIDs”. The bit size is the bigger (irony trophy!) thing indeed, along with generating them fast enough to be able to use them for everything in an application.
Also, after reading the RFC for a bit, I’m thinking that maybe I should adjust the wording to “locally unique” instead of globally. I went for globally in the context of Elixir with its “global” registry etc., but that does not seem to be entirely accurate in the context of the RFC.
I just want to point out for anyone who’s curious that this isn’t quite so simple. There are two reasons I’m aware of that indexes perform better with “locality preserving” keys:
One has to do with the postgres visibility map (which is what that article is about), which is of course entirely postgres-specific. There is no reason it has to be this way, postgres just has an ancient and frankly terrible backend, and they probably designed it to work that way because back in the 80s or 90s they would have never thought to take the performance characteristics of random primary keys into account because nobody was doing that.
The second issue is a little more insidious: btrees benefit from sequential insertions with locality because those insertions hit the same pages in the tree, meaning the pages are far more likely to still be in cache when the next insertion comes. Similarly, if you batch up the insertions and group commit them you only need to write each page once for many insertions (database btrees are very wide, remember) so that’s a huge performance gain.
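The page-locality argument can be illustrated with a deliberately simplified Python simulation. It assumes a static set of leaf pages holding 100 keys each and ignores page splits, caching policy and everything else a real btree does; `pages_touched` is a hypothetical helper written for this example.

```python
import bisect
import random


def pages_touched(existing, new_keys, page_size):
    """Count the distinct leaf pages hit when each new key is routed to the
    page that holds its insertion position among the existing sorted keys."""
    return len({bisect.bisect_left(existing, k) // page_size for k in new_keys})


rng = random.Random(42)
KEY_SPACE = 2**63
existing = sorted(rng.randrange(KEY_SPACE) for _ in range(100_000))
PAGE_SIZE = 100
BATCH = 1_000

# Counter-like keys: every new key is larger than anything already stored.
sequential_keys = [existing[-1] + 1 + i for i in range(BATCH)]
# UUIDv4-like keys: uniformly random across the whole key space.
random_keys = [rng.randrange(KEY_SPACE) for _ in range(BATCH)]

seq_pages = pages_touched(existing, sequential_keys, PAGE_SIZE)
rand_pages = pages_touched(existing, random_keys, PAGE_SIZE)
print(f"sequential batch touches {seq_pages} page(s)")
print(f"random batch touches {rand_pages} page(s)")
```

In this model the sequential batch lands entirely on the rightmost page, while the random batch is spread over a large share of the roughly 1,000 pages, which is the difference that makes group commits and page caching so much more effective for counter-like keys.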
However, it is very important to remember that this only applies to btrees. If you had an LSM-tree storage engine, for example, then the above is not relevant at all. (Philosophically speaking, the fact that LSMs don’t exhibit this problem is essentially why they exist at all, but I digress).
For example, CockroachDB (which, remember, is a distributed DB) specifically recommends using UUIDv4 primary keys by default, because they want to spread out inserts across their range-partitioned servers, and because their storage engine is RocksDB (ish), an LSM.
Anyway, this seems like a cool library! And the performance improvements from having shorter keys do exist too, though they are not quite so drastic for “normal” use.
Thanks for the insight and taking an interest! I suppose we may be done with the past, but the past ain’t done with us. Out of curiosity, why do you think the PG backend is awful? I know nothing about it, only that from personal experience, I’m always glad to encounter PG and mildly apprehensive when I come across MySQL. Or is that even worse?
Here is a good article on the topic, written by someone who definitely knows what he’s talking about:
But what it boils down to, really, is that Postgres is just very old - much older than other open source RDBMSs. MySQL is from the mid 90s, and its original storage engine was complete garbage (it didn’t even have transactions!), but they made InnoDB the default storage engine in MySQL 5.5 (2010). Likewise with MongoDB, which was well known for incinerating user data, but was popular enough that they had the (VC) money to go and buy WiredTiger, which is very good.
I’m sure the decisions made around Postgres’s storage engine didn’t seem so bad at the time - it’s just that “the time” was literally the 80s and computers have changed a lot since then. Superscalar, multi-core, vector instructions, SSDs… you get the idea. Postgres doesn’t even use threads, it uses a process-per-connection model which famously prevents it from maintaining large numbers of connections, hence pgbouncer etc.
There are a couple ongoing attempts to fix this by rewriting the backend since Postgres is, mercifully, quite modular in its design. OrioleDB is one, now being funded by Supabase I believe. Neon postgres is another which is quite focused on being a cloud offering but is open source. Neon in particular is essentially an open source version of Aurora Postgres (designed to replicate across disks in case of failures).
To be clear, Postgres is a great database and I’m not suggesting you shouldn’t use it. At the end of the day the permissive license and long, stable history of open source are more than enough to make up for its other shortcomings. See for example CockroachDB which recently rugpulled its open source guarantees entirely.
Hi, thanks for taking an interest! That’s not immediately possible, but I also don’t think that there are large obstacles to implementing such a thing. The main problem is that the current code relies on the length of input binaries to convert them from one format to another and assumes the basic value to be exactly 64 bits. I think it would not be too hard to add prefix support, although I don’t have time to do it right now.
However, if you don’t need the format conversion stuff, then it would be really easy to create your own Ecto type on top of NoNoncense that implements this right now. In the end, Once relies on NoNoncense too. In the meantime I’ve added your suggestion as a feature request!
A word of caution about UUIDv7s, which recently bit me.
They are sortable by time of generation, but that time has millisecond resolution.
So, if you generate 3 UUIDv7s within the same millisecond (which is very easy to do), there is no guarantee they will be sorted in the same order you created them. I learned this the hard way.
The UUIDv7 spec actually allows you to use the 12 rand_a bits as a sub-millisecond timestamp, or alternatively I think you’re allowed to use up to all 74 random bits as a monotonic counter if you really want to.
So if you want to you can absolutely order them while still following the standard.
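As a rough illustration of the counter approach, the Python sketch below fills the 12 `rand_a` bits of a UUIDv7 with a per-millisecond counter, so values generated within the same millisecond still sort in creation order. The `MonotonicUUID7` class is a hypothetical helper written for this example, not part of any library mentioned in this thread, and it starts the counter at zero for simplicity.

```python
import secrets
import time
import uuid


class MonotonicUUID7:
    """Hypothetical UUIDv7 generator that uses rand_a as a monotonic counter."""

    def __init__(self, clock_ms=lambda: time.time_ns() // 1_000_000):
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._counter = 0

    def next(self):
        now = self._clock_ms()
        if now > self._last_ms:
            # New millisecond: restart the counter.
            self._last_ms, self._counter = now, 0
        else:
            # Same millisecond (or the clock stepped back): bump the counter.
            self._counter += 1
            if self._counter > 0xFFF:
                # 12-bit counter exhausted: borrow the next millisecond.
                self._last_ms, self._counter = self._last_ms + 1, 0
        value = (self._last_ms & ((1 << 48) - 1)) << 80  # unix_ts_ms (48 bits)
        value |= 0x7 << 76                               # version 7 (4 bits)
        value |= self._counter << 64                     # rand_a as counter (12 bits)
        value |= 0b10 << 62                              # variant (2 bits)
        value |= secrets.randbits(62)                    # rand_b (62 bits)
        return uuid.UUID(int=value)
```

Ordering only holds per generator instance; UUIDs produced by different processes in the same millisecond can still interleave arbitrarily.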
BTW the bit about LSMs in this comment is actually wrong as I eventually found out.
LSMs can benefit from k-sortable IDs in a number of ways. For one, when keys are inserted in strict monotonic order there is an optimization which allows the engine to avoid rewriting the SST. This is similar to the optimization you would use when building a btree from perfectly sorted data.
Also I think inserts in “mostly”-sorted order would still reduce write amp in leveled LSMs.
The point being: there is more nuance to this than you might think, and I clearly did not deeply understand LSMs when I wrote that reply lol. The btree stuff is correct, though, and it is true that LSMs are “less” affected by random keys.
In the systems I work with, I try to keep “record identity” and things like business-relevant dates/times, or sequencing generally, conceptually and structurally separate… even when there’s a high expectation that timing in record creation could also be used to establish business timings or that a business timing could be used as a record ID date component. Yes, I spend extra bytes per record and related processing when I do this, but I find that by keeping these distinct concepts separate, I have far fewer questions after the fact about things like business-relevant sequencing and the like.
While building business-meaningful sequencing into your record IDs is something you can get away with for a long time (maybe forever even) I’m a veteran of enough corporate merger related data migrations to know some of the tricks we play with surrogate key IDs just to get data sets merged without collisions… and it ain’t always pretty. Yes, UUID should avoid that sort of problem… but you never know what you’ll run into.
Rounding off a late-year cycle of releases that started with SpeckEx and NoNoncense 1.0, Once 1.0 is out! It adds the following:
Support for Stripe-like prefixed IDs using Once.Prefixed. The prefix is optionally persisted to the database.
Support for hex32-encoding of IDs using the new format :hex32.
Upgraded to NoNoncense 1.0 with its improved performance and cipher support for encrypted IDs. These changes are breaking and require careful reading of the migration guide.
The highlight is Masked IDs, which provide a middle ground between plaintext and encrypted IDs. IDs are stored as plaintext in the database but encrypted when retrieved by the Ecto type. These can be combined with other functionality like prefixes and time-sortability, and can be stored/rendered in all supported formats (signed, unsigned, hex, hex32, url64 and raw).
Benefits: Masked IDs have the database performance of plaintext IDs (written sequentially to the right of the index), but the application sees encrypted values so they are unpredictable and can’t be enumerated. ORDER BY id and keyset pagination work transparently (cast decrypts input). No database migration is needed for existing IDs.
Trade-offs: Database and application IDs look completely different (operational friction with SQL/BI tools), slight performance cost for encryption/decryption on every read/write, ordering info can leak into the application when using ORDER BY id queries.
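The masking idea can be sketched in Python as a keyed, reversible permutation of 64-bit values: the database keeps the plaintext ID, and the application only ever sees the permuted one. The toy Feistel construction and the `mask` / `unmask` names below are assumptions made for illustration. This is not the cipher Once or NoNoncense uses, and it has not been reviewed for security.

```python
import hashlib
import hmac

MASK32 = 0xFFFFFFFF


def _round(key, round_no, half):
    """Keyed round function: HMAC-SHA256 truncated to 32 bits."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")


def mask(plain, key, rounds=4):
    """Plaintext 64-bit ID (as stored) -> masked ID (as seen by the app)."""
    left, right = plain >> 32, plain & MASK32
    for r in range(rounds):
        left, right = right, left ^ _round(key, r, right)
    return (left << 32) | right


def unmask(masked, key, rounds=4):
    """Masked ID (as supplied by the app) -> plaintext ID (as stored)."""
    left, right = masked >> 32, masked & MASK32
    for r in reversed(range(rounds)):
        left, right = right ^ _round(key, r, left), left
    return (left << 32) | right


key = b"example-key-not-for-production"
stored_ids = range(1, 6)  # sequential plaintext IDs, as the index sees them
public_ids = [mask(i, key) for i in stored_ids]
assert [unmask(p, key) for p in public_ids] == list(stored_ids)
```

Since the mapping is a bijection on 64-bit values, masked IDs never collide, and a lookup by public ID only needs one `unmask` call before the query hits the sequential index.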