TL;DR: CUID is designed for a specific kind of solution to problems you may never have run into yet.
I know I'm dropping in a bit after the fact, but I didn't see anyone directly address your questions about the weird vibes given off by the CUID spec. It turns out that the short attention spans and pervasive mediocrity of our field make it reasonably likely that you wouldn't have been in a situation where you would need this.
The stilted ranting of the CUID spec doesn't help, for sure. They do a terrible job explaining what they mean. Then again, once you've learned how to solve certain problems, it sometimes becomes very difficult to even discuss them with people who haven't already learned what you know.
That's essentially what's going on with the CUIDv2 spec and its arguments. It turns out that all of their reasoning is sound, but it won't make any sense unless you are writing your application a certain way. And they can no longer conceive of building things any other way.
There are particular assumptions that you find in Phoenix and most MVC-esque frameworks. You typically find them in most HTTP APIs as well. Why that is the case involves... well... a lot of history.
Allow me, if you will, to take you on a strange journey. (Apologies. This will be long.)
A History Lesson
Note that I didn't say REST APIs. That's important. What you're probably used to writing is a REST-like (or RESTish?) API. Basically, JSON-over-HTTP with a few conventions about methods and maybe status codes. There's nothing wrong with this. It works just fine for 98% of projects.
It turns out that these "conventions" are not at all what REST was originally concerned with. It often surprises people to discover that REST as originally defined is not really the blueprint for making APIs that we struggle with today. It was really more of the distillation of the wisdom that came out of designing HTTP itself.
HTTP was born trying to solve a problem that has sometimes been referred to as "anarchic scalability". Basically, how to build apps and APIs in a world where everything is moving fast and nobody can claim to have absolute authority to dictate how it works.
Out of this, Roy Fielding created the idea of "Hypertext As The Engine Of Application State"—usually referred to by the world's worst acronym: HATEOAS. REST was a set of principles that grew and blossomed as HTTP was first designed and implemented.
This version of REST has been largely lost in modern API discourse. It's actually pretty fascinating and well worth learning. You can read the original paper here.
That's only half of the story, though. The gory details of how it got lost to history are a whole tale of their own. If you really care, you can read more about it here.
(You see this a lot, actually. It's pretty common. For example, Object Oriented Programming as conceived by Alan Kay is very different from what people have turned it into. Ironically, his version of OOP is closer to Elixir than C++. Though maybe that's not so surprising, because Erlang had strong influence from Smalltalk. But, I digress...)
If you dig into the original REST paper, you're going to find out that it hardly even mentions most of the things you think of as REST.
Some "Modern" Assumptions
Coming back to the problem at hand, consider the following assumptions you probably have about how to design an HTTP API (there's a quick sketch of what they look like in practice after the list):
- IDs are generated on (and arbitrated by) the server (probably the database itself).
- You're probably going to have to strap some sort of idempotency keys onto your API to make it resilient in the face of transport failures.
- Retry and recovery logic may end up being fiendishly complex.
- Caching is only for browsers. APIs use `Cache-Control: no-cache` liberally.
- Creation requests should only use the `POST` method.
- Only update requests will use the `PUT` method.
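To make those assumptions concrete, here's a minimal sketch of the conventional flow they produce, in Python with the `requests` library. The endpoint and payload are made up for illustration; the point is that the client only learns the ID by parsing the response.

```python
import requests

# The "conventional" create: the server (or its database) owns ID generation.
# The endpoint and payload here are hypothetical.
resp = requests.post(
    "https://api.example.com/api/widgets",
    json={"name": "Left-handed widget"},
    timeout=5,
)
resp.raise_for_status()

# The client only learns the ID by parsing the response body.
# If this response is lost on the wire, the client has no idea what
# (if anything) was created -- the root of the retry mess described below.
widget_id = resp.json()["id"]
print("Server assigned:", widget_id)
```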
As it turns out, these assumptions (regardless of how common they are) create problems that don't need to exist. They are largely ignorant of the benefits of a more classically RESTful approach.
Commonly we end up using HTTP verbs in ways that not only make things ambiguous, but that often clearly conflict with how they're defined to be used in the HTTP spec.
In doing so, we make applications less reliable, less predictable, more complex, and harder to scale. It doesn't have to be this way.
CUID v1 and v2 are designed to provide a minor but very important missing piece of an approach that entirely avoids creating any of these problems.
Feel My Pain
Anyone with much experience almost certainly has trauma here. There's this story that repeats itself at every new startup you work for. Every time you try to avoid it. But, like the Classical Greek story of Cassandra, you tragically can only predict the crisis, but never avert it.
It starts when you try to ask for a little forethought before designing an API. There's a kind of "get it done" personality that insists that "Do it the 'normal' way. Everything is fine. There are no decisions to make. Something something Best Practices. Everybody knows this is settled." Their tragic lack of imagination and woeful incapacity for introspection thoroughly suppress any chance for innovation and will doom us all.
They'll immediately roll their eyes when you suggest "This isn't really that RESTful; maybe we should take a second to actually define some terms. How about we actually design something for a minute?" They shriek "Big Up-Front Design BAD! That's not agile!"
They insist that REST == CRUD and dismiss any attempt at deeper understanding as "bike-shedding". Their assumptions are rapidly baked into the codebase. For a time, it seems like it will be fine. They crank out code at a prodigious rate. The KPIs look good.
Then... things break. Who knew the network isn't perfect? What do you mean it succeeded but we didn't get the response? Somehow they find the idea to naively implement retries lurking somewhere within their smooth brains.
Oops, now we're sometimes doing creation or updates twice. Then they'll generate massive amounts of "defensive programming" code in clients: a web of custom code for each request just so clients can figure out the ID of whatever they may or may not have created.
Invariably their confused and befuddled retry logic will start hammering other services into the ground retrying in tight loops. Here they begin excitedly rambling about circuit breakers and exponential backoff and rate limiting because they heard about it on YouTube or at the coding bootcamp they went to.
They'll e-mail the team blog posts about a million half-measures. Many of them don't even fit the problem.
Meanwhile, the SREs are becoming just as restless as your app is RESTless. Just ignore the fact that your dashboards are full of errors that aren't errors, and that the useful signals get lost in "normal" bursts of errors-that-aren't-really-errors. Who needs actually meaningful error metrics?
Is this monstrosity generating tons of inconsistent or even incorrect data? We can manually clean up any junk that gets left behind in the melee. What's a little toil between friends? The SREs now begin to exhibit the "500 yard stare" in Zoom meetings. At least remote work means you don't have to listen uncomfortably to their anguished sobs (so long as they remember to stay muted).
Every day's standup starts with "Oh, yeah, we can fix that. All you've gotta do is..." At this point, you can't really make headway. They've created a beast so complicated that nobody can understand it—least of all them. Not that that stops them from thinking that they understand it.
Consequently, nobody can make the case for how or why it's busted. The system develops "behavior". They blithely coast along with the momentum of someone who has mastered being confidently wrong. They've made it into more than just a profession; it's their way of life.
Invariably they start throwing around the word "idempotent". They double down and insist on adding yet-another-key to the request. Maybe it's a field in the JSON body. Maybe it's a header. Could be a query parameter! Regardless, it becomes just more metadata that requires infrastructure to develop, maintain, and scale.
You see, it's just to prevent inconsistent updates from corrupting state. Clearly that's a different, new, and entirely separate problem. It can't be because we've taken an oversimplified approach to building an API such that it can't even make consistent updates. Clearly, one more middleware is all it needs!
Now you've got this giant mess where you need to track these new keys consistently, despite the fact that you already have a single ID that you can't track consistently. The client still has to read tea leaves to divine this elusive object ID for the resources it may or may not have created successfully.
The retries and recovery queries are absolutely hammering your services, now. A million requests, made by an idiot, full of sound and fury, signifying nothing.
Then they have an insight: "The clients can get back the object ID that was in the response they missed if we save the original response and replay it. All you've gotta do is save the response under the idempotency key somewhere."
It turns out that replaying cached responses provides no guarantee that the response even corresponds to the same request. And how long do we save those responses? Nobody knows.
Maybe it's an hour. That works fine until the fiber between regions goes down for an afternoon. Now all of the hung-up requests start retrying. They end up changing things out of order multiple times because the cached responses have long since expired.
Or better yet, their code starts generating duplicate idempotency keys for unrelated requests (because they invented some half-baked scheme for generating the keys). Maybe we're trusting the client too much here.
Or maybe they're accidentally sending subsequent requests with the same key, but different parameters, because they vibe coded a bit too deeply and greedily. Now what comes back depends on timing, ordering, and the phase of the moon.
Or maybe caching the whole response takes too much storage. We'll just reconstruct the response. Oops! Some data was only available at creation time and it's missing now. Surely the client won't care if there are subtle differences between original responses and the ones that we're now synthesizing.
Eventually Mr. Get-It-Done gets all defensive when concurrent updates start to drastically corrupt state. The same guy that complained about "know-it-alls whining about the meaning of REST" starts using terms like "phantom reads", "the lost update problem", and "serializability". Apparently it's "pedantic" when you tried to talk about it up-front, but it's "state-of-the-art" when they do it.
Since they've unerringly missed the point for months, they then desperately start trying to layer on more half-measures and barely-considered "fixes". Let's bundle multiple requests in a single POST. Or try distributed transactions (because they read a blog post about that a year ago). Maybe they'll discover "sagas".
All the while insisting this unmaintainable mess was inevitable. They're like some kind of software-engineering Thanos... they just snap their fingers and half of your velocity turns to dust.
Eventually they push to do an entire rewrite. Or maybe they want to move to NoSQL (since your relational database is spouting flames from all of the superfluous work they're piling onto it). They'll probably end up being promoted for building such a high-tech "solution". It's not clear if that's good because they're not working directly on the codebase anymore, or bad because they're now a "tech lead" or "architect".
Can we maybe learn anything from HTTP itself?
Let's start with some highlights from RFC9110:
GET
- §9.2.1: The `GET` method is "safe".
- §9.2.2: They are guaranteed to be idempotent.
- §9.3.1: Responses are cacheable and update caches on the way through.
POST
- §9.2.1: The `POST` method is not safe (implied).
- §9.2.2: `POST` is not guaranteed idempotent.
  - Unless it is.
  - But you've got to know.
  - Good luck.
- §9.3.3: `POST` responses are not cacheable.
  - Except when they are, but only if some extra metadata is returned.
  - But cached responses still can't be used to answer later `POST` requests, because those are allowed to be "unsafe".
  - But, sure, for later `GET` requests... if you want to.
  - Oh, and `POST` doesn't necessarily create resources, though it might.
  - In fact, it can even create things without explicitly telling you, if it wants.
  - For that matter, nothing about a `POST` guarantees that it's going to create what you asked it to.
PUT
- §9.2.2: `PUT` requests are guaranteed to be idempotent.
- §9.3.4:
  - Responses are not cached and invalidate caches on the way through.
  - If they create resources, they MUST tell you.
  - If it returns successfully, you can be sure that the state has been updated to exactly what you requested.
DELETE
- §9.2.2: `DELETE` requests are guaranteed to be idempotent.
- §9.3.5: Responses are not cacheable and invalidate caches on the way through.
Notice a pattern there? POST is the wild west. It has no strong semantics. It's whatever it wants to be. Nobody in their right mind should use it. If you do, don't expect anything sane out of it.
We can at least use GET, PUT, and DELETE sanely. They're tightly defined. That's great. But how do I map four operations to only three methods? And what about retries and double creation and concurrent updates and all of the other problems we ran into?
Be The Change You Want To See In The Database
If only there was a way we could make sure that a retried request (or an update) is operating on the state we expect to be there.
Going back to RFC9110, let's learn about conditional requests (there's a rough sketch of how they fit together after the list):
- §13.1:
  - There are "preconditions" you can specify to ensure that your state doesn't get thrashed because other requests (retries, other clients, etc.) changed something out from under you.
- §13.1.2: `If-None-Match`
  - Want to ensure you're creating something?
  - Use `If-None-Match: *` and your `PUT` won't turn that create into an update.
  - Suddenly... you don't need `POST`!
- §13.1.4: `If-Unmodified-Since`
  - What if you're doing a read-update-write cycle?
  - If there are dueling requests with other clients, your updates might step on each other!
  - But, if you know the date returned when you did the read, you can say to only do the request if it hasn't changed!
  - Now retries are safe without corruption due to "the lost update problem".
- §13.1.1: `If-Match`
  - Time is kind of imprecise here. Maybe you're nervous about using modification dates to prevent accidents.
  - If only we could use a hash of the data or something?
  - That's what the `ETag` is for.
  - Now you can use `If-Match: <etag>` and you'll get consistent updates based on the actual content.
  - This even works if retried `PUT` requests update the timestamp twice without changing the data.
- §15.5.13: `412 Precondition Failed`
  - There's a dedicated `4xx`-series error code just for this.
  - No longer do I have to figure out if I should return a `400 Bad Request`, a `401 Unauthorized`, a `403 Forbidden`, a `404 Not Found`, a `410 Gone`, or a `418 I'm A Teapot` (if you support the HTCPCP extensions from RFC2324).
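Here's a rough sketch of how those preconditions fit together in practice, in Python with the `requests` library. The URL, payload, and client-supplied ID are hypothetical, and it assumes the server actually implements conditional requests and returns ETags; the headers themselves are straight out of RFC9110.

```python
import requests

BASE = "https://api.example.com/api/widgets"    # hypothetical resource
widget_id = "k7x2example0000000000000"          # a client-chosen ID (more on that below)
url = f"{BASE}/{widget_id}"

# Create-only: If-None-Match: * means "only act if nothing exists here yet",
# so a PUT can't silently turn a create into an overwrite (RFC9110 §13.1.2).
create = requests.put(
    url,
    json={"name": "Left-handed widget"},
    headers={"If-None-Match": "*"},
    timeout=5,
)
if create.status_code == 412:
    print("Something with this ID already exists; go look before clobbering it.")
else:
    create.raise_for_status()

# Read-modify-write: capture the ETag on the read (assumes the server sends one)...
current = requests.get(url, timeout=5)
etag = current.headers.get("ETag")

# ...and make the update conditional on it (RFC9110 §13.1.1). If anyone else
# changed the resource in between, we get 412 instead of a lost update.
update = requests.put(
    url,
    json={"name": "Ambidextrous widget"},
    headers={"If-Match": etag},
    timeout=5,
)
if update.status_code == 412:
    print("Someone changed it out from under us; re-read and reconcile.")
```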
Well, that's neat and all; but does it really help that much? It kind of eliminates the need for idempotency keys, and it eliminates lost updates and stale reads. But it doesn't help with the problem we started with: getting the ID that the server used if we lose the initial response.
Well... do we really need the server to do that for us? Maybe it doesn't have to? Can we have the client side just submit an ID of its own? Then it already has the ID before the server even gets started.
Trust The Awesomeness
At first, this sounds questionable. What if people intentionally create collisions? How can you keep IDs consistent without arbitrating on the server? How can I trust the client?
Then you suddenly start to realize that client-side IDs would demand significantly less from your database. By itself, that takes a huge load off, for no other reason than that the server no longer needs to maintain a consistent ID counter. Now that you think about it, could you do sharding without even consulting the server?
Next you realize that you can also use the client-side ID itself as the idempotency key. You can use conditional requests to instruct the server what to do if there's a conflict. They give you all you'll need to ensure that you're manipulating the state you expect to be there.
Not only is this safer, you no longer have the risk of giving out a duplicate idempotency key.
You also get a clean recovery workflow for clients. Now retries are safe. And if you get "Precondition Failed", you can do a GET to see if your create succeeded or if there was an intervening update you didn't expect.
If you only care about success or failure, you don't even need a GET. You can just do a HEAD and look at the etag. And if you do decide to do a GET, you update any caches in between, consistently and automatically.
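Here's a sketch of what that recovery workflow can look like, continuing the hypothetical Python client from the earlier example (same made-up URL and payload; `create_widget` is an invented helper, not a real API). The key move: because the client already knows the ID, a lost response is no longer a mystery.

```python
import requests

def create_widget(url: str, payload: dict, attempts: int = 3):
    """Create-if-absent with safe retries. URL and payload are hypothetical."""
    for _ in range(attempts):
        try:
            resp = requests.put(
                url,
                json=payload,
                headers={"If-None-Match": "*"},
                timeout=5,
            )
        except (requests.ConnectionError, requests.Timeout):
            # The response was lost or never arrived. Because the ID was ours
            # to begin with, retrying the exact same PUT is harmless.
            continue

        if resp.status_code == 412:
            # Something already lives at this URL. Maybe our earlier attempt
            # actually succeeded, maybe someone else got there first; a cheap
            # HEAD (or a full GET) tells us what's there now.
            probe = requests.head(url, timeout=5)
            return probe.headers.get("ETag")

        resp.raise_for_status()
        return resp.headers.get("ETag")

    raise RuntimeError("gave up after retries")
```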
Hey... you get caching in there, too. You can deploy something like Squid and your API will actually work with it, not against it. Bonus points if all of your microservices perform better because there's a giant, consistent, shared cache sitting in front of them.
Whoa. Now that I look at it, Squid even has facilities for me to make this work across our WAN. It turns out that ICP is about more than clowns (i.e. Internet Cache Protocol >> Insane Clown Posse).
This improves everything!
UUIDs to the Rescue!
Theoretically, if you have clients use UUIDs, then collisions are statistically unlikely.
Of course, there are so many to choose from (a few one-liner examples follow the list):
- UUID v1 ensures no collisions by using timestamps and MAC addresses.
- Nobody talks about UUID v2. It's like IPv5. We just skip over it.
- UUID v3 lets us throw in a "namespace" and "identifier" to hash for more uniqueness.
- UUID v4 lets us just go with pure randomness.
- UUID v5 lets us use a stronger hash function than UUID v3... progress!
- UUID v6 lets us basically do UUID v1 again, but now they sort by time!
- UUID v7s sort, too; but they're random like UUID v4.
- UUID v8... well... it can be anything we want!
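For what it's worth, most of those flavors are a one-liner with Python's standard-library `uuid` module (the newer time-ordered versions generally need a very recent Python or a third-party package):

```python
import uuid

print(uuid.uuid1())   # v1: timestamp + node (MAC address, or a random stand-in)

# v3 / v5: hash a namespace + name (MD5 for v3, SHA-1 for v5);
# the same inputs always produce the same UUID.
print(uuid.uuid3(uuid.NAMESPACE_URL, "https://example.com/widgets/42"))
print(uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/widgets/42"))

print(uuid.uuid4())   # v4: pure randomness
```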
UUIDs to the Rescue?
Well, okay... it turns out all of those break in one way or another.
- MAC addresses get duplicated all the time in VM clusters and containers.
- Lots of embedded systems reset to a predictable date on bootup until NTP or something kicks in.
- In tests it's even worse, because now some UUID types generate the same UUID in each test because they seed the RNG predictably.
- Or maybe it works, but your tests are nondeterministic because of time values.
- But then you mock it out, and now you've broken the rest of the UUID generation, too.
- Even with pure randomness, you're sometimes stuck with a busted client.
Oof, now that I'm trying to use them, it's a real pain.
- They can't be used in some contexts that don't like leading numbers or hyphens.
- They're leaking my MAC address.
- They're leaking when I created them with that timestamp, too.
- Even the sorting is killing my database, because it causes concurrent updates to hammer a tiny spot in my BTree index.
- Sorting is also making it easier for someone who might try to cause a collision, too.
So they perform worse, they're bad for privacy, they might be bad for security, embedded systems cause problems, testing gets weird, and now every client can naively create problems with bad PRNGs.
Let's Look At Scalable Cloud Databases for Inspiration
Hmmm, MongoDB has an ObjectId() function! What do the docs say it does?
- A timestamp
- Some randomness
- A client-side session counter
This sounds familiar...
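To make that recipe concrete, here's a hypothetical sketch of gluing those three ingredients together the way MongoDB's ObjectId does (a 4-byte timestamp, 5 bytes of per-process randomness, and a 3-byte counter). It's illustrative only; use a real driver for real ObjectIds.

```python
import itertools
import os
import time

# Hypothetical sketch of the recipe -- not MongoDB driver code.
_process_random = os.urandom(5)                                   # 5 random bytes, fixed per process
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))  # 3-byte counter, random start

def objectid_like() -> str:
    timestamp = int(time.time()).to_bytes(4, "big")               # 4-byte Unix timestamp (seconds)
    counter = (next(_counter) % (1 << 24)).to_bytes(3, "big")     # wrap at 3 bytes
    return (timestamp + _process_random + counter).hex()          # 24 hex characters

print(objectid_like())
```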
So That's What That CUID Manifesto Was Rambling About!
And now you've got CUID (a rough sketch of the recipe follows the list):
- Leading letters for easier use in more places.
- Strong randomness.
- Uses client info but hashes it for privacy.
- Salting also gives us some security, too.
- Lack of order helps performance.
- Has that session counter in there, too.
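If you want to see roughly how those ingredients combine, here's a loose, hypothetical sketch in Python. To be clear: this is not the actual CUID/CUID2 algorithm, just the ingredients from the list above mixed together for illustration; use the official library for real IDs.

```python
import hashlib
import itertools
import os
import platform
import secrets
import string
import time

ALPHABET = string.digits + string.ascii_lowercase          # base36 output
_counter = itertools.count(secrets.randbelow(1 << 20))     # session counter, random start
_fingerprint = hashlib.sha256(                             # host/process info, hashed so it doesn't leak
    f"{platform.node()}-{os.getpid()}".encode()
).hexdigest()

def cuid_like(length: int = 24) -> str:
    # Leading letter keeps the ID usable where identifiers can't start with a digit.
    head = secrets.choice(string.ascii_lowercase)
    # Mix time, strong randomness, the counter, and the hashed fingerprint...
    material = f"{time.time_ns()}-{secrets.token_hex(16)}-{next(_counter)}-{_fingerprint}"
    # ...then hash the lot so the output leaks none of the inputs and has no sort order.
    digest = int.from_bytes(hashlib.sha256(material.encode()).digest(), "big")
    body = ""
    while len(body) < length - 1:
        digest, idx = divmod(digest, 36)
        body += ALPHABET[idx]
    return head + body

print(cuid_like())   # 24 characters, letter first, no discernible ordering
```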
CUID gives you a client-side key that you can generally trust. You're freed from dealing with idempotency keys. You get consistent updates in the face of concurrent changes. You can prevent double-creation or deletion. The server is no longer a bottleneck for ID creation. You can shard statelessly. You get potentially global caching almost for free.
Maybe the real treasure was the HTTP features we met along the way!
Conclusion
So there you have it. It's not about the server generating better UUIDs for IDs.
It's about a system where it doesn't have to. It's an entirely different paradigm. It's about shared-nothing scaling. It's about correct functioning despite malicious or very broken clients. It's about keeping your data consistent. It's about robustness, resilience, and recovery. It's about security and privacy.
But, mostly, it's about building the kind of systems that are possible if you're willing to question the assumptions that everyone else is making.
P.S. Thank you for coming to my TED talk.
P.P.S. No, I didn't use any AI to generate this. I write like this. Always have—just... em-dashes... everywhere.
P.P.P.S. It's 4am and I'm very tired. I hope this makes any sense at all.