I wanted to get a reality check from the wise folks of the forum on the differences and pros/cons of uuid vs. nanoid. UUIDv4 has been my go-to for primary keys for a while, but the nanoid package is garnering some attention (there is an Elixir port). NanoID promises to save on real estate: because a NanoID can be shorter than a UUID (by default 21 characters, versus 36 for a hyphenated UUID), it takes up less room on the screen and may be easier for a human to type.
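For a sense of where NanoID's length comes from, here is a rough Python sketch of a NanoID-style generator. It is not the nanoid package's code. It assumes NanoID's defaults of 21 symbols drawn from a 64-character URL-safe alphabet (A-Z, a-z, 0-9, _ and -), and the name nanoid_like is hypothetical. At 6 bits per symbol, that works out to 21 x 6 = 126 random bits.

```python
import secrets
import string

# 64 URL-safe symbols. NanoID's default alphabet uses this same symbol set
# (A-Z, a-z, 0-9, '_' and '-'); the ordering here is arbitrary.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_-"

def nanoid_like(size: int = 21) -> str:
    """Hypothetical NanoID-style generator; a sketch, not the package's code."""
    # secrets.token_bytes draws from the OS CSPRNG. Because 256 is a multiple
    # of 64, keeping the low 6 bits of each byte picks every symbol uniformly.
    return "".join(ALPHABET[b & 63] for b in secrets.token_bytes(size))

print(nanoid_like())  # a random 21-character ID, different on each run
print(21 * 6)         # 126 bits of randomness at the default size
```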
Just to make sure I’m getting my facts straight: my understanding is that a UUIDv4 is stored as 128 bits of binary data. It is often represented as a hexadecimal string, e.g. fcfe5f21-8a08-4c9a-9f97-29d2fd6a27b9, but this is just a human-readable view of the underlying binary data. So if I’m doing my math correctly, the usual representation is 32 characters (discounting the hyphens), each a hexadecimal digit (0, 1, 2, … a, b, c, d, e, f) that encodes 4 bits; 32 x 4 = 128 bits. (Strictly speaking, only 122 of those bits are random: 6 are fixed to mark the version and variant.)
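One quick way to check this arithmetic is Python's standard uuid module, which exposes both the raw 16 bytes and the hex view of the same value. This is a minimal sketch using the example UUID above:

```python
import uuid

# The example UUID from the post.
u = uuid.UUID("fcfe5f21-8a08-4c9a-9f97-29d2fd6a27b9")

raw = u.bytes       # the underlying binary value: 16 bytes
hex_digits = u.hex  # 32 lowercase hex characters, hyphens removed

print(len(raw) * 8)         # 128 bits of binary data
print(len(hex_digits) * 4)  # 32 hex digits x 4 bits each = 128 bits
print(u.version)            # 4: the version field is fixed, not random

# In a version-4 UUID, 4 version bits and 2 variant bits are fixed,
# leaving 128 - 6 = 122 random bits.
random_bits = 128 - 4 - 2
print(random_bits)          # 122
```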
NanoIDs, on the other hand, seem to always be strings. So if we store a UUID representation (minus the hyphens, e.g. fcfe5f218a084c9a9f9729d2fd6a27b9) as a literal string, it requires 256 bits, because on disk it becomes thirty-two 8-bit characters instead of thirty-two 4-bit hexadecimal digits. In other words, storing the value as a string requires at least twice the space (36 bytes instead of 16 if the hyphens are kept). I’ve seen this mistake made many times when a database schema uses a TEXT or CHAR column to store UUIDs instead of the native binary format, and it can really slow down indexing and queries.
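The storage difference can be measured directly. This Python sketch assumes a single-byte text encoding such as ASCII or UTF-8. It also ignores any per-value length or row overhead a particular database adds, so real-world numbers will differ somewhat:

```python
import uuid

u = uuid.UUID("fcfe5f21-8a08-4c9a-9f97-29d2fd6a27b9")

# Native binary column (e.g. a 16-byte UUID/BINARY type): the raw bytes.
binary_size = len(u.bytes)
# Hex string without hyphens, stored as ASCII text: 1 byte per character.
hex_text_size = len(u.hex.encode("ascii"))
# Canonical hyphenated string stored as ASCII text.
canonical_text_size = len(str(u).encode("ascii"))

print(binary_size, hex_text_size, canonical_text_size)  # 16 32 36
```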
So the question is: couldn’t we just offer a different VIEW on top of the existing UUIDv4? In other words, couldn’t we just represent those 128 bits differently to save screen real estate? For example, if we choose an alphabet of a-z and the digits 0-5, we would have 32 characters at our disposal. Each character then encodes 5 bits (2^5 = 32), so a 128-bit UUID needs 128 / 5 = 25.6, rounded up to 26 screen characters, instead of 32 hex digits. Or if we expanded our alphabet to a-z, A-Z, 0-9, plus 2 more characters (say - and _), we would have 64 characters in our arsenal. Each then encodes 6 bits (2^6 = 64), so the UUID fits in 128 / 6 ≈ 21.3, rounded up to 22 characters. (This is just another way of saying “base-64 encoding”.) Wouldn’t that make for nicer REST URLs? E.g. http://localhost/posts/_P5fIYoITJqflynS_WonuQ instead of http://localhost/posts/fcfe5f21-8a08-4c9a-9f97-29d2fd6a27b9
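The number of characters needed is the bit count divided by the bits each symbol carries (log2 of the alphabet size), rounded up. This minimal Python sketch assumes power-of-two alphabets and uses the standard RFC 4648 encodings from the base64 module. Note that the base32 alphabet there is A-Z plus 2-7, not a-z plus 0-5, and that urlsafe_b64encode uses - and _ as its two extra characters. The helper name chars_needed is hypothetical.

```python
import base64
import uuid

u = uuid.UUID("fcfe5f21-8a08-4c9a-9f97-29d2fd6a27b9")

def chars_needed(bits: int, alphabet_size: int) -> int:
    """Characters needed to spell `bits` bits; assumes alphabet_size is a power of two."""
    bits_per_char = alphabet_size.bit_length() - 1  # 16 -> 4, 32 -> 5, 64 -> 6
    return -(-bits // bits_per_char)                # ceiling division

for size in (16, 32, 64):
    print(size, chars_needed(128, size))  # 16 32 / 32 26 / 64 22

# Real encodings of the same 16 bytes, with the '=' padding stripped.
b32 = base64.b32encode(u.bytes).rstrip(b"=").decode("ascii")
b64 = base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")
print(len(b32), len(b64))  # 26 22
print(b64)                 # _P5fIYoITJqflynS_WonuQ
```

Stripping the padding is harmless here: the input is always exactly 16 bytes, so a decoder can add the same padding back.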
Am I reasoning about this correctly? It feels like I’m missing something. Am I correct that UUIDv4 requires only 128 bits? If so, would it be useful to have a package that offered a custom, compact “view” of the UUID data? That way the database and everything else could stick with tried-and-true UUID generation and support under the hood, but wherever humans are involved, an easy-to-type shorthand of the UUID could be used (e.g. a base-64 or base-32 encoding). This is more or less the idea behind URL shorteners; it’s just a lot simpler when you only have to represent 128 bits of data.
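The “view” idea can be sketched as a pair of conversion functions that run only at the edges (URLs, forms), while the database keeps its native UUID column. The names to_short_id and from_short_id are hypothetical and do not come from any existing package. This sketch assumes unpadded URL-safe base-64:

```python
import base64
import uuid

def to_short_id(u: uuid.UUID) -> str:
    """Render a UUID as unpadded URL-safe base-64 (always 22 characters)."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")

def from_short_id(short_id: str) -> uuid.UUID:
    """Parse a 22-character short ID back into the original UUID.

    Raises ValueError for anything that is not the canonical short form.
    """
    if len(short_id) != 22:
        raise ValueError("expected a 22-character short ID")
    # Restore the two '=' padding characters that to_short_id stripped.
    # binascii.Error (bad padding/characters) is a subclass of ValueError.
    raw = base64.urlsafe_b64decode(short_id + "==")
    u = uuid.UUID(bytes=raw)  # raises ValueError unless raw is exactly 16 bytes
    # Re-encode and compare so each UUID has exactly one accepted spelling.
    if to_short_id(u) != short_id:
        raise ValueError("non-canonical short ID")
    return u

# Usage: the database keeps the native UUID; only URLs see the short form.
post_id = uuid.uuid4()
slug = to_short_id(post_id)
assert from_short_id(slug) == post_id
print(f"http://localhost/posts/{slug}")
```

The canonical-form check matters for a primary key. The last base-64 character carries only 2 meaningful bits, and lenient decoders may accept other spellings, so re-encoding and comparing guarantees that each UUID has exactly one accepted short form.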
Am I looking at this the right way? Thanks for any thoughts.