UUID -- wisdom and advice

For note, PostgreSQL is actually quite fine with UUID’s. It is best with UUID v1/v2, which are incrementing. Sadly, Ecto is hard-coded to UUIDv4, which is Random. UUID v1/v2 is generated from system data, including time, and PostgreSQL has built-in methods for generating them. UUIDv4 is generated from a cryptographic random generator, collisions are possible with UUIDv4 (so be aware of that if you merge distinct DB’s) where UUIDv1/v2 are safer in regards to that. Even then, the chance of a collision with UUIDv4 is very very low, just possible. I wish Ecto gave the option for UUIDv1/v2 or something else for UUID’s instead of hard-coding to server-generated-passed-to-DB UUID’s when PostgreSQL can generate UUID v1/v2 just fine on its own, and it is slightly more efficient. PostgreSQL also has generators for UUID v4 like Ecto uses, just Ecto does not use it…

However the efficient change is minor, don’t worry about using even UUID v4’s in PostgreSQL. In MySQL it is a lot more of an issue, but not in PostgreSQL.

6 Likes

If you go the snowflake route then you can have PG generate the ID for you. There’s a good description of the method and algorithm at http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/

2 Likes

I’ll take a look – thanks!

Seems like my previous message is ignored, maybe the following helps.
A con of a guid can be a performance penalty as explained in the next citation. But there is a solution; the previously named comb guid or sequential guid or as overmind named the incremental uuid.
Generating the (seq.) guid on the client has as pro that you do not need a round trip to the server to get this id. You can even create related records (with this guid as pk - fk) on the client. Implemented and used it for years in production (except that I did not know of the seq guid at that time).

Guid primary keys are a natural fit for many development scenarios, such as replication, or when you 
need to generate primary keys outside the database. Guid primary keys have several benefits, 
such as:

Uniqueness across every table, every database and every server.
You can generate the values without having a roundtrip to the database.
Most replication scenarios require Guid columns.
One of the problems with Guid primary keys is that they are usually based on a
random number generator and this could cause performance issues when a clustered
index is defined on the primary key Guid column (default in SqlServer) or when a Guid 
column is indexed. When a large number of records are stored in a table, its PK index 
becomes very inefficient when the primary key is of type Guid. The reason for that is
the fact that indexes need to have the records in a sort order by the index column. 
Sorting random Guids is not a very efficient operation. It takes much more computing 
cycles to find the correct place where a new guid should be stored among 
100000000 Guids that are already sorted.

(http://docs.telerik.com/data-access/feature-reference/api/context-api/id-generators/feature-ref-api-context-api-client-seq-guids)

For js I found this generator some time ago: https://github.com/kelektiv/node-uuid
Anyone aware of others by the way?

edit:


https://www.npmjs.com/package/uuid
found with a quick search.

2 Likes

Thanks Stefan! This is very helpful. I am still mulling over which route I should take – integer or GUID. I’m quite drawn to the latter but have not yet tried it … hence a bit hesitant to take the jump. The GUID’s with an ordering property (UUID 1) seem very attractive … I wish they were implemented in Ecto.

Thanks again!

Sorry to threadjack, but I have a related question that Im not sure requires its own thread :slight_smile:

What are the thoughts on mixing PK styles? So most things would have UUID as the PK, but what about for tables that are there purely to support reporting - a list of actions with a timestamp for example - could they have just an INT PK? There would still be a FK to the the UUID PK of the owning object, but it would save space on table that could be pretty big.

Is this something that is frowned upon? Or just personal preference?

I don’t believe there are any drawbacks on using different kinds of primary keys for different tables. I think it is very smart to mix them, allowing you to get one set of guarantees for one table and another for another.

1 Like

Good to have some confirmation, I though it was fine, but as this thread was here I thought Id ask!

One reason for using UUIDs (mentioned here) is the lack of ability to enumerate a particular collection based on an entity’s ID, disclosing a (approximate, modulo deletions) lower bound on the number of entities in the system, which may be undesirable for business reasons.

If that is the main issue, and the system will be synchronised through one database and does not need client-generated IDs, would it be better to use sequential IDs internally and simply generate random slugs for entities (à la YouTube)? Are there disadvantages to going that route instead of just assigning everything a UUID and being done?

1 Like

This is a very good question!

I’ve been researching UUIDs for a new project I will be working on, I won’t echo the points already posted but wanted to share a good article on the subject once you get to scale and thinking about UUIDs.

Sharding IDs at Instagram - it sounds like this isn’t something you’d be too worried about just yet (and Instagram’s case is fairly unique), but I thought it was a good writeup of some of the things they’ve tried and the reasoning why they ended up doing what they did.

There is a decent discussion about the article on Hacker News as well.

Here https://www.youtube.com/watch?v=rEKweR5pX-k an exceptionally userfriendly (in my opininion) UI in action where an order + orderlines are created on the client with minimal effort. When there is an order + an orderlines table on the backend: there is no commit of separate records, the whole order (+ orderlines) is committed at once. Gives the possibility to check on an eventual cardinality constraint also (no order should be allowed without at least f.e. one orderline). Creation of pk-fk on the client helps here.

1 Like

Yes, I think that the fact that UUIDs can be generated outside of the database (rather than with an AUTO INCREMENT or similar statement inside the database) makes it a lot easier to write hierarchical collections to the database at once. Of course, care still needs to be taken to handle collisions, especially when users can choose the UUIDs that are sent to the database themselves.

UUID v1/v2 would be better for that then, no collisions. :wink:

Yes… if you assume that you can always trust the client. Which your server application should of course never do :exclamation::exclamation: . To write my remark differently: Please do make sure that your application checks for malicious user input. Input containing colliding identifiers is yet one more possibility to consider.

1 Like

Oh always! I’d never let the client make the primary key regardless! By client I thought you meant the server software ‘client’ to the database, not a user client. ^.^
For user defined ID’s I namespace them or prefix them or so forth and put them in their own column/table that points to the real primary key. I never ever trust users with PK input, ever…

1 Like


For those interested in validation / security. i’m not going to copy / paste parts of the article in my mail now. Grrr ;-).

2 Likes

Likie too for the (my!! ;-)) client-generated uuid OvermindDL1, greatest of all? :wink:
(“keep it humorless please, no mean jokes and personal notes, we like to keep it nice-looking here”)
Never slip of the mind.