Table with binary_id - can I trust the "inserted_at" for precise ordering of records?

hubertlepicki · July 20, 2017, 12:57pm

I have PostgreSQL table with binary_id (UUID) as id column. I also have timestamps().

I am using Ecto to fetch and insert the records. I want to fetch the records in precisely the order I have inserted them in.

If I was using numeric, auto-generated IDs, I think I would be pretty much sure that ordering by ID would be 100% error proof, but I have random UUIDs as my primary key.

Can I trust order by inserted_at asc to return records in order I have inserted them in, or do you recommend I should add a sequence column?

NobbZ · July 20, 2017, 1:10pm

Add another column that reflects order of insert, time stamps have only a single second of granularity, but even if they are more granular, you might insert two data sets close enough in time, that the timestamps will be the same for both.

edit

I have to say though that order of insertion is often not as important as the developer thinks it were, and if really order is important one should totally rely on the DB managing this column.

Also what is more important to you? Order of insertion into db or order of incomming requests that cause insertion into db? If the latter, I’d go for something self implemented and using mono tonic time as early as possible in the plug chain to carry the time for the request around.

hubertlepicki · July 20, 2017, 1:35pm

Ah yes. Thank you so much, I think you pointed me towards correct direction. I need to move that sequence generation up front - as you said, I am interested in arrival date of event, and not in the precise moment I have inserted it in. So I’ll need to assign each event unique, sequential ID prior to processing it and prior to sending it to the database.

outlog · July 20, 2017, 2:09pm

you might want to look at using snowflake or ksuid - https://segment.com/blog/a-brief-history-of-the-uuid/

OvermindDL1 · July 20, 2017, 2:24pm

That is changeable, I have mine down to millisecond resolution.

NobbZ · July 20, 2017, 3:29pm

Even that might be not fine enough. Even if we had a timestamp with a granularity that is even finer than the CPUs cycle, we might get the same timestamp in a distributed system.

Thats why I explicitely mentioned erlangs monotonic time, which should also be monotonicaly increase even on a distributed system with multiple nodes. And since it is managed by the ERTS I think it is faster and more accurate than any kind of state holding Agent, GenServer or whatever…

OvermindDL1 · July 20, 2017, 3:43pm

I’m not sure that is distributedly synched actually…

If you want a distributed incremental ID, I’d probably have a central authority that slaves out sets of numbers to nodes (split it into, say, parts of 1000 or however many max nodes you ever expect then they pull from that until it elapses) to slaves on each node that the local system could pull from. I think snowflake does something like that.

NobbZ · July 20, 2017, 3:54pm

I always understood it as synchronized, but after referring to the manual again I can’t find anything that could me make think that… Damn fading memories…