I guess this is related to Ecto, though other solutions might be possible.
I want to generate custom unique identifiers for records in a table (using Ecto's default PostgreSQL adapter). The current context is a Phoenix application.
My idea is to create a “global” table, with a single row (not sure if I can enforce this), wherein each column stores a unique global value for specific purposes. In my current problem I only need a single column to store the latest generated unique identifier (or part of it, e.g. an integer).
This value in the specific row and column of the “global” table shall be part of a Multi transaction that reads the latest value, creates a new entry in a different table (where I want to store the actual data), and updates the latest value in the “global” table.
What is your opinion? Is this too complex? Is there any simpler approach? I want to decouple it from the default incremental keys of Postgresql to customise it to my needs.
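For what it's worth, the transaction described above could be sketched with `Ecto.Multi` roughly as follows. This is a hedged sketch, not a working implementation: `MyApp.Repo`, the single-row `global_counters` table, the `latest_doc_id` column, and the `Document` schema are all invented names for illustration.

```elixir
# Hedged sketch of the proposed Multi transaction: read the counter,
# insert the record, bump the counter -- all atomically.
import Ecto.Query
alias Ecto.Multi

def create_document(attrs) do
  Multi.new()
  # 1. Read the single counter row with a row lock, so concurrent
  #    transactions serialize on it instead of racing.
  |> Multi.run(:counter, fn repo, _changes ->
    {:ok,
     repo.one!(
       from c in "global_counters",
         lock: "FOR UPDATE",
         select: %{id: c.id, latest: c.latest_doc_id}
     )}
  end)
  # 2. Insert the actual record, embedding the next identifier.
  |> Multi.run(:document, fn repo, %{counter: counter} ->
    %MyApp.Document{}
    |> MyApp.Document.changeset(Map.put(attrs, :public_id, counter.latest + 1))
    |> repo.insert()
  end)
  # 3. Write the new latest value back to the counter row.
  |> Multi.update_all(
    :bump,
    fn %{counter: counter} ->
      from c in "global_counters",
        where: c.id == ^counter.id,
        update: [set: [latest_doc_id: ^(counter.latest + 1)]]
    end,
    []
  )
  |> MyApp.Repo.transaction()
end
```

The `FOR UPDATE` lock is what makes concurrent callers queue up on the counter row, which is also exactly the serialization bottleneck discussed below.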
This approach will suffer a fair bit performance-wise, since multiple processes cannot update the same row at the same time. There’s a fair bit to unpack here:
- Can you motivate why you need this approach instead of regular sequences?
- With your custom sequences table, why not one row per “id generator” instead of doing it column-wise?
- Can you elaborate on the overall use case more?
Why though, what’s wrong with them?
Do you have to store a monotonically increasing ID in several different DB tables, or something similar?
Dear Ben and Dimitar,
Thank you for your prompt responses!
The reason I am thinking of such a solution is mainly to give the business logic freedom regarding such identifiers, which are presented to users as unique IDs.
The usage of this particular table is for providing a globally unique identifier to documents that are presented to users (not only an internal database concern).
I agree that for the moment I could use PostgreSQL’s monotonically increasing identifiers (for prototyping), but I thought a decoupled solution could be more interesting in the future. The identifiers shown to users are not plain numbers, but they do involve an integer in their “body”.
In order to provide a very relevant example, think of patent application numbers that each patent office assigns to new patents, e.g. US20210123456 or EP20221090 or WO2021654321.
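To make that concrete, such an identifier can be built by pure function composition over a prefix, a year, and a database-issued integer. This is a minimal sketch; the `DocId` module name and the 6-digit zero-padding are my own assumptions, not anyone's actual numbering scheme:

```elixir
defmodule DocId do
  @moduledoc "Hypothetical patent-style identifier formatter."

  # Builds e.g. "US20210123456" from "US", 2021 and 123456.
  # The zero-padding width is an assumed convention.
  def format(prefix, year, seq) when is_integer(seq) and seq >= 0 do
    prefix <>
      Integer.to_string(year) <>
      String.pad_leading(Integer.to_string(seq), 6, "0")
  end
end
```

For example, `DocId.format("EP", 2022, 90)` yields `"EP2022000090"` — the uniqueness burden stays entirely on the integer, while the surrounding formatting is free to change.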
And just to clarify a question by Ben: I referred to columns to exemplify that, generally speaking, a system might require a larger number of such global dynamic variables. Examples might be tracking the largest number of concurrent users ever seen, or anything where you care about a single value for the whole business (just making up some possibilities, not that I need any of these at the moment). Each column of the unique single-row table stores the corresponding value of such a global variable that can be updated.
With respect to the performance penalty, I want to believe that updates/writes requiring such a complex transaction are relatively rare. The typical operation is key-based retrieval of content from other “normal” tables, without any interaction with such a “global” table.
The requirement for a single row simplifies the accessing and updating of such global values.
You can generate UUIDv7 values globally with no database or coordination; they are 128-bit numbers designed to make collisions astronomically unlikely without any central control or registration.
Postgres extension: pg_uuidv7: Create UUIDv7 values in Postgres / PostgreSQL Extension Network
Elixir implementation: UUIDv7 — uuidv7 v0.2.1
If you want shorter identifiers, then consider Nano IDs and the Elixir package.
This is exactly the solution I have in place and working right now. The drawback is that they are longer than what I (personally) consider user-friendly.
Have you thought about making your own custom Postgres autoincrement function? I think it is best to delegate concurrency tasks to the database, as that is its job.
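If you go that route, one hedged sketch is a migration that creates a sequence plus a small SQL function, so the database hands out fully formatted identifiers atomically. The names (`doc_seq`, `next_doc_id`) and the format are invented for illustration:

```elixir
defmodule MyApp.Repo.Migrations.AddDocIdGenerator do
  use Ecto.Migration

  def up do
    execute "CREATE SEQUENCE doc_seq"

    # A simple SQL function: prefix + current year + zero-padded counter,
    # e.g. SELECT next_doc_id('US') -> 'US2024000001'.
    execute """
    CREATE FUNCTION next_doc_id(prefix text) RETURNS text AS $$
      SELECT prefix || to_char(now(), 'YYYY')
                    || lpad(nextval('doc_seq')::text, 6, '0');
    $$ LANGUAGE sql
    """
  end

  def down do
    execute "DROP FUNCTION next_doc_id(text)"
    execute "DROP SEQUENCE doc_seq"
  end
end
```

Since `nextval` handles the synchronization inside Postgres, no application-level locking or counter table is needed.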
I guess we can move the discussion away from the particular variable I am interested in and just discuss what solution people would use for such global, dynamically updated variables, and how to persist them efficiently.
Yes, I also thought of that. I was reluctant (at least until now) because I have no experience in such a direction.
But, as I mention above, shall we generalise this discussion to any kind of such global variables that change?
There should be nothing special about it as long as you keep your functions simple; I know companies that have 80% of their business logic as SQL procedures.
How sure are you that those are actually as monotonically increasing or gapless as you think they are? I’d imagine those systems deal with after-the-fact compensations for errors far more than we developers would like to hear.
It is uniqueness that is essential, not necessarily monotonic increase. Also, if deletions are allowed, then of course gaps will necessarily be present.
But, how would you address the more general storage problem, ignoring the particular identifiers?
For the record, I also explored a single (global) Agent-based solution for the uniqueness of the identifiers, along the lines of how Dave Thomas implements the Dictionary in his Elixir course.
The problem with using only an Agent is the lack of persistence.
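For reference, the Agent variant mentioned above can be sketched in a few lines. This is a toy (the `:id_counter` name is invented), and as noted, the counter vanishes when the process or node stops:

```elixir
# A named Agent holds the latest issued integer; each call hands out
# the next one. State is in-memory only -- no persistence.
{:ok, _pid} = Agent.start_link(fn -> 0 end, name: :id_counter)

next_id = fn ->
  Agent.get_and_update(:id_counter, fn n -> {n + 1, n + 1} end)
end

next_id.()  # returns 1
next_id.()  # returns 2
```

`Agent.get_and_update/2` makes the read-increment-return atomic with respect to other callers, which is exactly the property the counter-row transaction tries to get from the database.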
Postgres has literally this - sequences:
They are most commonly used as auto-incrementing primary keys, but they can be used independently.
There are several gotchas to be aware of (more details under “Notes” on that page):
- nextval calls are not rolled back even if the transaction they were made in is. If your system has lots of rollbacks, the sequence will have lots of gaps.
- nextval still requires some synchronization, so it’s not particularly fast. The cache option can reduce this overhead, but at the cost of sequence values not necessarily being monotonically increasing across sessions.
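As a minimal illustration of using a standalone sequence from Elixir (the sequence name `doc_seq` and `MyApp.Repo` are assumptions, not from the thread):

```elixir
# Once, e.g. in a migration: execute "CREATE SEQUENCE doc_seq"
#
# Each call atomically reserves the next value, independently of any
# table's primary key. As noted above, the value stays consumed even
# if the surrounding transaction rolls back.
{:ok, %{rows: [[next]]}} = MyApp.Repo.query("SELECT nextval('doc_seq')")
```

No counter table, row lock, or Multi is needed; Postgres does the synchronization internally.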
Right so overall I would separate this problem into two separate concerns:
- you want to generate a unique value.
- you want to make that value semantically nice.
You can use regular sequences for (1) and this is what you should use as your primary key.
From there you can also generate (2) by combining the ID with whatever other information is helpful. Alternatively, you can use custom sequences that you reset each month / year / whatever and then combine with some kind of date prefix.
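A hedged sketch of that reset-per-period variant (names and the formatting convention are invented; the restart would run from e.g. a scheduled job at the start of each year):

```elixir
# Restart the custom sequence for the new year...
MyApp.Repo.query!("ALTER SEQUENCE doc_seq RESTART WITH 1")

# ...then build each public ID from a prefix, the year, and the
# per-year counter.
%{rows: [[n]]} = MyApp.Repo.query!("SELECT nextval('doc_seq')")

public_id =
  "WO" <>
    Integer.to_string(Date.utc_today().year) <>
    String.pad_leading(Integer.to_string(n), 6, "0")
```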
Well, there’s also ULID (GitHub - ulid/spec: The canonical spec for ulid), which is also 128-bit but at least doesn’t have dashes like UUIDs.
I too will vouch for NanoID, btw. I’ve never had a conflict, though my code only generated like 3-4 IDs per hour.
Are type IDs what eventually became UUIDv7?
Wondered the same but never knew. Still, type IDs can have prefixes like customer-, and UUIDv7 doesn’t have that, so loosely the answer should be no.