I guess this is related to Ecto, though other solutions might be possible.
I want to generate custom unique identifiers for records in a table (using Ecto's default PostgreSQL adapter). The current context is a Phoenix application.
My idea is to create a “global” table, with a single row (not sure if I can enforce this), wherein each column stores a unique global value for specific purposes. In my current problem I only need a single column to store the latest generated unique identifier (or part of it, e.g. an integer).
This value in the specific row and column of the “global” table shall be part of a Multi transaction that reads the latest value, creates a new entry in a different table (where I want to store the actual data), and updates the latest value in the “global” table.
What is your opinion? Is this too complex? Is there any simpler approach? I want to decouple it from the default incremental keys of Postgresql to customise it to my needs.
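For what it's worth, the transaction described above could be sketched with `Ecto.Multi` roughly as follows. This is a hedged sketch, not a working implementation: `MyApp.Repo`, the single-row `global_counters` table, the `latest_doc_id` column, and the `Document` schema are all invented names for illustration.

```elixir
# Hedged sketch of the proposed Multi transaction: read the counter,
# insert the record, bump the counter -- all atomically.
import Ecto.Query
alias Ecto.Multi

def create_document(attrs) do
  Multi.new()
  # 1. Read the single counter row with a row lock, so concurrent
  #    transactions serialize on it instead of racing.
  |> Multi.run(:counter, fn repo, _changes ->
    {:ok,
     repo.one!(
       from c in "global_counters",
         lock: "FOR UPDATE",
         select: %{id: c.id, latest: c.latest_doc_id}
     )}
  end)
  # 2. Insert the actual record, embedding the next identifier.
  |> Multi.run(:document, fn repo, %{counter: counter} ->
    %MyApp.Document{}
    |> MyApp.Document.changeset(Map.put(attrs, :public_id, counter.latest + 1))
    |> repo.insert()
  end)
  # 3. Write the new latest value back to the counter row.
  |> Multi.update_all(
    :bump,
    fn %{counter: counter} ->
      from c in "global_counters",
        where: c.id == ^counter.id,
        update: [set: [latest_doc_id: ^(counter.latest + 1)]]
    end,
    []
  )
  |> MyApp.Repo.transaction()
end
```

The `FOR UPDATE` lock is what makes concurrent callers queue up on the counter row, which is also exactly the serialization bottleneck discussed below.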
This approach will suffer a fair bit performance-wise, since multiple processes cannot update the same row at the same time. There’s a fair bit to unpack here:
- Can you motivate why you need this approach instead of regular sequences?
- With your custom sequences table, why not one row per “id generator” instead of doing it column-wise?
- Can you elaborate on the overall use case more?
Why though, what’s wrong with them?
Do you have to store a monotonically increasing ID in several different DB tables, or something similar?
Dear Ben and Dimitar,
Thank you for your prompt responses!
The reason I am thinking of such a solution is mainly to give the business logic freedom regarding such identifiers, which are presented to users as unique IDs.
The usage of this particular table is for providing a globally unique identifier to documents that are presented to users (not only an internal database concern).
I agree that for the moment I could use PostgreSQL’s monotonically increasing identifiers (for prototyping), but I thought a decoupled solution could be more interesting in the future. The identifiers shown to users are not plain numbers, but they do involve an integer in their “body”.
In order to provide a very relevant example, think of patent application numbers that each patent office assigns to new patents, e.g. US20210123456 or EP20221090 or WO2021654321.
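To make that concrete, such an identifier can be built by pure function composition over a prefix, a year, and a database-issued integer. This is a minimal sketch; the `DocId` module name and the 6-digit zero-padding are my own assumptions, not anyone's actual numbering scheme:

```elixir
defmodule DocId do
  @moduledoc "Hypothetical patent-style identifier formatter."

  # Builds e.g. "US20210123456" from "US", 2021 and 123456.
  # The zero-padding width is an assumed convention.
  def format(prefix, year, seq) when is_integer(seq) and seq >= 0 do
    prefix <>
      Integer.to_string(year) <>
      String.pad_leading(Integer.to_string(seq), 6, "0")
  end
end
```

For example, `DocId.format("EP", 2022, 90)` yields `"EP2022000090"` — the uniqueness burden stays entirely on the integer, while the surrounding formatting is free to change.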
And just to clarify a question by Ben: I referred to columns to exemplify that, generally speaking, a system might require a larger number of such global dynamic variables. Examples might be tracking the largest number of concurrent users ever seen, or anything where you care about a single value for the whole business (just making up some possibilities, not that I need any of these at the moment). Each column of the unique single-row table stores the corresponding value of such a global variable that can be updated.
With respect to the performance penalty, I want to believe that updates/writes requiring such a complex transaction are relatively rare. The typical operation is key-based retrieval of content from other “normal” tables, without any interaction with such a “global” table.
The requirement for a single row simplifies the accessing and updating of such global values.
You can generate UUIDv7 values globally with no database or coordination; they are 128-bit numbers designed to make collisions astronomically unlikely without any central control or registration.
Postgres extension: pg_uuidv7: Create UUIDv7 values in Postgres / PostgreSQL Extension Network
Elixir implementation: UUIDv7 — uuidv7 v0.2.1
If you want shorter identifiers, then consider Nano IDs and the Elixir package.
This is exactly the solution I have in place and working right now. The drawback is that they are longer than what I (personally) consider user-friendly.
Have you thought about making your own custom Postgres autoincrement function? I think it is best to delegate concurrency tasks to the database, as that is its job.
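If you go that route, one hedged sketch is a migration that creates a sequence plus a small SQL function, so the database hands out fully formatted identifiers atomically. The names (`doc_seq`, `next_doc_id`) and the format are invented for illustration:

```elixir
defmodule MyApp.Repo.Migrations.AddDocIdGenerator do
  use Ecto.Migration

  def up do
    execute "CREATE SEQUENCE doc_seq"

    # A simple SQL function: prefix + current year + zero-padded counter,
    # e.g. SELECT next_doc_id('US') -> 'US2024000001'.
    execute """
    CREATE FUNCTION next_doc_id(prefix text) RETURNS text AS $$
      SELECT prefix || to_char(now(), 'YYYY')
                    || lpad(nextval('doc_seq')::text, 6, '0');
    $$ LANGUAGE sql
    """
  end

  def down do
    execute "DROP FUNCTION next_doc_id(text)"
    execute "DROP SEQUENCE doc_seq"
  end
end
```

Since `nextval` handles the synchronization inside Postgres, no application-level locking or counter table is needed.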
I guess we can move the discussion away from the particular variable I am interested in and just discuss what solution people would use for such global, dynamically updated variables, and how to persist them efficiently.
Yes, I also thought of that. I was reluctant (at least until now) because I have no experience in such a direction.
But, as I mention above, shall we generalise this discussion to any kind of such global variables that change?
There should be nothing special about it as long as you keep your functions simple; I know companies that have 80% of their business logic as SQL procedures.
How sure are you that those are actually as monotonically increasing or gapless as you think they are? I’d imagine those systems deal with after-the-fact compensations for errors far more than we developers would like to hear.
It is uniqueness that is essential, not necessarily monotonic increase. Also, if deletions are allowed, then of course gaps will necessarily be present.
But, how would you address the more general storage problem, ignoring the particular identifiers?
For the record, I also explored a single (global) Agent-based solution for the uniqueness of the identifiers, along the lines of how Dave Thomas implements the Dictionary in his Elixir course.
The problem with using only an Agent is the lack of persistence.
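For reference, the Agent variant mentioned above can be sketched in a few lines. This is a toy (the `:id_counter` name is invented), and as noted, the counter vanishes when the process or node stops:

```elixir
# A named Agent holds the latest issued integer; each call hands out
# the next one. State is in-memory only -- no persistence.
{:ok, _pid} = Agent.start_link(fn -> 0 end, name: :id_counter)

next_id = fn ->
  Agent.get_and_update(:id_counter, fn n -> {n + 1, n + 1} end)
end

next_id.()  # returns 1
next_id.()  # returns 2
```

`Agent.get_and_update/2` makes the read-increment-return atomic with respect to other callers, which is exactly the property the counter-row transaction tries to get from the database.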
Postgres has literally this - sequences:
They are most commonly used as auto-incrementing primary keys, but they can be used independently.
There are several gotchas to be aware of (more details under “Notes” on that page):
- nextval calls are not rolled back even if the transaction they were made in is. If your system has lots of rollbacks, the sequence will have lots of gaps.
- nextval still requires some synchronization, so it’s not particularly fast. The cache option can reduce this overhead, but at the cost of sequence values not necessarily being monotonically increasing across sessions.
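As a minimal illustration of using a standalone sequence from Elixir (the sequence name `doc_seq` and `MyApp.Repo` are assumptions, not from the thread):

```elixir
# Once, e.g. in a migration: execute "CREATE SEQUENCE doc_seq"
#
# Each call atomically reserves the next value, independently of any
# table's primary key. As noted above, the value stays consumed even
# if the surrounding transaction rolls back.
{:ok, %{rows: [[next]]}} = MyApp.Repo.query("SELECT nextval('doc_seq')")
```

No counter table, row lock, or Multi is needed; Postgres does the synchronization internally.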
Right so overall I would separate this problem into two separate concerns:
- you want to generate a unique value.
- you want to make that value semantically nice.
You can use regular sequences for (1) and this is what you should use as your primary key.
From there you can also generate (2) by combining the ID with whatever other information is helpful. Alternatively, you can use custom sequences that you reset each month / year / whatever and then combine with some kind of date prefix.
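A hedged sketch of that reset-per-period variant (names and the formatting convention are invented; the restart would run from e.g. a scheduled job at the start of each year):

```elixir
# Restart the custom sequence for the new year...
MyApp.Repo.query!("ALTER SEQUENCE doc_seq RESTART WITH 1")

# ...then build each public ID from a prefix, the year, and the
# per-year counter.
%{rows: [[n]]} = MyApp.Repo.query!("SELECT nextval('doc_seq')")

public_id =
  "WO" <>
    Integer.to_string(Date.utc_today().year) <>
    String.pad_leading(Integer.to_string(n), 6, "0")
```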
Well, there’s also ULID (GitHub - ulid/spec: The canonical spec for ulid), which is also 128-bit but at least doesn’t have dashes like UUIDs.
I too will vouch for NanoID, btw. I’ve never had a conflict, though my code only generated like 3-4 IDs per hour.
Are type IDs what eventually became UUIDv7?
Wondered the same but never knew. Still, type IDs can have prefixes like customer-, and UUIDv7 doesn’t have that, so loosely the answer should be no.