How to ensure a gapless auto-incrementing id field when using PostgreSQL

I have noticed that the primary_key field (the id field) of tables in PostgreSQL doesn’t guarantee consecutive auto-incremented values. The row keys can be, for example, 1, 2, 3, 5, 6, 9, 10. This is related to the way PostgreSQL works and is caused by, for example, validation errors when creating new entities.

My question is: how do I avoid that? How can I make sure there are no gaps in the id field? Thank you.

What if you were to create rows with the ids 1, 2, and 3, and then delete id 2? Are you expecting your next insert to have id 2?

It should be max id + 1, that is to say: 4.

This sounds like an XY problem. Why do you want no gaps? What are you trying to accomplish?

1 Like

That’s exactly how PostgreSQL works. If you have gaps, then either you deleted something, or something rolled back, such as a transaction (each row inserted in a transaction needs an ID, and it needs one that is unique with respect to other running transactions as well, so the sequence is incremented even on rollbacks), etc… Anything that acquires a new ID will increment the sequence, even if it never uses the value.
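For example, a minimal demonstration of this (using a hypothetical items table with a serial id):

CREATE TABLE items (id serial PRIMARY KEY, name text);

BEGIN;
INSERT INTO items (name) VALUES ('first');   -- the sequence hands out id 1
ROLLBACK;                                    -- the row is gone, but the sequence is not rewound

INSERT INTO items (name) VALUES ('second');  -- gets id 2, leaving a permanent gap at 1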

What if I do the insert that gets ID 4 in a transaction that doesn’t commit for several minutes? What ID should inserts in other transactions taking place at the same time get? What if the transaction where I inserted ID 4 eventually rolls back?

“Max ID” is not a straightforward concept once MVCC and transactions are involved. See the notes in the Postgres docs for CREATE SEQUENCE for additional thoughts. Big takeaway:

Because nextval and setval calls are never rolled back, sequence objects cannot be used if “gapless” assignment of sequence numbers is needed. It is possible to build gapless assignment by using exclusive locking of a table containing a counter; but this solution is much more expensive than sequence objects, especially if many transactions need sequence numbers concurrently.

(emphasis mine)

3 Likes

Maybe the simplest solution is to migrate to MySQL.

That could potentially introduce more issues. Even if this particular problem were solved, the trade-offs might not be worth it.

Instead, would you describe why the gaps are a problem?
Maybe you could reduce the gaps with more validation in the application?

1 Like

I don’t really know how MySQL differs in this case, but it seems to me that it will never fill at least the gaps caused by record deletion. I think these SQL engines are optimized for quick inserts rather than first checking for gaps left by freed ids.

One tricky way I can think of, if the behaviour you want is really important, would be to generate the ids yourself based on the current max ID; then, if the insert fails due to another concurrent insert, you re-read the new max ID and retry the insert until it succeeds. A rough sketch of that idea is below.
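A minimal sketch of that retry approach, assuming the hypothetical example table has a UNIQUE constraint on id:

INSERT INTO example (id, value)
SELECT COALESCE(MAX(id), 0) + 1, 'some value'
FROM example;
-- If a concurrent insert wins the race, this fails with unique_violation
-- (SQLSTATE 23505); the application catches that error and retries.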

2 Likes

You can use a separate counter table and either a Postgres function or a Common Table Expression (CTE) to update the counter value during insert into your target table.

Using a Postgres function

  1. Create a “counter” table containing a single “last value” row:

    CREATE TABLE counter (last_value integer NOT NULL);
    INSERT INTO counter (last_value) VALUES (0);
    
  2. Create a get_next_id function:

    CREATE OR REPLACE FUNCTION get_next_id(countertable regclass, countercolumn text) RETURNS integer AS $$
    DECLARE
        next_value integer;
    BEGIN
        -- Atomically increment the counter; the row lock taken by the UPDATE
        -- serializes concurrent callers until their transaction commits or aborts.
        EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
        RETURN next_value;
    END;
    $$ LANGUAGE plpgsql;
    
    COMMENT ON FUNCTION get_next_id(regclass, text) IS 'Increment and return value from integer column $2 in table $1';
    
  3. Create your desired table to use the gapless sequence:

    CREATE TABLE example (id integer NOT NULL, value text NOT NULL);
    CREATE UNIQUE INDEX example_pkey ON example(id);
    

Usage:

INSERT INTO example(id, value)
VALUES (get_next_id('counter','last_value'), 'example');

Using a CTE

This uses the same separate “counter” table, but updates the counter value with a Common Table Expression (CTE) as part of the insert into your target table.

  1. Create a “counter” table containing a single “last value” row:

    CREATE TABLE counter (last_value integer NOT NULL);
    INSERT INTO counter (last_value) VALUES (0);
    
  2. Create your desired table to use the gapless sequence:

    CREATE TABLE example (id integer NOT NULL, value text NOT NULL);
    CREATE UNIQUE INDEX example_pkey ON example(id);
    

Usage:

-- The CTE gets its own name so it doesn't shadow the counter table,
-- and the outer SELECT must read FROM it.
WITH
  next_id AS (
    UPDATE counter SET last_value = last_value + 1
    RETURNING last_value
  )
INSERT INTO example (id, value)
SELECT next_id.last_value, 'example' FROM next_id;

With both approaches the UPDATE … RETURNING query will block any other update to the counter table, thus guaranteeing a gapless sequence. You can test this by running the query concurrently in two sessions inside BEGIN; without committing. You’ll notice that the second query is blocked until the first commits (or aborts).

This also means that if the first query’s transaction is aborted (using ROLLBACK), the second query can continue and will be assigned the next value, with no gaps. This is unlike a traditional Postgres sequence, which is not transactional.
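For example, a two-session walkthrough of that test, using the function-based approach from above:

-- Session 1:
BEGIN;
INSERT INTO example (id, value) VALUES (get_next_id('counter', 'last_value'), 'a');
-- leave the transaction open

-- Session 2 (blocks on the counter row until session 1 finishes):
INSERT INTO example (id, value) VALUES (get_next_id('counter', 'last_value'), 'b');

-- Session 1 again:
ROLLBACK;
-- Session 2 now unblocks and is assigned the very value session 1 gave up.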

A caveat is that if you delete a row from the table you will then have gaps. This can be avoided with a rule that prevents deletion from the table:

CREATE RULE no_delete_example AS ON DELETE TO example DO INSTEAD NOTHING;
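Note that a DO INSTEAD NOTHING rule silently ignores deletes rather than raising an error:

DELETE FROM example WHERE id = 1;  -- reports DELETE 0 and removes nothing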
5 Likes

Again, what is your actual use case here?

3 Likes

You’d have similar issues on MySQL as well. PostgreSQL’s behaviour is what has to happen when your SQL server supports transactions.

And then you get conflicts when transactions overlap, so one will fail when multiple run at the same time.

2 Likes

No, the UPDATE … RETURNING statement supports concurrent transactions, but each will block until the first update completes. After it commits or aborts, the next transaction can continue with the next value, with no gaps. This behaves like a single-threaded writer. It’s different from sequences, which hand out values non-transactionally for high performance.

2 Likes

Thank you very much. I have done something similar with Phoenix by updating the sequence, but doing this with a PostgreSQL function seems more appropriate.

UPDATE ... RETURNING will, but I rarely update only a single table at a time; transactions are almost always necessary for data consistency unless it’s an exceptionally simple database that isn’t using the database’s features… In addition, the primary key is rarely assigned as manually as that would be. Once transactions are involved this will cause major issues, either in terms of giving up all concurrency or in values getting out of sync.

1 Like