How to ensure a gapless auto-incrementing id field when using PostgreSQL

I have noticed that the primary_key field (the id field) of tables in PostgreSQL doesn’t guarantee consecutive auto-incremented values. The row keys can be, for example, 1, 2, 3, 5, 6, 9, 10. This is related to the way PostgreSQL works and is caused by, for example, validation errors when creating new entities.

My question is: how do I avoid that? How can I make sure there are no gaps in the id field? Thank you.

What if you were to create rows with the ids 1, 2, and 3, and then delete id 2? Are you expecting your next insert to have id 2?

It should be max id + 1, that is to say: 4.

This sounds like an XY problem. Why do you want no gaps? What are you trying to accomplish?

1 Like

That’s exactly how PostgreSQL works. If you have gaps, then either you deleted something, or something rolled back, such as a transaction (each row inserted in a transaction needs an ID, and it needs one that is unique with respect to other running transactions as well, so the sequence is incremented even on rollbacks), etc… Anything that acquires a new ID will increment the sequence, even if it never uses the value.
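For example, a minimal demonstration of this (using a hypothetical items table with a serial id):

CREATE TABLE items (id serial PRIMARY KEY, name text);

BEGIN;
INSERT INTO items (name) VALUES ('first');   -- the sequence hands out id 1
ROLLBACK;                                    -- the row is gone, but the sequence is not rewound

INSERT INTO items (name) VALUES ('second');  -- gets id 2, leaving a permanent gap at 1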

What if I do the insert that gets ID 4 in a transaction that doesn’t commit for several minutes? What ID should inserts in other transactions taking place at the same time get? What if the transaction where I inserted ID 4 eventually rolls back?

“Max ID” is not a straightforward concept once MVCC and transactions are involved. See the notes in the Postgres docs for CREATE SEQUENCE for additional thoughts. Big takeaway:

Because nextval and setval calls are never rolled back, sequence objects cannot be used if “gapless” assignment of sequence numbers is needed. It is possible to build gapless assignment by using exclusive locking of a table containing a counter; but this solution is much more expensive than sequence objects, especially if many transactions need sequence numbers concurrently.

(emphasis mine)

3 Likes

Maybe the simplest solution is to migrate to MySQL.

That could potentially introduce more issues. Even if this particular problem were solved, the trade-offs might not be worth it.

Instead, would you describe why the gaps are a problem?
Maybe you could reduce the gaps with more validation in the application?

1 Like

I don’t really know how MySQL differs in this case, but it seems to me that it will never fill at least the gaps caused by record deletion. I think these SQL engines are optimized for quick inserts rather than first checking for gaps left by freed ids.

One tricky way I can think of, if the behaviour you want is really important, would be to generate the ids yourself based on the current max ID; then, if the insert fails due to another concurrent insert, you re-read the new max ID and retry the insert until it succeeds. A rough sketch of that idea is below.
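A minimal sketch of that retry approach, assuming the hypothetical example table has a UNIQUE constraint on id:

INSERT INTO example (id, value)
SELECT COALESCE(MAX(id), 0) + 1, 'some value'
FROM example;
-- If a concurrent insert wins the race, this fails with unique_violation
-- (SQLSTATE 23505); the application catches that error and retries.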

2 Likes

You can use a separate counter table and either a Postgres function or a Common Table Expression (CTE) to update the counter value during insert into your target table.

Using a Postgres function

  1. Create a “counter” table containing a single “last value” row:

    CREATE TABLE counter (last_value integer NOT NULL);
    INSERT INTO counter (last_value) VALUES (0);
    
  2. Create a get_next_id function:

    CREATE OR REPLACE FUNCTION get_next_id(countertable regclass, countercolumn text) RETURNS integer AS $$
    DECLARE
        next_value integer;
    BEGIN
        -- Atomically increment the counter; the row lock taken by the UPDATE
        -- serializes concurrent callers until their transaction commits or aborts.
        EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
        RETURN next_value;
    END;
    $$ LANGUAGE plpgsql;
    
    COMMENT ON FUNCTION get_next_id(regclass, text) IS 'Increment and return value from integer column $2 in table $1';
    
  3. Create your desired table to use the gapless sequence:

    CREATE TABLE example (id integer NOT NULL, value text NOT NULL);
    CREATE UNIQUE INDEX example_pkey ON example(id);
    

Usage:

INSERT INTO example(id, value)
VALUES (get_next_id('counter','last_value'), 'example');

Using a CTE

This uses the same separate “counter” table, but updates the counter value with a Common Table Expression (CTE) as part of the insert into your target table.

  1. Create a “counter” table containing a single “last value” row:

    CREATE TABLE counter (last_value integer NOT NULL);
    INSERT INTO counter (last_value) VALUES (0);
    
  2. Create your desired table to use the gapless sequence:

    CREATE TABLE example (id integer NOT NULL, value text NOT NULL);
    CREATE UNIQUE INDEX example_pkey ON example(id);
    

Usage:

-- The CTE gets its own name so it doesn't shadow the counter table,
-- and the outer SELECT must read FROM it.
WITH
  next_id AS (
    UPDATE counter SET last_value = last_value + 1
    RETURNING last_value
  )
INSERT INTO example (id, value)
SELECT next_id.last_value, 'example' FROM next_id;

With both approaches the UPDATE … RETURNING query will block any other update to the counter table, thus guaranteeing a gapless sequence. You can test this by running the query concurrently in two sessions inside BEGIN; without committing. You’ll notice that the second query is blocked until the first commits (or aborts).

This also means that if the first query’s transaction is aborted (using ROLLBACK), the second query can continue and will be assigned the next value, with no gaps. This is unlike a traditional Postgres sequence, which is not transactional.
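For example, a two-session walkthrough of that test, using the function-based approach from above:

-- Session 1:
BEGIN;
INSERT INTO example (id, value) VALUES (get_next_id('counter', 'last_value'), 'a');
-- leave the transaction open

-- Session 2 (blocks on the counter row until session 1 finishes):
INSERT INTO example (id, value) VALUES (get_next_id('counter', 'last_value'), 'b');

-- Session 1 again:
ROLLBACK;
-- Session 2 now unblocks and is assigned the very value session 1 gave up.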

A caveat is that if you delete a row from the table you will then have gaps. This can be avoided with a rule that prevents deletion from the table:

CREATE RULE no_delete_example AS ON DELETE TO example DO INSTEAD NOTHING;
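Note that a DO INSTEAD NOTHING rule silently ignores deletes rather than raising an error:

DELETE FROM example WHERE id = 1;  -- reports DELETE 0 and removes nothing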
5 Likes

Again, what is your actual use case here?

3 Likes

You’d have similar issues on MySQL as well. PostgreSQL’s behaviour is what has to happen when your SQL server supports transactions.

And then you get conflicts when transactions overlap, so one will fail when multiple run at the same time.

2 Likes

No, the UPDATE … RETURNING statement supports concurrent transactions, but each will block until the first update completes. After it commits or aborts, the next transaction can continue with the next value, with no gaps. This behaves like a single-threaded writer. It’s different from sequences, which hand out values non-transactionally for high performance.

2 Likes

Thank you very much. I have done something similar with Phoenix by updating the sequence, but doing this with a PostgreSQL function seems more appropriate.

UPDATE ... RETURNING will, but I rarely update only a single table at a time; transactions are almost always necessary for data consistency unless it’s an exceptionally simple database that isn’t using the database’s features… In addition, the primary key is rarely assigned as manually as that would be. Once transactions are involved this will cause major issues, either in terms of giving up all concurrency or in values getting out of sync.

1 Like