So from the PostgreSQL side, it can be demonstrated that ON CONFLICT ... DO UPDATE
is not a no-op.
- create a small demo table:
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:22:05 PM PDT 2022]
> CREATE TABLE demo
(
id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
text_value text NOT NULL
);
CREATE TABLE
Time: 12.746 ms
- Insert a single row into the table; We’ll do this in a transaction block so we can query the transaction id as well.
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:22:23 PM PDT 2022]
> BEGIN;
BEGIN
Time: 0.775 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:22:33 PM PDT 2022]
> SELECT txid_current();
txid_current
--------------
8655
(1 row)
Time: 1.083 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:22:40 PM PDT 2022]
> INSERT INTO demo (text_value) VALUES ('demo') RETURNING *;
id | text_value
----+------------
1 | demo
(1 row)
INSERT 0 1
Time: 1.315 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:22:51 PM PDT 2022]
> COMMIT;
COMMIT
Time: 2.317 ms
- Let’s query the table, but we’ll include the system columns
xmin
(The identity (transaction ID) of the inserting transaction for this row version.) and ctid
(The physical location of the row version within its table.)
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:22:59 PM PDT 2022]
> SELECT xmin, ctid, id, text_value FROM demo;
xmin | ctid | id | text_value
------+-------+----+------------
8655 | (0,1) | 1 | demo
(1 row)
Time: 1.145 ms
- Now that we have a row and understand the system perspective of that row, let’s do an
INSERT ... ON CONFLICT
in a way that’s similar to what I believe the Ecto query to produce. Again, we’ll do this in a transaction so we can capture some of the transaction details.
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:23:08 PM PDT 2022]
> BEGIN;
BEGIN
Time: 0.778 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:23:18 PM PDT 2022]
> SELECT txid_current();
txid_current
--------------
8656
(1 row)
Time: 0.746 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:23:25 PM PDT 2022]
> INSERT INTO demo
( id, text_value )
VALUES
( 1, 'demo' )
ON CONFLICT (id) DO UPDATE SET id = excluded.id, text_value = excluded.text_value
RETURNING *;
id | text_value
----+------------
1 | demo
(1 row)
INSERT 0 1
Time: 1.141 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:23:35 PM PDT 2022]
> COMMIT;
COMMIT
Time: 2.014 ms
- Now let’s query the table again, and again with the system columns. Note that we’re not filtering the table and we only have one row which is identical in the
id
and text_value
user columns.
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct 6 01:23:48 PM PDT 2022]
> SELECT xmin, ctid, id, text_value FROM demo;
xmin | ctid | id | text_value
------+-------+----+------------
8656 | (0,2) | 1 | demo
(1 row)
Time: 1.237 ms
As we can see, even though the record would appear to have not changed to a user it’s in fact a new record (all PostgreSQL updates can be thought of as a DELETE/INSERT combination). And if we think about it, this makes sense. PostgreSQL generating apparent data changes without actually doing anything (a no-op) wouldn’t be desirable if you were needing logic that included the transaction details… since our “upsert” query happened within the context of a transaction it makes sense that anything PostgreSQL tells us it did would have to also reflect the effective transaction id.
Naturally this doesn’t cover the Ecto side, and I don’t have a good environment for demonstrating from Elixir right now; but I did take a quick look at the source code and I’m pretty sure the on_conflict: {:replace, [:a, :b]}
construct creates an upsert query similar to my test query.
So the moral of the story is don’t do no-change updates to the database unless you’re consciously accepting the additional table bloat, WAL updates, etc. that come with that.