Ecto `insert()` with `returning: true` on Postgres - request for clarification

silverdr · October 18, 2021, 11:35am

From another thread - making it a separate topic:

To make sure I understand correctly (because TFM seems ambiguous on this to me):

When there’s conflict (like a UNIQUE constraint being violated), does the insert/2 function return the conflicting record from the database, discarding the struct or changeset passed to it? What if multiple records conflicted with the passed data? Like when violating constraints on several fields?

Update:

Did some tests (using Postgres), and it seems that given a changeset, inserting which would violate unique constraint, Repo.insert(changeset, returning: true, on_conflict: :nothing) returns the data passed to it rather than the record from the database. Is this the expected behaviour? If yes then what’d be the way to obtain the existing record without performing a separate query for this purpose?

LostKobrakai · October 18, 2021, 12:59pm

Yes. To get the db record either do a query manually or use returning: true as additional option and make sure “something” is updated on conflict (often e.g. timestamps are good candidates).

silverdr · October 18, 2021, 2:17pm

Once more to make sure - is the fact that I used on_conflict: :nothing the reason for NOT receiving the record from the database? IOW - the function doesn’t do anything beyond translating from Elixir syntax to SQL, unlike e. g. Rails’ create_or_find_by, right?

LostKobrakai · October 18, 2021, 7:47pm

Exactly.

DavidRawk-Blake · October 6, 2022, 12:07am

Without doing a write, is there a way to get this?

silverdr · October 6, 2022, 1:38pm

Separate query, I reckon…

zzq · October 6, 2022, 2:16pm

The thing I generally do is use :replace with the fields in the unique constraint (the conflict target) instead of using :nothing. (e.g. if the table has a unique constraint on the pair of columns a and b, I do conflict_target: [:a, :b], on_conflict: {:replace, [:a, :b]}.) I know for a fact that when the on_conflict fires, doing that replace is a no-op. But it’s enough in order for the database to consider the record to be “updated” for the purposes of returning: true.

silverdr · October 6, 2022, 2:36pm

Right, but is this something that might change between implementations/versions? Or a documented behaviour that can be relied upon? I don’t say it isn’t. Just asking.

sbuttgereit · October 6, 2022, 2:53pm

One note about this technique. From the application point of view this is a no-op, but assuming I’m reading the ecto_sql code correctly, in the database I don’t believe it is a no-op. Assuming my recollection is correct PostgreSQL would see this effectively as an update. This is why you’d get a returning result but which also means, on disk, you’d get the current row marked as deleted and a new row created; naturally the new row would be from your point of view the same data and appear as the old row, but the database overhead of an actual update would be incurred… including the ramifications for table bloat, vacuum overhead, etc.

I’ll double check this a bit later when I have time, but am confident enough that the generated SQL would cause that to be the case to say something now.

silverdr · October 6, 2022, 5:01pm

That’d be my suspicion too. And the reason I asked whether it’s a [PostgreSQL] documented behaviour [that it’s a true :noop]

sbuttgereit · October 6, 2022, 8:42pm

So from the PostgreSQL side, it can be demonstrated that ON CONFLICT ... DO UPDATE is not a no-op.

create a small demo table:

127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:22:05 PM PDT 2022]
> CREATE TABLE demo
(
    id         integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    text_value text NOT NULL
);
CREATE TABLE
Time: 12.746 ms

Insert a single row into the table; We’ll do this in a transaction block so we can query the transaction id as well.

127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:22:23 PM PDT 2022]
> BEGIN;
BEGIN
Time: 0.775 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:22:33 PM PDT 2022]
> SELECT txid_current();
 txid_current 
--------------
         8655
(1 row)

Time: 1.083 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:22:40 PM PDT 2022]
> INSERT INTO demo (text_value) VALUES ('demo') RETURNING *;
 id | text_value 
----+------------
  1 | demo
(1 row)

INSERT 0 1
Time: 1.315 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:22:51 PM PDT 2022]
> COMMIT;
COMMIT
Time: 2.317 ms

Let’s query the table, but we’ll include the system columns xmin (The identity (transaction ID) of the inserting transaction for this row version.) and ctid (The physical location of the row version within its table.)

127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:22:59 PM PDT 2022]
> SELECT xmin, ctid, id, text_value FROM demo;
 xmin | ctid  | id | text_value 
------+-------+----+------------
 8655 | (0,1) |  1 | demo
(1 row)

Time: 1.145 ms

Now that we have a row and understand the system perspective of that row, let’s do an INSERT ... ON CONFLICT in a way that’s similar to what I believe the Ecto query to produce. Again, we’ll do this in a transaction so we can capture some of the transaction details.

127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:23:08 PM PDT 2022]
> BEGIN;
BEGIN
Time: 0.778 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:23:18 PM PDT 2022]
> SELECT txid_current();
 txid_current 
--------------
         8656
(1 row)

Time: 0.746 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:23:25 PM PDT 2022]
> INSERT INTO demo
    ( id, text_value )
VALUES
    ( 1, 'demo' )
ON CONFLICT (id) DO UPDATE SET id = excluded.id, text_value = excluded.text_value
RETURNING *;
 id | text_value 
----+------------
  1 | demo
(1 row)

INSERT 0 1
Time: 1.141 ms
127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:23:35 PM PDT 2022]
> COMMIT;
COMMIT
Time: 2.014 ms

Now let’s query the table again, and again with the system columns. Note that we’re not filtering the table and we only have one row which is identical in the id and text_value user columns.

127.0.0.1(from musebms).postgres.scb.5432 [Thu Oct  6 01:23:48 PM PDT 2022]
> SELECT xmin, ctid, id, text_value FROM demo;
 xmin | ctid  | id | text_value 
------+-------+----+------------
 8656 | (0,2) |  1 | demo
(1 row)

Time: 1.237 ms

As we can see, even though the record would appear to have not changed to a user it’s in fact a new record (all PostgreSQL updates can be thought of as a DELETE/INSERT combination). And if we think about it, this makes sense. PostgreSQL generating apparent data changes without actually doing anything (a no-op) wouldn’t be desirable if you were needing logic that included the transaction details… since our “upsert” query happened within the context of a transaction it makes sense that anything PostgreSQL tells us it did would have to also reflect the effective transaction id.

Naturally this doesn’t cover the Ecto side, and I don’t have a good environment for demonstrating from Elixir right now; but I did take a quick look at the source code and I’m pretty sure the on_conflict: {:replace, [:a, :b]} construct creates an upsert query similar to my test query.

So the moral of the story is don’t do no-change updates to the database unless you’re consciously accepting the additional table bloat, WAL updates, etc. that come with that.

silverdr · October 6, 2022, 10:43pm

Thank you very much for the insight and thorough explanations. I’d like to be proven wrong but I don’t think Ecto actually can do things substantially differently in order to avoid side effects on PostgreSQL side.