I need to update a bunch of records at once (as part of a single request) and I decided to try out using Repo.insert_all
with the on_conflict
option set to {:replace, fields}
because I know the specific list of fields I want to add to the existing records:
Repo.insert_all(MySchema, [%{id: 1, my_field: "test"}], on_conflict: {:replace, [:my_field]}, conflict_target: :id)
I wrote a simple test that asserts my_field on row with id 1 is now “test”, but it failed complaining about null value for inserted_at
. My understanding is the replace option I’m using should not update the timestamps, which are already set to non-null values. If I specify the timestamps, the test passes. Additionally, no other fields are being updated. I can see in the test they have the same values I used during the test setup that inserted the record with id 1.
Any suggestions what I could be doing wrong?
1 Like
Hard to guess without more details, such as schema, db etc.
Maybe you dont have any uniqueness constraint on id
? or there is no row to update.
I’m guessing that your insert_all query is trying to insert a new row (not updating). Note that insert_all
does not set timestamp fields such as inserted_at or updated_at) like the usual insert/update function.
You can enable logs and check the queries it’s making to db and confirm.
1 Like
I’m on postgres 12. id is an auto-incrementing primary key added by Ecto in a create table
migration.
I’m pretty sure the issue has to do with timestamps, which I don’t want to update. If I include them, the existing row updates as expected so I think the conflict setting is working. This is the SQL:
INSERT INTO "my_table" ("my_field","id") VALUES ($1,$2) ON CONFLICT ("id") DO UPDATE SET "tax" = EXCLUDED."tax","ship_from_zip" = EXCLUDED."ship_from_zip"... ["UPS", 37]
The timestamp columns seem omitted from the list of fields, and they are the only ones that are being erroneously updated.
From what I can tell this is a limitation of postgres. It applies constraints as if a new row were being inserted, so even if the existing row + update clause are valid, the latter query will still error if it doesn’t include all not null columns. Perhaps someone with more knowledge of postgres can confirm this and hopefully provide some insight as to why that is the case.
2 Likes
I tried this locally and got the same null value
error from postgres
Looking back at the spec
ON CONFLICT
can be used to specify an alternative action to raising a unique constraint or exclusion constraint violation error.
As per the spec ON CONFLICT
will be used when insert raise unique constraint
violation error, but in this case insert raises null value
error. Likely it checks this error before it checks for uniqueness (anyway I don’t think ordering matters).
In other words, “there is no conflict here because insert is invalid”
3 Likes
Yep this is how I understand as well, although the postgres docs could certainly be clearer on this point!
Summary for those searching for this error like I did:
- PostgreSQL requires all NOT NULL columns to be provided in an
INSERT
, even if the row already exists.
- If any NOT NULL column is missing, the INSERT is rejected before checking for conflicts.
ON CONFLICT DO NOTHING
does NOT bypass this
- Normally,
ON CONFLICT DO NOTHING
should ignore conflicts, but it still fails because PostgreSQL never reaches the conflict resolution step.
- Instead, the
INSERT
itself is invalid, so the query never proceeds to the ON CONFLICT
clause.
1 Like
I’m kinda baffled that there is no easier solution in Ecto for doing bulk updates like this, e.g. when we can’t derive the values from other values in the DB and update_all
won’t do.
I ended up here after hours of searching and thanks to this thread found a good enough workaround. I hope this helps anyone else ending up here:
Repo.insert_all(
MySchema,
[%{id: 42, a_field: "value", inserted_at: {:placeholder, :now}}],
on_conflict: {:replace, [:a_field]},
conflict_target: [:id],
placeholders: %{
now: NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)
}
)
Note that we need to include a value for each non null field, even if they are not used!
The inserted_at
field is not actually updated, it just needs to be there in case it actually does an insert.
Also note that I used the placeholders
feature for this, such that we do not waste any bandwidth for fields that are ignored anyway.
If anyone knows of an even better solution, please let us know!