How to Handle NOT NULL Constraints in Bulk Upserts with Ecto?

Hey everyone,

I’m trying to perform a bulk upsert (insert_all with on_conflict) in Ecto for updating book ratings, but I ran into an issue where PostgreSQL enforces NOT NULL constraints before handling the conflict resolution.

Context

I have a books table with these relevant fields:

  • id (UUID, PK)
  • title (NOT NULL)
  • slug (NOT NULL)
  • ratings_dist (array)
  • ratings_count (integer)
  • average_rating (float)
  • inserted_at (NOT NULL, timestamps)
  • updated_at (NOT NULL, timestamps)

I’m updating only ratings_dist, ratings_count, and average_rating, and I’m generating updates similar to this dummy example:

updates = [
  %{
    id: "00000000-0000-0000-0000-000000000001",
    ratings_dist: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    ratings_count: 1,
    average_rating: 6.5
  }
]

However, when using insert_all/3 like this:

Repo.insert_all(Book, updates,
  on_conflict: {:replace, [:ratings_dist, :ratings_count, :average_rating]},
  conflict_target: :id
)

It fails because PostgreSQL enforces the NOT NULL constraints on title, slug, and inserted_at. Even though the existing row already has values for those fields, Postgres first builds the candidate row to insert, checks its constraints, and only then applies the ON CONFLICT logic.

Workarounds I’m Considering

  1. Use Dummy Values
  • Add "DUMMY_TITLE" and "DUMMY_SLUG" in insert_all, but exclude them from :replace, so the original values remain (see the sketch after this list).
  • This avoids fetching existing data but feels like a hack.
  2. Use update_all Instead
  • Fetch existing books and loop through them with Repo.update_all/2.
  • Slower (N queries for N books), but avoids dummy data.
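
For reference, here is a minimal sketch of option 1, assuming the default :naive_datetime timestamps; the placeholder strings and the truncated NaiveDateTime are throwaway values that only matter if a row turns out to be new:

now = NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)

rows =
  Enum.map(updates, fn row ->
    # Placeholder values satisfy the NOT NULL checks on insert; they are
    # excluded from :replace below, so existing rows keep their real data.
    Map.merge(row, %{
      title: "DUMMY_TITLE",
      slug: "DUMMY_SLUG",
      inserted_at: now,
      updated_at: now
    })
  end)

Repo.insert_all(Book, rows,
  # Add :updated_at here if the timestamp should also be refreshed on conflict.
  on_conflict: {:replace, [:ratings_dist, :ratings_count, :average_rating]},
  conflict_target: :id
)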

What’s the best practice in Ecto for bulk upserts when dealing with NOT NULL fields? Are there other approaches that avoid both unnecessary updates and constraint violations?

You can combine Repo.update_all with Ecto.Query.API.values/2, so you won't need to fetch the existing books first.
There is an example using update in the docs for Ecto.Query.API.values/2.
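
For anyone landing here later, a rough sketch of that suggestion, assuming Ecto 3.12+ (where values/2 was introduced); the types map is a guess based on the fields described above and should be adjusted to the real Book schema:

import Ecto.Query

# Column types for the VALUES list, inferred from the schema description.
types = %{
  id: Ecto.UUID,
  ratings_dist: {:array, :integer},
  ratings_count: :integer,
  average_rating: :float
}

query =
  from b in Book,
    join: v in values(updates, types),
    on: b.id == v.id,
    update: [
      set: [
        ratings_dist: v.ratings_dist,
        ratings_count: v.ratings_count,
        average_rating: v.average_rating
      ]
    ]

# Runs a single UPDATE ... FROM (VALUES ...) statement; no prior SELECT needed.
Repo.update_all(query, [])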


It sounds like your use case is actually updates, so the above answer is correct.

But if you were actually doing upserts - that is to say, some of your rows are new but you don't know which - then you would be better off just setting the fields to "" rather than trying to split them into updates and inserts.

The nice thing about insert/on_conflict is that it's actually atomic, whereas a read-then-write approach might not be unless you run it at the serializable isolation level.

Does this method have any limitations on the number of books that can be safely updated? I'm thinking around 20k for my use case.

Inserting also has limits (PostgreSQL caps a single statement at 65,535 bind parameters), so I would chunk that 20k into groups of 2,000 to 4,000 records.
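
For example, assuming a hypothetical bulk_update_ratings/1 helper that runs whichever bulk query you settled on (the values/2 update above or insert_all):

# 3_000 is an arbitrary pick inside the suggested 2_000..4_000 range.
updates
|> Enum.chunk_every(3_000)
|> Enum.each(&bulk_update_ratings/1)

# Wrap the whole thing in Repo.transaction/1 if all batches should
# succeed or fail together.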
