Avoiding duplicate values in Repo.insert_all/3

A call to Ecto’s insert_all/3 like this

Repo.insert_all(Person, [
  %{name: "Alice", location: "USA"},
  %{name: "Bob", location: "USA"},
  %{name: "Charles", location: "USA"}
])

generates SQL that looks like the following:

INSERT INTO "people" ("name", "location") VALUES ($1, $2), ($3, $4), ($5, $6)

with ["Alice", "USA", "Bob", "USA", "Charles", "USA"] passed in. I noticed that USA is passed three separate times even though it has the same value for multiple rows. This could be avoided by writing the query something like this:

INSERT INTO "people" ("name", "location") VALUES ($1, $4), ($2, $4), ($3, $4)

with ["Alice", "Bob", "Charles", "USA"] passed in. Obviously it’s not a huge issue when a three character string is duplicated three times, but for a larger value during something like a bulk insert, this could end up being a lot of data transferred over the wire and processed that doesn’t need to be. I could fix this by hand writing the SQL, but that’s not the most elegant solution. Is there a way to do this in Ecto currently? If not, could it be implemented? It probably wouldn’t be very performant to add a uniqueness filter to everything passed in, but maybe it could be an option, or a different form of the function.

2 Likes

It is not currently supported. We could support some sort of dictionary, where you could pass placeholders and the values for said placeholders. For example:

Repo.insert_all(Person, [
  %{name: "Alice", location: {:placeholder, 1}},
  %{name: "Bob", location:  {:placeholder, 1}},
  %{name: "Charles", location:  {:placeholder, 1}}
], placeholders: %{1 => "USA"})

But yeah, not supported right now.

4 Likes

I have created a PR for this on at elixir-ecto/ecto-sql#290 and elixir-ecto/ecto#3515.

4 Likes