Performing a find-or-create in Ecto?

I can’t seem to figure out a good way of doing this safely. I need to do a find-or-create, so I first check to see if a given row exists in the table, if it doesn’t, I try and insert it, and if a uniqueness constraint fails, I try fetching it again (meaning it was created in between my initial fetch and my insert attempt).

The options that come to mind are:

  1. Try and insert it anyway and just handle the unique constraint changeset error. This doesn’t work because the failed constraint on the insert aborts the entire surrounding transaction forcing a rollback.

  2. Insert with an on_conflict: :nothing. I haven’t found any way to tell by the returned struct from Ecto whether or not the insert actually happened in this situation, making me unable to know if I’m working with the actual inserted row or if I need to refetch. The docs suggest checking for id being nil, but as Ecto is autogenerating my ID in either case, id will never be nil.

  3. Insert with an on_conflict: [set: …] to force a dummy update. I have no safe column to set in a DO UPDATE, as any column I’m using that could conflict could be inadvertently changed in this situation.

Any ideas?

I find easier to do a create_or_find…

I’d love to, but for the reasons above, I haven’t found a working way of doing so.

If the create fails because of constraint errors, it means there is a record to find. Or am I missing something?

That’s right, but this is all happening as part of a bigger transaction, and as soon as the insert fails due to the constraint error, Ecto marks the entire transaction/connection as aborted and requires a rollback (which makes sense, as the underlying SQL engine would reject further commands anyway).

So I’m trying to figure out a way of doing a find-or-create without breaking the surrounding transaction, because the find-or-create is just one step out of many in a bigger Ecto.Multi and if the row was concurrently inserted by another process, I just want to refetch rather than end the entire transaction.

Ah, ok… I just wanted to remove the race condition (or move it between create and find). I was not thinking in terms of a bigger transaction.

I’m just playing around with code here, but maybe something like this? :sweat_smile:

@spec fetch_or_create_thing(keyword, map) :: {:ok, Thing.t} | {:error, Ecto.Changeset.t}
def fetch_or_create_thing(fetch_by, attrs) do
  with nil <- get_thing_by(fetch_by),
       {:ok, thing} <- create_thing(attrs) do
    {:ok, thing}
  else
    %Thing{} = thing ->
      {:ok, thing}

    {:error, %Ecto.Changeset{} = changeset} ->
      if changeset.errors[:my_unique_field] == {"has already been taken", []} do
        fetch_or_create_thing(fetch_by, attrs)
      else
        {:error, changeset}
      end
  end
end

(assuming these functions exist)

@spec get_thing_by(keyword) :: Thing.t | nil
def get_thing_by(by) do
  Repo.get_by(Thing, by)
end

@spec create_thing(map) :: {:ok, Thing.t} | {:error, Ecto.Changeset.t}
def create_thing(attrs) do
  %Thing{}
  |> Thing.changeset(attrs)
  |> Repo.insert()
end

Although, with this code, if the my_unique_field is not part of the fetch_by keyword list and the unique field already exists or is in the keyword but also in the attrs map but they aren’t the same value, then that could cause infinite recursion, so you might want to check for that and raise an error… ¯\(ツ)

If you’re using uuids you could generate the id yourself before doing the insert. Then you could use the :returning option to receive the id of the row that that was inserted/updated. In combination with on_conflict:none the id will remain unchanged for an insert or if an update occurred the existing id for the row.

Do you really need to know if the insert did happen? Can you insert or do nothing and afterwards simply query for the upserted item without any of the results of the insert operation?

3 Likes