How to do get_or_insert_by in Ecto without race condition?

Hi there. I am trying to implement the logic that does get_by(..) and if not exists, it should insert and return.

The first implementation is like this. (We are using Postgres.)

def get_or_insert(account_id, type) do
  case Repo.get_by(Storage, account_id: account_id, type: type) do
    nil -> Repo.insert(Storage, %{account_id: account_id, type: type})
    storage -> {:ok, storage}
  end
end

The storage has the composite unique index for account_id and type. So it works fine until there are 2 same requests. I expected it to return same storage for both req, but one of them will get error for type already exists because of race condition that the Repo.get_by from 2 req were evaluated at the same time before any insertion.

As I am familiar with Rails, I would solve this in Rails by doing account.lock do ... end to prevent reading the record at the same time. But not sure how to implement this in Ecto. Any idea to implement this idea while avoiding race condition?


So far, this one is the most similar for what I am looking for, but still answer only the case of get(id)

Yes! Either by pessimistic locking (which you can do by wrapping you code in a Repo.transaction, assuming that you use Postgres or another database that supports them), which locks the whole table,
or by optimistic locking, which requires an extra field in your DB table, but is much faster in cases where conflicting writes are uncommon, since it only ‘locks’ one record in the table.

There also is Ecto.Query.lock but it is more low-level and DB-specific.

1 Like

Do transaction in postgres actually lock the whole tables? I think it might depend on isolation levels, which is read committed, by default (I think), which I don’t think locks the tables but uses mvcc.

Depending on the problem, you can also use upserts. If you do insert(data, on_conflict: :nothing, returning: true), it will do nothing if the data exists and return all fields from the database.

We also talk about it on the What’s New in Ecto 2.0 ebook (95% of it still applies to Ecto 3). http://pages.plataformatec.com.br/ebook-whats-new-in-ecto-2-0

8 Likes

No. Doing an insert does not prevent other transactions doing inserts. You have to use an explicit lock command (in Postgres) to lock the whole table. Doing an update will block other updates on those rows, and may block reads depending on the isolation level of the reader (but not by default in Ecto).

2 Likes

You are correct; it depends on the isolation level, which, as long as it isn’t SERIALIZE will not lock the whole table.

2 Likes

This is not correct. Isolation level will not ever cause a table to be locked by an insert nor other DML command. Locks and isolation level are separate concepts. According to the Postgres documentation, these are the only commands that lock a full table: ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, and VACUUM.

DML commands like insert/update obtain ROW EXCLUSIVE locks.

Isolation level is concerned with isolation between transactions, and preventing dirty reads/phantom reads. SERIALIZABLE simply means the transactions must have the same semantics when running in parallel, that they would have running one after the other. Yes this requires read locks, and not reading new rows from another transaction but that does not lock the whole table against inserts.

7 Likes

@jeremyjh Thank you for clarifying :slight_smile:!

2 Likes

To make sure I understand this correctly (because TFM seems ambiguous on this - actually more than the above quote):

When there’s conflict (like a UNIQUE constraint being violated), does the insert/2 function return the conflicting record from the database, discarding the struct or changeset passed to it? What if multiple records conflicted with the passed data? Like when violating constraints on several fields?

Update:

Did some tests (using Postgres), and it seems that given a changeset, inserting which would violate unique constraint, Repo.insert(changeset, returning: true, on_conflict: :nothing) returns the data passed to it rather than the record from the database. Is this the expected behaviour? If yes then what’d be the way to obtain the existing record without performing a separate query for this purpose?

2 Likes

As clarified in a separate thread this is not actually true. If the function only translates syntax between Elixir and SQL then only the equivalent SQL behaviour remains valid. PostgreSQL documentation says:

The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (or updated, if an ON CONFLICT DO UPDATE clause was used)

This is consistent with results of my quick tests above: on_conflict: :nothing prevents returning fields from the database as no rows are “actually inserted (or updated)”. Shall see if I can improve this section of the docs :wink:

2 Likes