Stream data from PostgreSQL table using Repo.stream()

Hello!

I wanted some clarification regarding Repo.stream(). Consider the code fragment below:

import Ecto.Query

Repo.transaction(fn ->
  Blog
  |> where(title: ^title)
  |> Repo.stream()
  |> Stream.run()
end)

Let’s imagine running this query takes a lot of time (it’s a big DB!). During this time, is it correct that:

  1. Another DB connection will be able to select, update and delete rows in this table?
  2. Any changes made to the table after the beginning of this transaction will not affect the results of this query?

I could not really find any substantive answer online. Thanks!

Hi, streaming doesn’t change transaction isolation or locking behavior. But yes: readers don’t block writers, and your query results will be a consistent snapshot taken at the beginning of the query. This is just standard Postgres behavior; you can refer to their docs.

https://www.postgresql.org/docs/current/static/transaction-iso.html
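
To make that concrete, here is a minimal sketch of consuming such a stream (assuming the same Blog schema, Repo, and title binding from the question; process_post/1 is a hypothetical per-row function). Rows inserted, updated, or deleted by other connections while the stream is running won’t appear in the results, since the SELECT reads from a single snapshot taken when the query started:

import Ecto.Query

Repo.transaction(
  fn ->
    Blog
    |> where(title: ^title)
    # :max_rows controls how many rows are fetched per round trip (defaults to 500)
    |> Repo.stream(max_rows: 500)
    # process_post/1 is a hypothetical function that handles one row
    |> Stream.each(&process_post/1)
    |> Stream.run()
  end,
  # long-running streams usually need a larger (or infinite) transaction timeout
  timeout: :infinity
)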

SELECT statements take an ACCESS SHARE lock. The only lock mode that conflicts with ACCESS SHARE is ACCESS EXCLUSIVE, which is acquired by ALTER TABLE, DROP TABLE, TRUNCATE, and a few other less common commands.

https://www.postgresql.org/docs/current/static/explicit-locking.html
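
As a rough illustration (not from the thread; it assumes the Blog schema maps to a posts table and has a hypothetical published column): while the streaming transaction holds its ACCESS SHARE lock, an ordinary UPDATE from another connection goes through immediately, whereas an ALTER TABLE has to wait for the transaction to finish:

import Ecto.Query

# Runs concurrently with the streaming transaction above.
Task.start(fn ->
  # UPDATE takes a ROW EXCLUSIVE lock, which does not conflict with ACCESS SHARE,
  # so it is not blocked by the streaming SELECT
  from(b in Blog, where: b.title == ^title)
  |> Repo.update_all(set: [published: false])

  # ALTER TABLE takes an ACCESS EXCLUSIVE lock, which does conflict with
  # ACCESS SHARE, so this waits until the streaming transaction ends
  Repo.query!("ALTER TABLE posts ADD COLUMN subtitle text")
end)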

3 Likes

I also believe that Postgrex.stream/4 will use a cursor underneath (which makes sense; see the sketch below the list). So:

  1. There will be at least 2 extra DB round trips: one to declare/open the cursor and one to close it. Probably not a big overhead on a large DB.
  2. I believe the cursor is opened read-only (not FOR UPDATE), so under Postgres’s MVCC you get consistent results without interfering with other users.
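
Roughly speaking, here is a sketch of the same cursor-style fetching done with Postgrex directly (the connection options, posts table, and column names are assumptions, not from the thread). Repo.stream/2 goes through the adapter to do something similar, fetching rows in chunks inside the transaction:

{:ok, pid} = Postgrex.start_link(hostname: "localhost", database: "blog_dev")

Postgrex.transaction(pid, fn conn ->
  conn
  # fetches results in chunks of up to 500 rows per round trip
  |> Postgrex.stream("SELECT id, title FROM posts WHERE title = $1", [title], max_rows: 500)
  |> Enum.each(fn %Postgrex.Result{rows: rows} ->
    IO.puts("fetched #{length(rows)} rows")
  end)
end)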
1 Like

@jeremyjh and @kip, thanks for the detailed responses! They helped a lot in understanding what’s happening on the Ecto and PostgreSQL sides.

P.S. @jeremyjh Thanks for the awesome work on dialyxir, it has helped catch many bugs :slight_smile: