Ecto alternative to find_in_batches?

Hi,

What’s the simplest way of going over an extremely large set of records in a database? Is there something like Rails’ find_in_batches in Ecto?

4 Likes

Ecto.Repo.stream/2. As you can see in the docs, it takes a :max_rows option that defaults to 500, which plays the same role as the batch size in find_in_batches.

8 Likes
result = Repo.transaction(fn ->
  your_ecto_query_here
  |> Repo.stream()
  |> Enum.each(fn one_record ->
    # do something with one record here.
  end)
end)

case result do
  {:ok, success} -> # ...
  {:error, reason} -> # ...
end
6 Likes

For those of you who want the “batch” part of find_in_batches for whatever reason, use Stream.chunk_every/2 to group the streamed rows back into lists:

Repo.transaction(fn ->
  your_ecto_query_here
  |> Repo.stream(max_rows: 100)
  |> Stream.chunk_every(100)
  |> Enum.each(fn batch ->
    # do something with one *batch* here.
  end)
end)
5 Likes

There’s a key difference here: for MySQL and Postgres, Repo.stream/2 has to run inside a transaction. I don’t think the Rails equivalent, find_in_batches, requires this.
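
For what it’s worth, that transaction is also subject to the usual Ecto :timeout (15_000 ms by default), so a long walk over a big table may need it raised. A minimal sketch, reusing the your_ecto_query_here placeholder from the posts above:

Repo.transaction(
  fn ->
    your_ecto_query_here
    |> Repo.stream(max_rows: 500)
    |> Enum.each(fn record ->
      # process one record at a time
    end)
  end,
  # keep the adapter from cutting off a long-running stream
  timeout: :infinity
)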

I am going over a large table, so ideally I would control which parts of the iteration are within a transaction… is there an alternative?

Then you need to paginate on your own. There is no safe way to do it without some additional constraints (for example a unique, ordered column to page on), and since Ecto cannot know whether those constraints hold, or where they are enforced, for your data, it does not provide such functionality out of the box.
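
If it helps, here is a minimal keyset-pagination sketch of that idea. MyApp.Repo, MyApp.Record and a unique, monotonically increasing integer :id column are assumptions for illustration, not something from this thread; each page runs as its own short query, so nothing has to sit inside one long transaction:

defmodule MyApp.BatchWalker do
  import Ecto.Query

  @batch_size 500

  # Walks the (assumed) MyApp.Record table in :id order, @batch_size rows
  # at a time, calling `fun` with each batch of structs.
  # Assumes ids start above 0, hence the default last_id of 0.
  def each_batch(fun, last_id \\ 0) do
    batch =
      MyApp.Record
      |> where([r], r.id > ^last_id)
      |> order_by([r], asc: r.id)
      |> limit(^@batch_size)
      |> MyApp.Repo.all()

    case batch do
      [] ->
        :ok

      records ->
        fun.(records)
        each_batch(fun, List.last(records).id)
    end
  end
end

Calling MyApp.BatchWalker.each_batch(fn records -> ... end) then gives you roughly what find_in_batches does, and you are free to open (or skip) a transaction per batch inside the callback.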