I saw that Ecto's Repo has the function Repo.stream
, but I don't know what happens when I trigger Stream.run
. Will a query like SELECT * FROM table
wait for a full scan of the database before proceeding to the next steps, or will rows be sent to the Elixir client as they arrive?
Given this code, what happens between the client and the database?
select_all_query
|> Repo.stream()
|> Stream.map(&some_operation_on_data)
|> Stream.run()
Repo.stream
uses a cursor on databases where that's available, so it will retrieve chunks of max_rows
rows at a time.
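To make that concrete: on the Postgres adapter, Repo.stream is lazy and must be consumed inside a transaction, and the chunk size can be tuned with the :max_rows option (it defaults to 500). A minimal sketch of the pipeline from the question, assuming a hypothetical MyApp.Repo and a some_operation_on_data/1 function:

```elixir
import Ecto.Query

# Repo.stream/2 returns a lazy stream; on Postgres it must be run
# inside a transaction, and the cursor fetches :max_rows rows per
# round trip rather than the whole result set at once.
query = from(r in "table", select: map(r, [:id]))

MyApp.Repo.transaction(fn ->
  query
  |> MyApp.Repo.stream(max_rows: 500)            # 500 is also the default
  |> Stream.map(&some_operation_on_data/1)       # runs per row, as rows arrive
  |> Stream.run()                                # drives the stream for side effects
end)
```

So the database does not wait for a full scan: each Stream.run iteration pulls the next chunk through the cursor, and your mapping function sees rows chunk by chunk.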
Do you know whether, against a Redshift database, Repo.stream
can avoid bringing the entire table over at once?
I am using a Postgres adapter to connect to Redshift.
As @al2o3cr said, it will retrieve chunks according to the executed query. But if there is no filtering or limiting step on the Stream-processing side, you will still end up with all rows in your application's memory by the end of the stream execution.
Functions like Stream.run/1
, Stream.take/2
, Stream.filter/2
and Stream.reject/2
can limit or avoid accumulating all rows returned by the query in your application's memory.
Edit: otherwise, the cursor will protect the database from returning all rows at once, but your application will still end up holding all rows in memory.
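To illustrate the distinction: the cursor bounds how many rows cross the wire per round trip, but only an early-terminating or filtering step bounds what the application keeps. A hypothetical sketch (MyApp.Repo, query, and interesting?/1 are placeholders, not from the thread):

```elixir
MyApp.Repo.transaction(fn ->
  query
  |> MyApp.Repo.stream()
  |> Stream.filter(&interesting?/1)  # drops non-matching rows as they stream in
  |> Stream.take(100)                # halts the stream (and cursor) after 100 matches
  |> Enum.to_list()                  # only these up-to-100 rows are materialized
end)
```

Replacing Enum.to_list/1 here with an eager pipeline over Repo.all/2 would instead load every row of the result set before filtering anything.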
The Redshift docs include a specific warning about using cursors with a large result set.
AFAIK this is a Redshift-specific issue, as the corresponding section of the PG docs doesn't mention anything similar.
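If cursors are a concern on Redshift, one workaround (a common pattern, not something from this thread) is to skip Repo.stream and paginate manually with keyset queries via Stream.unfold/2, so each round trip is a plain ORDER BY ... LIMIT query instead of a cursor fetch. A sketch with assumed names, paginating on an :id column:

```elixir
defmodule StreamHelpers do
  import Ecto.Query

  # Hypothetical keyset pagination: each page is an ordinary
  # SELECT ... WHERE id > last_id ORDER BY id LIMIT page_size query,
  # so no server-side cursor is held open between fetches.
  def stream_by_id(repo, schema, page_size \\ 500) do
    Stream.unfold(0, fn last_id ->
      rows =
        from(r in schema,
          where: r.id > ^last_id,
          order_by: r.id,
          limit: ^page_size
        )
        |> repo.all()

      case rows do
        [] -> nil                              # no more pages; stop
        rows -> {rows, List.last(rows).id}     # emit page, remember last id
      end
    end)
    |> Stream.flat_map(& &1)                   # flatten pages into a row stream
  end
end
```

This assumes a monotonically increasing, indexed :id column; the trade-off is one query per page instead of one cursor for the whole scan.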