I saw that Ecto's Repo has the function Repo.stream
, but I don't know what happens when I trigger Stream.run
. Will a query like SELECT * FROM table
wait for a full scan of the database before proceeding to the next steps, or will rows be sent to the Elixir client as they arrive?
Given this code, what happens between the client and the database?
select_all_query
|> Repo.stream()
|> Stream.map(&some_operation_on_data)
|> Stream.run()
Repo.stream
uses a cursor on databases where that's available, so it will retrieve chunks of max_rows
rows at a time.
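To make that concrete: on the Postgres adapter, Repo.stream is lazy and must be consumed inside a transaction, and the chunk size can be tuned with the :max_rows option (it defaults to 500). A minimal sketch of the pipeline from the question, assuming a hypothetical MyApp.Repo and a some_operation_on_data/1 function:

```elixir
import Ecto.Query

# Repo.stream/2 returns a lazy stream; on Postgres it must be run
# inside a transaction, and the cursor fetches :max_rows rows per
# round trip rather than the whole result set at once.
query = from(r in "table", select: map(r, [:id]))

MyApp.Repo.transaction(fn ->
  query
  |> MyApp.Repo.stream(max_rows: 500)            # 500 is also the default
  |> Stream.map(&some_operation_on_data/1)       # runs per row, as rows arrive
  |> Stream.run()                                # drives the stream for side effects
end)
```

So the database does not wait for a full scan: each Stream.run iteration pulls the next chunk through the cursor, and your mapping function sees rows chunk by chunk.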
Do you know whether, against a Redshift database, Repo.stream
can avoid bringing the entire table over at once?
I am using a Postgres adapter to connect to Redshift.
As @al2o3cr said, it will retrieve chunks according to the executed query. But if there is no filtering or limiting step on the Stream-processing side, you will still end up with all rows in your application's memory by the end of the stream execution.
Functions like Stream.run/1
, Stream.take/2
, Stream.filter/2
and Stream.reject/2
can limit or avoid accumulating all rows returned by the query in your application's memory.
Edit: otherwise, the cursor will protect the database from returning all rows at once, but your application will still end up holding all rows in memory.
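To illustrate the distinction: the cursor bounds how many rows cross the wire per round trip, but only an early-terminating or filtering step bounds what the application keeps. A hypothetical sketch (MyApp.Repo, query, and interesting?/1 are placeholders, not from the thread):

```elixir
MyApp.Repo.transaction(fn ->
  query
  |> MyApp.Repo.stream()
  |> Stream.filter(&interesting?/1)  # drops non-matching rows as they stream in
  |> Stream.take(100)                # halts the stream (and cursor) after 100 matches
  |> Enum.to_list()                  # only these up-to-100 rows are materialized
end)
```

Replacing Enum.to_list/1 here with an eager pipeline over Repo.all/2 would instead load every row of the result set before filtering anything.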
The Redshift docs include a specific warning about using cursors with a large result set.
AFAIK this is a Redshift-specific issue, as the corresponding section of the PG docs doesn't mention anything similar.
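If cursors are a concern on Redshift, one workaround (a common pattern, not something from this thread) is to skip Repo.stream and paginate manually with keyset queries via Stream.unfold/2, so each round trip is a plain ORDER BY ... LIMIT query instead of a cursor fetch. A sketch with assumed names, paginating on an :id column:

```elixir
defmodule StreamHelpers do
  import Ecto.Query

  # Hypothetical keyset pagination: each page is an ordinary
  # SELECT ... WHERE id > last_id ORDER BY id LIMIT page_size query,
  # so no server-side cursor is held open between fetches.
  def stream_by_id(repo, schema, page_size \\ 500) do
    Stream.unfold(0, fn last_id ->
      rows =
        from(r in schema,
          where: r.id > ^last_id,
          order_by: r.id,
          limit: ^page_size
        )
        |> repo.all()

      case rows do
        [] -> nil                              # no more pages; stop
        rows -> {rows, List.last(rows).id}     # emit page, remember last id
      end
    end)
    |> Stream.flat_map(& &1)                   # flatten pages into a row stream
  end
end
```

This assumes a monotonically increasing, indexed :id column; the trade-off is one query per page instead of one cursor for the whole scan.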