joaquinalcerro
What is the best approach for fetching large amount of records from postgresql with ecto
Hi there,
I am working with Ecto-Postgresql and I need to call all of the records from a specific table but the table has 40,000 records approximately (15MB).
Each record in the table has a email message id with which I have to call Google Gmail API to get the message.
So I am thinking about memory efficient. What is the most efficient way to call all the records with the minimum impact in memory or not loading the hole 15MB at once just to make API calls?
I found Ecto Stream but it works inside a transaction and I don’t use transactions. And… is Ecto Stream the best way to go?
Thanks for any comments.
Regards.
Most Liked
josevalim
IMO stream is the way to go. You need to run inside a transaction but that should not be a problem. You should be able to do neat things such as:
Repo.stream(...)
|> Task.async_stream(&deliver/1, max_concurrency: 10)
|> Stream.run
and that will get emails in batches of 10 and invoke the google email api concurrently. It requires Elixir v1.4.
michalmuskala
You can increase the timeout for the transaction where streaming happens:
Repo.transaction(fn -> ... end, timeout: some_huge_number)
You need to remember that LIMIT + OFFSET pagination is prone to race conditions - it may happen that you’ll miss some rows or see rows that you shouldn’t. Database-level cursor gives a consistent view of data.
brightball
Detailed explanation there: We need tool support for keyset pagination
Short version, when iterating over the records get the max range and the min range to make sure it’s used in your where clause. This trims the result set BEFORE sorting. Using LIMIT and OFFSET means the entire possible result has to be loaded and sorted to calculate which chunk of them is coming back.








