Iterating over a file stream

I’m not suggesting that you load all the chunks into memory. For example:

File.stream!("big_file.txt")
|> Stream.chunk_every(2000)
|> Task.async_stream(fn chunk ->
  # compute some work on the chunk
end)
# Task.async_stream emits {:ok, result} tuples; unwrap them
|> Stream.map(fn {:ok, result} -> result end)
# Stream.into/2 needs a Collectable, so open a stream on the destination file
|> Stream.into(File.stream!("destination.txt"))
# the pipeline is lazy; Stream.run/1 forces it to execute
|> Stream.run()

This:

  1. avoids pulling everything into memory
  2. still does the work in parallel
  3. does only one file read and one file write

If you need complex aggregation behavior you’ll probably want Flow, which has features for grouping rows and reducing over them.
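As a rough sketch of what that looks like, here is the classic word-count shape with Flow: `Flow.partition/1` routes equal keys to the same stage so each reducer sees all occurrences of a word. The file name and the word-splitting regex are just placeholders for your own data.

Flow.from_enumerable(File.stream!("big_file.txt"))
# split each line into words (adjust the tokenization to your data)
|> Flow.flat_map(&String.split(&1, ~r/\W+/, trim: true))
# hash-partition so identical words land in the same reducer stage
|> Flow.partition()
# reduce within each partition, accumulating counts per word
|> Flow.reduce(fn -> %{} end, fn word, acc ->
  Map.update(acc, word, 1, &(&1 + 1))
end)
|> Enum.to_list()

The key difference from the Task.async_stream version is that Flow gives you stateful, partitioned reduction across stages, not just independent per-chunk work.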
