I’m not suggesting that you load all the chunks into memory. For example:
File.stream!("big_file.txt")
|> Stream.chunk_every(2000, 2000, [])
|> Task.async_stream(fn chunk ->
  # compute some work on the chunk; return the transformed lines
end)
|> Stream.flat_map(fn {:ok, lines} -> lines end)
|> Stream.into(File.stream!("destination.txt"))
|> Stream.run()
This approach:
- avoids loading the whole file into memory
- still processes chunks in parallel
- performs a single streaming read and a single streaming write.
If you need more complex aggregation behavior, you'll probably want Flow, which has features for grouping rows and reducing over them.
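As a rough sketch of what that looks like, here is a word-count-style aggregation with Flow. This assumes `flow` is a dependency and that `big_file.txt` holds one word per line; `Flow.partition/1` routes rows with the same key to the same reducer, so each reducer sees every row for its keys.

```elixir
# A minimal sketch, not a drop-in solution: the file name and the
# word-per-line format are assumptions for illustration.
File.stream!("big_file.txt")
|> Flow.from_enumerable()
|> Flow.map(&String.trim/1)
|> Flow.partition()
|> Flow.reduce(fn -> %{} end, fn word, acc ->
  # count occurrences of each word within this partition
  Map.update(acc, word, 1, &(&1 + 1))
end)
|> Enum.to_list()
```

Because the reduce happens after partitioning, the counts are complete per key without any cross-process merging on your side.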