I’m not suggesting that you load all the chunks into memory. For example:
File.stream!("big_file.txt")
|> Stream.chunk_every(2000, 2000, [])
|> Task.async_stream(fn chunk ->
  # compute some work on the chunk; return the transformed lines
end)
|> Stream.flat_map(fn {:ok, lines} -> lines end)
|> Stream.into(File.stream!("destination.txt"))
|> Stream.run()
This approach:
- avoids loading the whole file into memory
- still processes chunks in parallel
- performs a single streaming read and a single streaming write.
If you need more complex aggregation behavior, you'll probably want Flow, which has features for grouping rows and reducing over them.
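As a rough sketch of what that looks like, here is a word-count-style aggregation with Flow. This assumes `flow` is a dependency and that `big_file.txt` holds one word per line; `Flow.partition/1` routes rows with the same key to the same reducer, so each reducer sees every row for its keys.

```elixir
# A minimal sketch, not a drop-in solution: the file name and the
# word-per-line format are assumptions for illustration.
File.stream!("big_file.txt")
|> Flow.from_enumerable()
|> Flow.map(&String.trim/1)
|> Flow.partition()
|> Flow.reduce(fn -> %{} end, fn word, acc ->
  # count occurrences of each word within this partition
  Map.update(acc, word, 1, &(&1 + 1))
end)
|> Enum.to_list()
```

Because the reduce happens after partitioning, the counts are complete per key without any cross-process merging on your side.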