I suspect the biggest issue you are hitting is that you are interleaving processing and I/O. That causes you to thrash the scheduler. Things work a lot better if you can read in the data, process it, then write it out all in one chunk.
With 12GB of data, if you don’t have enough RAM to hold everything, streams are still useful, but you want bigger chunks. That is, use the stream to read a block of data from disk, split it into records, group the records into chunks, process the chunks in parallel, then write each processed chunk to disk.
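A minimal sketch of that read / chunk / process-in-parallel / write pipeline, using only the standard library (`Task.async_stream/3` stands in for any parallel map). It assumes one record per line; `ChunkedPipeline`, `process_record/1` (here just upcasing) and the default chunk size of 10,000 are hypothetical placeholders, not anything from a real project:

```elixir
defmodule ChunkedPipeline do
  # Hypothetical per-record transformation; replace with your real work.
  def process_record(line), do: String.upcase(line)

  # Reads `input` lazily line by line, groups the lines into chunks of
  # `chunk_size`, processes the chunks in parallel (at most one task per
  # online scheduler at a time), and writes each processed chunk to
  # `output` with a single write call.
  def run(input, output, chunk_size \\ 10_000) do
    File.open!(output, [:write, :binary], fn out ->
      input
      |> File.stream!()
      |> Stream.map(&String.trim_trailing(&1, "\n"))
      |> Stream.chunk_every(chunk_size)
      |> Task.async_stream(&process_chunk/1,
        max_concurrency: System.schedulers_online(),
        # keep the output in the same order as the input
        ordered: true,
        timeout: :infinity
      )
      |> Enum.each(fn {:ok, iodata} -> IO.binwrite(out, iodata) end)
    end)
  end

  # Builds the chunk's output as an iolist instead of concatenating strings.
  defp process_chunk(lines), do: Enum.map(lines, &[process_record(&1), "\n"])
end
```

Only one chunk per worker is in memory at a time, and each write hands the VM a whole chunk rather than one small string per record. The chunk size is a tuning knob: too small and you are back to interleaving tiny bits of work and I/O, too large and memory use grows.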
You can parallelize the processing of the entries to take advantage of multiple cores. I have found https://github.com/beatrichartz/parallel_stream easy to use and fast, though the standard library has alternatives too (e.g. Task.async_stream/3). parallel_stream lets you set the number of workers and the number of records each worker processes, e.g.:
```elixir
# Two workers per scheduler; worker_work_ratio sets the records per worker
workers = :erlang.system_info(:schedulers) * 2
stream = ParallelStream.map(records, &process_record/1, num_workers: workers, worker_work_ratio: 1000)
results = Enum.to_list(stream)
```
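Instead of concatenating strings, you can build iolists, e.g. `fn args -> [args, "\n"] end`. File and socket writes accept iolists directly, so the pieces never have to be copied into one big string.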
See https://www.bignerdranch.com/blog/elixir-and-io-lists-part-1-building-output-efficiently/
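As a small illustration, here is how one line of a "key,json" CSV file could be built as an iolist. `CsvRow` and its functions are hypothetical names, and the sketch assumes the JSON is already encoded as a single-line string:

```elixir
defmodule CsvRow do
  # Hypothetical helper: one "key,json\n" line as an iolist.
  # ?, and ?\n are the byte values for "," and newline; an iolist may mix
  # bytes and binaries, and nothing is copied until the data is written.
  def line(key, json), do: [key, ?,, json, ?\n]

  # A whole file's worth of rows is just a list of row iolists.
  def render(rows), do: Enum.map(rows, fn {key, json} -> line(key, json) end)
end
```

The result can go straight to `File.write!(path, CsvRow.render(rows))` or `IO.binwrite/2`; call `IO.iodata_to_binary/1` only if you really need a single binary.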
We have one high-volume application which has configuration info in JSON, about 1M records with 1KB of JSON for each record. The data starts in a Postgres database. We have one job that reads all the data in the database, parses the JSON, massages it, then writes out a CSV file with key and JSON data. On startup, the app parses the CSV and loads the data into an ETS table.
The export job originally took 30 minutes. By processing the data in parallel and paying attention to I/O, we got it down to about 2 minutes. Similar optimizations on the load job brought it from about 3 minutes down to about 8 seconds.
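A rough sketch of what a parallel CSV-to-ETS load can look like. This is not the actual job described above: `ConfigLoader`, the table name `:config`, and the batch size are hypothetical, it assumes every line is well formed as `key,<json>` with no commas in keys, and it stores the JSON as a raw binary rather than decoding it:

```elixir
defmodule ConfigLoader do
  # Hypothetical loader for a file of "key,json" lines.
  # Assumes keys never contain commas and every line is well formed.
  def load(path, chunk_size \\ 1_000) do
    # :public so the worker tasks can insert; read_concurrency for read-heavy use
    table = :ets.new(:config, [:set, :public, read_concurrency: true])

    path
    |> File.stream!()
    |> Stream.chunk_every(chunk_size)
    # Each task parses a chunk and inserts it with one :ets.insert/2 call
    |> Task.async_stream(fn lines -> :ets.insert(table, Enum.map(lines, &parse_line/1)) end,
      max_concurrency: System.schedulers_online(),
      ordered: false,
      timeout: :infinity
    )
    |> Stream.run()

    table
  end

  # Splits at the first comma only, so commas inside the JSON are kept.
  defp parse_line(line) do
    [key, json] = :binary.split(String.trim_trailing(line, "\n"), ",")
    {key, json}
  end
end
```

Inserting a list of tuples per chunk keeps the number of ETS calls low, and `ordered: false` lets chunks finish in any order since a `:set` table does not care about insertion order.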
Elixir is not as fast as C, but it is reasonably efficient. The ability to easily parallelize work across all the cores often makes up for lower raw processing speed. Binary pattern matching runs at about half the speed of C, and https://github.com/plataformatec/nimble_parsec makes it easy to implement efficient text parsers. For I/O-bound and highly concurrent work, it is very competitive.
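To give a feel for binary pattern matching, here is a hand-written version of "split a line at the first comma". `KeySplit` is a hypothetical module for illustration only; in practice `:binary.split/2` does the same job:

```elixir
defmodule KeySplit do
  # Splits "key,rest" at the first comma using binary pattern matching.
  # Returns {key, rest}, or :error if the line has no comma.
  def split(line), do: split(line, 0)

  defp split(line, pos) do
    case line do
      # Match `pos` bytes as the key, then a literal comma, then the rest.
      # Sub-binaries reference the original data instead of copying it.
      <<key::binary-size(pos), ?,, rest::binary>> -> {key, rest}
      # No comma at this position yet; try one byte further along.
      _ when pos < byte_size(line) -> split(line, pos + 1)
      _ -> :error
    end
  end
end
```

For example, `KeySplit.split("k1,{\"a\":1}")` gives `{"k1", "{\"a\":1}"}`.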