Flow - Parallel hard disk read

nfplay · May 16, 2022, 9:35am

Hi everyone! Reading the Flow documentation, particularly the examples that process data from streams, it's not entirely clear to me how data is actually read from the source when using a Stream. My understanding is that reading from a Stream is not done in parallel; only the processing layer is. In other words, Flow keeps reading lines sequentially and distributes them across several GenStage stages for processing.
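To make the model I'm describing concrete, here is a rough Python sketch of "one sequential reader, many parallel workers". This is not Flow's API or implementation (Flow is Elixir); `process` and `sequential_read_parallel_process` are hypothetical names, and threads are used only for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor
import os
import tempfile


def process(line: str) -> int:
    # Hypothetical per-line work: count the words in the line.
    return len(line.split())


def sequential_read_parallel_process(path: str, workers: int = 4) -> list[int]:
    # A single file handle is iterated line by line (sequential I/O) ...
    with open(path, "r", encoding="utf-8") as f:
        # ... while a pool of workers runs `process` on those lines concurrently.
        # Results come back in the original line order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(process, f))


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as tmp:
        tmp.write("a b c\nd e\nf\n")
    print(sequential_read_parallel_process(tmp.name))  # [3, 2, 1]
    os.remove(tmp.name)
```

Only the processing fans out here; the disk is still read through one handle, in order.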

Given how well documented the benefits of parallel reads are, especially on SSDs, I wonder if there's an implementation that deals specifically with this part. I gather that optimizing it could bring large gains, since issuing SSD reads from multiple threads normally increases throughput substantially as more threads read simultaneously.
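For comparison, this is roughly what I mean by parallel reads: split the file into byte ranges and have several workers read their ranges at the same time. Again a Python sketch under my own assumptions (the names and the even chunking are hypothetical); it ignores line boundaries, so a real line-oriented reader would need to realign each chunk to the next newline, and whether it is actually faster depends on the drive and OS.

```python
from concurrent.futures import ThreadPoolExecutor
import os
import tempfile


def read_range(path: str, offset: int, length: int) -> bytes:
    # Each worker opens its own handle so seeks do not interfere with each other.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path: str, workers: int = 4) -> bytes:
    size = os.path.getsize(path)
    if size == 0:
        return b""
    # Hypothetical chunking: `workers` contiguous ranges (ceiling division).
    chunk = -(-size // workers)
    ranges = [(off, min(chunk, size - off)) for off in range(0, size, chunk)]
    # Ranges are read concurrently, then joined back in file order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lambda r: read_range(path, *r), ranges))


if __name__ == "__main__":
    data = bytes(range(256)) * 10
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(data)
    print(parallel_read(tmp.name) == data)  # True
    os.remove(tmp.name)
```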

Thanks in advance!

Nuno