Surprising behavior of File.stream vs File.read

brightball · March 9, 2021, 5:56pm

Just out of curiosity, we tried making the file a lot bigger (about 200 MB). The results follow the same pattern: the reported memory usage of the Stream implementations is roughly 2x higher than just using File.read and Enum, which doesn’t seem like it should be the case. At this point I feel like I have to be doing something wrong to end up with these results.

Here are the Benchee results for the bigger file though.

Name                                            ips        average  deviation         median         99th %
File.stream! with Stream.run                 0.0647        15.45 s     ±0.49%        15.43 s        15.56 s
File.stream! with Stream |> Enum.into        0.0593        16.87 s     ±0.58%        16.89 s        16.97 s
File.read! with Enum                         0.0506        19.76 s     ±4.07%        19.54 s        20.96 s
Comparison: 
File.stream! with Stream.run                 0.0647
File.stream! with Stream |> Enum.into        0.0593 - 1.09x slower +1.42 s
File.read! with Enum                         0.0506 - 1.28x slower +4.31 s
Memory usage statistics:
Name                                     Memory usage
File.stream! with Stream.run                  5.35 GB
File.stream! with Stream |> Enum.into         6.58 GB - 1.23x memory usage +1.23 GB
File.read! with Enum                          3.04 GB - 0.57x memory usage -2.30341 GB

Before I posted this I decided to run it again and watch htop while each version was running. My laptop has 32 GB of RAM. While the File.read version was running, its memory usage varied between 16% and 24%. While the Stream version was running, it never exceeded 0.9%. This was reflected in the overall memory usage on the system as well.

So after all of this I think the issue may be either Benchee itself or my Benchee configuration, because the memory numbers it reports don’t match the memory usage I actually observed. Anybody else ever run into that?
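
For anyone who wants to reproduce the comparison, here is a minimal sketch of a Benchee setup with the same three job names. The original benchmark code is not shown in this post, so the sample file, the `transform` function, and the `time`/`memory_time` values are all assumptions made for illustration. One thing worth keeping in mind when reading the numbers: as far as Benchee's documentation describes it, the memory statistic is the total amount allocated in the benchmarking process during a run, including memory that was later garbage collected, not the peak size of the VM. That could explain why it disagrees with what htop shows.

```elixir
# Assumes Elixir 1.12+ for Mix.install; otherwise add :benchee to your mix.exs deps.
Mix.install([{:benchee, "~> 1.0"}])

# Hypothetical input: a small temporary file of newline-terminated lines.
# The file in the post was about 200 MB; this one is kept small on purpose.
path = Path.join(System.tmp_dir!(), "stream_vs_read_sample.txt")
File.write!(path, Enum.map(1..100_000, &"line #{&1}\n"))

# Placeholder per-line work; the post does not show its actual transformation.
transform = &String.upcase/1

Benchee.run(
  %{
    "File.stream! with Stream.run" => fn ->
      # Lazily reads line by line and discards the results.
      path |> File.stream!() |> Stream.map(transform) |> Stream.run()
    end,
    "File.stream! with Stream |> Enum.into" => fn ->
      # Lazily reads line by line but collects every result into a list.
      path |> File.stream!() |> Stream.map(transform) |> Enum.into([])
    end,
    "File.read! with Enum" => fn ->
      # Loads the whole file into one binary, then splits and maps eagerly.
      path |> File.read!() |> String.split("\n") |> Enum.map(transform)
    end
  },
  # Seconds spent measuring run time and memory per job (illustrative values).
  time: 5,
  memory_time: 2
)

# Clean up the temporary sample file.
File.rm!(path)
```

With that reading of the statistic, a streaming job can report a larger total than File.read! while still keeping far less memory live at any one moment, since each line it allocates can be collected shortly afterwards.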