Just out of curiosity, we tried making the file a lot bigger (about 200 MB). The results are the same: according to Benchee, the memory usage of the Stream implementation is about 2x higher than just using File.read and Enum, which doesn’t seem like it should be the case. At this point I feel like I have to be doing something wrong to end up with these results.
Here are the Benchee results for the bigger file though.
Name                                        ips      average    deviation       median       99th %
File.stream! with Stream.run             0.0647      15.45 s       ±0.49%      15.43 s      15.56 s
File.stream! with Stream |> Enum.into    0.0593      16.87 s       ±0.58%      16.89 s      16.97 s
File.read! with Enum                     0.0506      19.76 s       ±4.07%      19.54 s      20.96 s

Comparison:
File.stream! with Stream.run             0.0647
File.stream! with Stream |> Enum.into    0.0593 - 1.09x slower +1.42 s
File.read! with Enum                     0.0506 - 1.28x slower +4.31 s

Memory usage statistics:

Name                                     Memory usage
File.stream! with Stream.run                  5.35 GB
File.stream! with Stream |> Enum.into         6.58 GB - 1.23x memory usage +1.23 GB
File.read! with Enum                          3.04 GB - 0.57x memory usage -2.30341 GB
Before I posted this I decided to run it again and watch htop while the two were running. My laptop has 32 GB of RAM, and while the File.read version was running, its memory usage varied between 16% and 24%. While the Stream version was running, it never exceeded 0.9%. This was reflected in the overall memory usage on the system as well.
So after all of this I think the issue may be either Benchee itself or my Benchee configuration, because it doesn’t seem to be properly tracking the memory usage. Anybody else ever run into that?
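For anyone who wants to cross-check the htop observation from inside the VM, here is a rough sketch of how peak memory could be sampled while a function runs. `PeakMemory` and `measure/2` are hypothetical names made up for this post, not part of Benchee, and polling `:erlang.memory(:total)` only approximates the peak (it can miss short spikes between samples and includes whatever else the VM is doing at the time).

```elixir
defmodule PeakMemory do
  # Hypothetical helper, not part of Benchee: runs `fun` in a separate task
  # while polling total VM memory, and returns the highest value observed.
  def measure(fun, interval_ms \\ 10) do
    baseline = :erlang.memory(:total)
    task = Task.async(fun)
    peak = poll(task, baseline, interval_ms)

    %{baseline_bytes: baseline, peak_bytes: peak, delta_bytes: peak - baseline}
  end

  # Take a sample, then wait up to `interval_ms` for the task to finish.
  # Task.yield/2 returns nil on timeout, so we loop until a result arrives.
  defp poll(task, peak, interval_ms) do
    peak = max(peak, :erlang.memory(:total))

    case Task.yield(task, interval_ms) do
      nil -> poll(task, peak, interval_ms)
      {:ok, _result} -> peak
      {:exit, reason} -> exit(reason)
    end
  end
end
```

Each benchmark function could then be wrapped in it, e.g. `PeakMemory.measure(fn -> run_stream_version() end)`, where `run_stream_version/0` is a placeholder for whatever is being benchmarked. Comparing `delta_bytes` between the Stream and File.read versions should show whether the difference seen in htop is also visible from inside the VM.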