Are streams faster, or just more memory efficient?

I’m processing some large-ish datasets by streaming CSV input, and I was curious if this actually contributed to speed or if it was merely keeping the memory footprint down.

To find out, I ran some simple benchmarking:

  def process_stream(length) do
    1..length
    |> Stream.map(&:math.sqrt(&1))
    |> Stream.map(&:math.pow(&1, 2))
    |> Stream.run()
  end

  def process_enum(length) do
    1..length
    |> Enum.map(&:math.sqrt(&1))
    |> Enum.map(&:math.pow(&1, 2))
  end

Benchee says the enum mapping is actually a bit faster:

Comparison: 
enum          1.01 K
stream        0.76 K - 1.33x slower +0.33 ms

Does this seem correct? Is it generally true that streams - accumulating a bunch of lambdas on each input and then executing them all at once, rather than making multiple passes through the input list - aren’t any faster?
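
To make the "one pass vs. multiple passes" difference concrete, here is a minimal sketch that records the order in which two mapping steps run. The `Trace` module and the `step` helper are made up for illustration and are not part of the benchmark above; they just send a message per call so the order can be read back afterwards.

```elixir
defmodule Trace do
  # Runs `fun`, then returns the {step, value} pairs it sent to this process, in order.
  def collect(fun) do
    fun.()
    drain([])
  end

  defp drain(acc) do
    receive do
      {:step, name, x} -> drain([{name, x} | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end

# Hypothetical helper: builds a mapping function that logs each call and returns its input.
step = fn name ->
  fn x ->
    send(self(), {:step, name, x})
    x
  end
end

# Eager: step :a runs over the whole range before step :b starts.
eager = Trace.collect(fn -> 1..2 |> Enum.map(step.(:a)) |> Enum.map(step.(:b)) end)
# eager == [a: 1, a: 2, b: 1, b: 2]

# Lazy: each element goes through :a and then :b before the next element is touched.
lazy =
  Trace.collect(fn ->
    1..2 |> Stream.map(step.(:a)) |> Stream.map(step.(:b)) |> Stream.run()
  end)
# lazy == [a: 1, b: 1, a: 2, b: 2]
```

So the stream version does make a single pass and never builds the intermediate list, but every element goes through the extra lazy machinery, which is presumably where the overhead comes from.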

You are literally comparing apples and oranges here.

Streams are a memory-, process-, and CPU-efficient way to handle very large, continuous, or unknown-length data.

It doesn’t make sense to benchmark them against enumerations for raw speed; you will always pay some stream overhead.

The easiest way to wrap your head around it is to try opening a 20MB text file in an IDE that loads the whole file. It will very likely crash or take ages to load.

Now run tail or head on the same file, or open it in an editor that supports streams (nano, vim). It will open instantly, thanks to the streaming capabilities. The stream (data) keeps flowing, and the implementation handles the necessary operations one bucket at a time.

Imagine displaying the latest 100 visitors to your website, with their visit counts, from logs that are 100_000_000_000 lines long.

With enums:

logs | load_data | order time descending | take first 100

You will either run out of memory, have the process killed by the OS, watch it freeze, or wait a very long time.

With streams, you only ever need to hold the 100 most recent entries while the rest of the log flows past, so you get the result in a reasonable time.
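
A rough sketch of that idea, under some assumptions: the `LatestVisits` module, the `%{visitor: ..., time: ...}` entry shape, and the generated stand-in log are all made up for illustration. Note that nothing here is optimised automatically; the memory stays bounded only because the reducer throws away everything except the current top `n` at each step.

```elixir
defmodule LatestVisits do
  # Hypothetical helper: walks the entries once, keeping only the `n` most recent.
  # Re-sorting on every step is fine for small `n`; a heap would scale better.
  def latest(entries, n) do
    Enum.reduce(entries, [], fn entry, acc ->
      [entry | acc]
      |> Enum.sort_by(& &1.time, :desc)
      |> Enum.take(n)
    end)
  end
end

# Lazily generated stand-in for a huge log; entries are produced one at a time.
entries = Stream.map(1..10_000, fn i -> %{visitor: "user#{i}", time: i} end)

top = LatestVisits.latest(entries, 3)
# Enum.map(top, & &1.time) == [10_000, 9_999, 9_998]
```

With a real log file the `entries` stream would come from something like `File.stream!/1` plus a line parser, so at no point is the whole log in memory.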

Good catch, I always thought Stream was MUCH faster than Enum. I did some additional research and got:

  def process_stream(length) do
    1..length
    |> Stream.map(&:math.sqrt(&1))
    |> Stream.map(&:math.pow(&1, 2))
    |> Stream.map(&:math.sqrt(&1))
    |> Stream.map(&:math.pow(&1, 2))
    |> Stream.map(&:math.sqrt(&1))
    |> Stream.map(&:math.pow(&1, 2))
    |> Stream.run()
  end

  def process_enum(length) do
    1..length
    |> Enum.map(&:math.sqrt(&1))
    |> Enum.map(&:math.pow(&1, 2))
    |> Enum.map(&:math.sqrt(&1))
    |> Enum.map(&:math.pow(&1, 2))
    |> Enum.map(&:math.sqrt(&1))
    |> Enum.map(&:math.pow(&1, 2))
  end

length = 1_000_000

Only at that scale does Stream become faster than Enum:

Comparison: 
process_stream          1.74
process_enum            1.38 - 1.27x slower +152.30 ms

With two operations in a row (like in the first post) and length equal to 10000, I got the same result (Stream is 1.33x slower), but with length equal to 10, it’s almost twice as slow:

process_enum        439.61 K
process_stream      233.28 K - 1.88x slower +2.01 μs
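
If you want to poke at different lengths without setting up Benchee, here is a quick sketch using `:timer.tc/1` from the Erlang standard library. The `QuickTiming` module and the run count of 100 are my own choices, and the numbers it prints are much rougher than Benchee's (no warmup, no statistics), so treat them as a sanity check only.

```elixir
defmodule QuickTiming do
  def stream(length) do
    1..length
    |> Stream.map(&:math.sqrt(&1))
    |> Stream.map(&:math.pow(&1, 2))
    |> Stream.run()
  end

  def enum(length) do
    1..length
    |> Enum.map(&:math.sqrt(&1))
    |> Enum.map(&:math.pow(&1, 2))
  end

  # Average microseconds per call over `runs` calls of `fun`.
  def avg_us(fun, runs) do
    {total, _result} = :timer.tc(fn -> Enum.each(1..runs, fn _ -> fun.() end) end)
    total / runs
  end
end

for length <- [10, 10_000] do
  s = QuickTiming.avg_us(fn -> QuickTiming.stream(length) end, 100)
  e = QuickTiming.avg_us(fn -> QuickTiming.enum(length) end, 100)
  IO.puts("length=#{length} stream=#{s}us enum=#{e}us")
end
```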

A better test might be to read and process your CSV using tools in Stream and then do it again without streaming (presumably you’re using functions in Enum).

Whether a tool performs better depends on the sorts of problems you’re solving, and testing on your real workload will probably be way more interesting than benchmarking work on a range of numbers! Definitely try it on some small, medium, and massive files and compare notes.
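
As a starting point, something like the sketch below could be what you benchmark. The `CsvSum` module, the `id,amount` column layout, and the tiny sample file are assumptions for illustration, and the comma splitting is deliberately naive (no quoted fields); with real data you would probably use a proper CSV library instead.

```elixir
defmodule CsvSum do
  # Lazy: reads the file line by line, never holding the whole file in memory.
  def streamed(path) do
    path
    |> File.stream!()
    |> Stream.drop(1)
    |> Stream.map(&parse_amount/1)
    |> Enum.sum()
  end

  # Eager: loads the whole file, then builds a full list at each step.
  def eager(path) do
    path
    |> File.read!()
    |> String.split("\n", trim: true)
    |> Enum.drop(1)
    |> Enum.map(&parse_amount/1)
    |> Enum.sum()
  end

  # Naive parsing: assumes exactly two comma-separated fields and no quoting.
  defp parse_amount(line) do
    [_id, amount] = line |> String.trim() |> String.split(",")
    String.to_integer(amount)
  end
end

# Tiny made-up sample file; swap in your small, medium, and massive files.
path = Path.join(System.tmp_dir!(), "stream_vs_enum_sample.csv")
File.write!(path, "id,amount\n1,10\n2,20\n3,30\n")

streamed_total = CsvSum.streamed(path)
eager_total = CsvSum.eager(path)
```

Both functions should return the same total, so you can hand them to Benchee as two jobs and watch how the gap (and the memory use) changes with file size.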

To me streams were always about being more memory efficient and protecting the app against huge inputs that could bring down a node or a container.

Only from a certain scale and onwards do streams become a little bit faster as well but that’s very dependent on the task to be performed, as you yourself have discovered.

José Valim said this on the topic during this year’s Advent of Code: by streaming you’re trading CPU for memory. And yes, a stream will be slower, depending on the input size.
