Consequences of using ordered:true in async_stream when piping to Enum?

So here’s something I’ve been puzzling over. We have the result of Task.Supervisor.async_stream_nolink(Sup, [], &foo/1) piping into Enum.flat_map. I think my specific question would pertain to any use of the async_stream family.

I understand if I pass ordered: true to async_stream_nolink the results will feed into Enum.flat_map in order, and if I pass ordered: false they will feed in in the order the spawned processes complete.

Behind the scenes, what is happening to these processes spawned by async_stream when ordered is true or false? My assumption was that, regardless of what I pass for the ordered option, all tasks will need to complete before Enum.flat_map begins doing work. So in either case, things will need to buffer.

Is that true? In this specific scenario, is there any difference in terms of computation, memory usage, or process lifetime when I pass ordered: true or false?

We previously were using ordered: false, but recently have a need for ordered: true, and we’re trying to figure out if there are any consequences related to computation speed or memory.

I’ve walked through the Elixir codebase and docs a fair amount trying to find an answer but didn’t manage to piece it together myself.

I also know this is kind of a hard question to express, so I can certainly try to clarify or rephrase :slight_smile:

That part is not true. While the output of Enum functions cannot be lazy the input can be any Enumerable, which includes lazy sources. Enum.flat_map is the only reason why the stream is realized in the first place.

The difference between ordered false and true is therefore if the stream emits results immediately when done or buffers any out of order results.

Excellent, this helps a lot @LostKobrakai, I had a feeling some of my assumptions were wrong.

So Enum.flat_map could begin to process results when we use ordered: false even before the other processes spawned by async_stream complete, is that correct?

If each of those spawned processes was using a lot of memory (for example, they’re producing a very large map) I’m trying to figure out if ordered: true would somehow change things regarding memory usage.

Is it possible, since we are we are processing results unbuffered with ordered: false, that one value emitted from the async stream could be processed and that memory freed earlier than if ordered: true was used and buffering occurred?

My hunch here is that ordered: true or ordered: false is not going to practically change overall memory consumption in our case, but I’m still not sure. Clearly I’m very new to understanding how this works behind the scenes!

Yes, but keep in mind that max_concurrency is still the ruler here at the end of the day. If you set max_concurrency: 4 then maybe the first item is slow, and so it has to buffer 3 items until the first item finishes so it can emit it first, but it doesn’t have to buffer infinite items, just max_concurrency - 1 worst case.

1 Like

Yes. When Enum.flap_map asks for the value of the first stream item then :max_concurrency number of processes are started. With ordered: false the first value to return is supplied to Enum.flap_map. With ordered: true any value done earlier as the first would be buffered until all prev. ones were given to Enum.flap_map.

2 Likes

Great point, I was not thinking about that. So our max concurrency value is definitely serving as a limit as to how many processes could ever be “hanging out” waiting for other processes to finish.

Using max_concurrency = 4 for example purposes:

Am I understanding then that, with ordered: true, we might see at worst 3 processes buffered before all 4 processes complete, but once 4 processes finish flat_map can “get to work” on those results while the next batch of max_concurrency processes begin?

In a sense does flat_map then receive 4 items at a time until the enum async_stream is operating on has been fully processed? While with ordered: false they’ll just come in a stream as they complete?

This upper bound definitely changes things, I’m so glad you pointed that out.

In both case they come one at a time, that’s always how consuming Enumerables works: one at a time. BUT if you were to record how much time Enum.flat_map had to wait between emitted items, you’d see that with ordered: true and a slow first item there would be a delay, then you’d have 4 in a row emitted very quickly one after another, and then things would go from there.

With ordered: false you’d see sort of random intervals between when items were emitted.

Yes! That’s what I meant but I didn’t phrase it well, I think “4 items at a time” was poor wording on my part. More like… in a contrived example you could see an Enum function processing things and chronologically they might come in “bursts” of four, though it’s still consuming one item at a time. That seems to match what you’re describing as well.

Alright, I think I have the big picture in place.

Thank you both! This was insanely helpful.

1 Like