Help me understand concurrent processing

I have the following code:

# Square 0..9 concurrently with Task.async_stream/2, timing the whole run.
t1 =
  :timer.tc(fn ->
    Task.async_stream(0..9, fn i -> i * i end) |> Enum.to_list()
    :ok
  end)


IO.puts("async_stream took:")
IO.inspect(t1)

# Square i in a freshly spawned process and send the result back to the caller.
skwerr =
  fn i ->
    caller = self()
    spawn(fn -> send(caller, {:result, i * i}) end)
  end

# Block until any one {:result, _} message arrives.
get_result =
  fn ->
    receive do
      {:result, result} -> result
    end
  end

t2 =
  :timer.tc(fn ->
    0..9
    |> Enum.map(&skwerr.(&1))
    |> Enum.map(fn _ -> get_result.() end)

    :ok
  end)


IO.puts(" spawn/recv took:")
IO.inspect(t2)

# Plain eager map, no processes involved.
t3 =
  :timer.tc(fn ->
    0..9 |> Enum.map(&(Kernel.*(&1, &1)))
    :ok
  end)



IO.puts("eager map took:")
IO.inspect(t3)

which returns:

async_stream took:
{11185, :ok}
spawn/recv took:
{81, :ok}
eager map took:
{3, :ok}

I don’t understand the huge discrepancy between Task.async_stream and calling spawn/1. Can anyone explain what is responsible for it? Of note, I have run it many times and the discrepancy remains, though the eager map version is sometimes faster than the spawn version.

Sorry, I don’t have an answer to your question, but I just wanted to let you know that :timer.now_diff/2 returns a difference in microseconds, not milliseconds.

Ah, thanks. I actually realized that :timer.tc/1 is better than what I was doing for the timing part of this.
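For anyone following along, :timer.tc/1 runs the fun and returns a {microseconds, result} tuple, e.g. (values here are just for illustration):

# :timer.tc/1 returns {elapsed_microseconds, return_value_of_the_fun}.
{micros, result} = :timer.tc(fn -> Enum.sum(0..9) end)
IO.puts("took #{micros} µs, result: #{result}")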


Seems reasonable to me. An eager map is going to be VERY fast. Spawning has a bunch of overhead, but it’s still pretty fast. Note that your answers are not necessarily going to come back in order; see the sketch below.
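Here is a quick sketch (mine, not from the original code) of why the spawn/receive results aren’t guaranteed to come back in input order; the sleeps just exaggerate the effect:

caller = self()

# Later items finish first because they sleep less, so their messages
# tend to arrive in reverse order.
Enum.each(1..3, fn i ->
  spawn(fn ->
    Process.sleep(10 * (4 - i))
    send(caller, {:result, i * i})
  end)
end)

Enum.map(1..3, fn _ ->
  receive do
    {:result, r} -> r
  end
end)
# => likely [9, 4, 1] rather than [1, 4, 9]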

Task.async has a bunch of overhead. By virtue of being in the Task module, it’s doing a lot of stuff you might not know about to make your experience sane (kind of like GenServer). Plus, because it’s doing async_stream, it’s preserving the order of the tasks, which incurs additional overhead. Finally, IIRC Task.async_stream won’t spin up more things than you have cores, so if you have fewer than 10 cores, it’s going to take at least two rounds of setting everything up and tearing everything down.
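For reference, both of those behaviors are options on Task.async_stream/3: :max_concurrency defaults to System.schedulers_online/0 and :ordered defaults to true. A sketch of turning both knobs (the values are just for illustration):

0..9
|> Task.async_stream(fn i -> i * i end,
  # allow one task per item instead of one per scheduler/core
  max_concurrency: 10,
  # hand results back as they complete instead of reassembling input order
  ordered: false
)
|> Enum.to_list()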


Can you (or anyone else) be a little more explicit about the overhead costs that the Task carries with it?

I don’t think my results are skewed by the number of available cores. Even using a range of 0..0 yields the following:

async_stream took:
{20416, :ok}
spawn/recv took:
{10, :ok}
eager map took:
{1, :ok}


BTW, @stevensonmt, you might like “THE PROCESS - part 2 (Tasks)”[0] by @ityonemo, where he shows some of the overhead and benefits of Task over plain spawn/receive.

[0]


That video is brilliant. Thank you so much.
The gist I’ve taken from it is that for extremely simple stuff like my example, the overhead introduced by Task is noticeable, but in more realistic contexts it would be (a) less noticeable and (b) worth it for the added clarity and robustness of the code. So with respect to my original question, the difference between the two concurrent approaches is not due to concurrency itself but to implementation details around the concurrency. Is that right?

You don’t just want Task for the clarity and robustness; you want it because it gives you a bunch of things that you want “because distributed/concurrent systems are hard” (in the same way that you might want GenServer). If you go on to chapter 3, there’s a deep dive on more things that Task gives you that make writing disciplined, concurrent code sane (namely, tests). I think because Elixir (specifically, not even Erlang) gives you this, you can build robust and well-tested programs that don’t treat concurrency as a fly-by-night operation, which IMO is how it feels in Go, for example.
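To make that concrete, here’s a minimal sketch (mine, not from the chapter) of one of those things: Task.async/1 links the task to the caller and Task.await/2 takes a timeout, so crashes and hangs surface, whereas with a bare spawn you have to build that yourself:

# Task: a crash in the task takes the (linked) caller down, and await
# raises after the timeout instead of hanging forever.
task = Task.async(fn -> 2 * 2 end)
Task.await(task, 5_000)
# => 4

# Bare spawn: if the child crashes before sending, the receive would wait
# forever unless you add your own `after` clause (and monitoring, etc.).
caller = self()
spawn(fn -> send(caller, {:result, 2 * 2}) end)

receive do
  {:result, result} -> result
after
  5_000 -> :timeout
end
# => 4 (or :timeout if the message never arrives)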
