Spawning 1 million tasks - why is it slow?

glmeocci · May 23, 2018, 8:09pm

I wrote this simple program:

1..1_000_000
|> Enum.map(fn x -> spawn(fn -> x + 1 end) end)
|> Enum.count()

This takes ~20 sec to run on my AMD Threadripper (16 cores + HT).
Why is it so slow?

idi527 · May 23, 2018, 8:13pm

How much time would you expect it to take? And against what are you measuring? That is, if that’s slow, what’s fast?

On my old laptop (4 cores) it takes 26207224 us (~26us per process), btw.

iex(3)> :timer.tc fn -> Enum.map(1..1_000_000, fn x -> spawn(fn -> x + 1 end) end) end
{26207224, [...]}

NobbZ · May 23, 2018, 8:15pm

That’s about 20 microseconds per spawn. Feels ok to me.

andre1sk · May 23, 2018, 8:23pm

It is strange, since on my 2013 MacBook Pro it also takes 20 sec, and it only has 4 cores and is running a ton of other things, including a copy of our prod env in Docker.

michalmuskala · May 23, 2018, 8:41pm

The number of cores does not matter - after all it’s just one process spawning all of those other processes. This benchmark is entirely sequential.

I suspect the load of the spawned processes themselves is insignificant compared to the load of actually spawning the process.
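
To illustrate the point about the benchmark being sequential, here is a minimal sketch that splits the spawning across several tasks, so more than one process does the spawning. `ParallelSpawn`, `run/2` and `spawn_n/1` are made-up names for illustration, and no timings are claimed for it.

```elixir
defmodule ParallelSpawn do
  # Hypothetical helper, not from the original program: splits `total`
  # spawns across `workers` tasks so several processes spawn concurrently.
  # Returns the number of processes that were spawned.
  def run(total, workers \\ System.schedulers_online()) do
    per_worker = div(total, workers)
    remainder = rem(total, workers)

    0..(workers - 1)
    |> Enum.map(fn i ->
      # The first `remainder` workers take one extra spawn each.
      count = if i < remainder, do: per_worker + 1, else: per_worker
      Task.async(fn -> spawn_n(count) end)
    end)
    |> Enum.map(&Task.await(&1, :infinity))
    |> Enum.sum()
  end

  # Spawns `count` trivial processes one after another and counts them.
  defp spawn_n(count), do: spawn_n(count, 0)

  defp spawn_n(0, done), do: done

  defp spawn_n(count, done) do
    spawn(fn -> :ok end)
    spawn_n(count - 1, done + 1)
  end
end
```

It could be timed with `:timer.tc(fn -> ParallelSpawn.run(1_000_000) end)`. Whether it is actually faster depends on the machine and the VM settings.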

Eiji · May 23, 2018, 8:50pm

For me your code took 27.079564 seconds (a single run).

With the Flow library, after a small change:

1..1_000_000
|> Flow.from_enumerable()
|> Flow.map(& &1 + 1)
|> Flow.run()

It took 0.27585316 seconds (average of 100 runs), which is about 98 times faster - almost 100 times faster!

andre1sk · May 23, 2018, 8:51pm

very good point

rvirding · May 23, 2018, 9:30pm

Yes, and by default a process is started on the same scheduler as the spawner.

rvirding · May 23, 2018, 9:35pm

I don’t really know what you guys are doing. I ran the following code:

-module(s).
-export([n/1]).
n(0) -> ok;
n(N) ->
    spawn(fun()-> ok end),
    n(N-1).

and get:

~$ erl -P 1200000
Erlang/OTP 19 [erts-8.3] [source-d5c06c6] [64-bit] [smp:8:8] [async-threads:10] [kernel-poll:false]

Eshell V8.3  (abort with ^G)
1> timer:tc(s,n,[1000000]).
{5347543,ok}

which is about 5 usec per spawn. Some of those processes are also dying in that time.

gregvaughn · May 23, 2018, 9:50pm

The original code example was counting the number of results, which you’re not doing. I wonder if that explains the difference?

Eiji · May 23, 2018, 9:58pm

Nope, @michalmuskala already described why it’s so slow (compared to @rvirding’s and my examples).

michalmuskala · May 23, 2018, 10:02pm

The major difference is between running the code from the shell and running compiled code. One should never benchmark in the shell. Running from the shell takes about 20s for me and running from a compiled module about 4s, which seems consistent with your results.

@rvirding’s example is also entirely sequential; it’s just running compiled code rather than code interpreted by the shell.
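
As a rough sketch of what "compiled" means here, the original pipeline can be wrapped in a module unchanged, including the `Enum.count()` at the end. `SpawnBench` and `run/1` are names invented for this example.

```elixir
defmodule SpawnBench do
  # The original program, moved into a module so that it is compiled
  # instead of being evaluated expression by expression in the shell.
  # Returns the number of pids collected by Enum.map/2.
  def run(n) do
    1..n
    |> Enum.map(fn x -> spawn(fn -> x + 1 end) end)
    |> Enum.count()
  end
end
```

It can then be timed the same way as the other examples, with `:timer.tc(SpawnBench, :run, [1_000_000])`.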

gregvaughn · May 23, 2018, 10:03pm

Here’s the Elixir equivalent of @rvirding’s example:

iex(49)> defmodule S, do: (def n(0), do: :ok; def n(a), do: (spawn(fn -> :ok end); n(a - 1)))
iex(50)> :timer.tc(S, :n, [1000000])
{4447952, :ok}

4.4 microseconds per spawn. Not too shabby

Eiji · May 23, 2018, 10:32pm

I ran @gregvaughn’s code entirely in the console 100 times, and the average of all results is 1.11810393 seconds. Compared to my previous result for the original code (27.079564 seconds), that’s about 24 times faster.

As you suggested, I also compiled the module from the same example and ran it 100 times from the iex shell. The average is 1.15066154 seconds, which is even slightly slower than defining everything in iex.

Finally, I ran @rvirding’s code (only 5 runs by hand, because I don’t use Erlang that often). The result is 1.1473615 seconds, which is in line with the other results.

So compiling the module doesn’t reduce the time that much, or, to put it better: not in this case.
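
For anyone who wants to repeat the averaging, a small helper along these lines would do it. This is only a sketch: `Bench` and `average_seconds/2` are hypothetical names, not an existing library.

```elixir
defmodule Bench do
  # Hypothetical helper: runs the zero-arity `fun` `runs` times and returns
  # the mean wall-clock time in seconds, measured with :timer.tc/1.
  def average_seconds(fun, runs \\ 100) when is_function(fun, 0) and runs > 0 do
    total_us =
      Enum.reduce(1..runs, 0, fn _, acc ->
        # :timer.tc/1 returns {elapsed_microseconds, result}.
        {us, _result} = :timer.tc(fun)
        acc + us
      end)

    # Convert the summed microseconds to an average in seconds.
    total_us / runs / 1_000_000
  end
end
```

For example, `Bench.average_seconds(fn -> S.n(1_000_000) end)` would average 100 runs of the `S.n/1` function from the earlier post.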

gregvaughn · May 23, 2018, 10:38pm

To be fair, my example does compile the module, and the bytecode is stored in memory. On the other hand, if you’ve compiled it as an external beam file, then you have the IO of reading that file into memory as an extra step.