Spawning 1 Million Tasks - why is it slow?

I wrote this simple program:

1..1_000_000
|> Enum.map(fn x -> spawn(fn -> x + 1 end) end)
|> Enum.count()

This takes ~20 sec to run on my AMD Threadripper (16 cores + HT).
Why is it so slow?

How much time would you expect it to take? And against what are you measuring? That is, if that’s slow, what’s fast?

On my old laptop (4 cores) it takes 26207224 us (~26us per process), btw.

iex(3)> :timer.tc fn -> Enum.map(1..1_000_000, fn x -> spawn(fn -> x + 1 end) end) end
{26207224, [...]}

That’s about 20 microseconds per spawn. Feels ok to me.

It is strange, since on my 2013 MacBook Pro it takes 20 sec, and that machine only has 4 cores and is running a ton of other things, including a copy of our prod environment in Docker.

The number of cores does not matter - after all it’s just one process spawning all of those other processes. This benchmark is entirely sequential.

I suspect the load of the spawned processes themselves is insignificant compared to the load of actually spawning the process.
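
One way to check that suspicion is to time the same loop with and without the `spawn`. The following is only a minimal sketch, not a rigorous benchmark: the module name `SpawnVsWork` and its function names are made up for illustration, and the numbers it reports will vary from machine to machine.

```elixir
defmodule SpawnVsWork do
  # Hypothetical helper module, not part of the original question.

  # Does the "work" (x + 1) for every element without spawning anything.
  def work_only(n), do: Enum.each(1..n, fn x -> x + 1 end)

  # Does the same work, but inside a freshly spawned process per element.
  def spawn_each(n), do: Enum.each(1..n, fn x -> spawn(fn -> x + 1 end) end)

  # Returns the wall-clock time of both variants in microseconds.
  def compare(n) do
    {work_us, :ok} = :timer.tc(fn -> work_only(n) end)
    {spawn_us, :ok} = :timer.tc(fn -> spawn_each(n) end)
    %{work_us: work_us, spawn_us: spawn_us}
  end
end
```

Calling `SpawnVsWork.compare(100_000)` returns both timings side by side, so the share of time spent on spawning can be read off directly.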

For me your code took 27.079564 seconds (a single run).

With the Flow library, after a small change:

1..1_000_000
|> Flow.from_enumerable()
|> Flow.map(& &1 + 1)
|> Flow.run()

It took 0.27585316 seconds (average of 100 runs), which is about 98 times faster (27.079564 / 0.27585316 ≈ 98.17). That is almost 100 times faster!

Very good point :)

Yes, and by default a new process is started on the same scheduler as its spawner.
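
If the goal were to have the spawning itself use more than one core, the spawning would have to be done from several processes. A rough sketch of that idea follows. `ParallelSpawn` and its `run/2` function are hypothetical names, and whether this actually ends up faster on a given machine is an assumption that would need measuring.

```elixir
defmodule ParallelSpawn do
  # Hypothetical helper module, not part of the original question.
  # Splits 1..n into chunks and lets one Task per chunk do the spawning,
  # so the spawner processes can run on different schedulers.
  def run(n, spawners \\ System.schedulers_online()) do
    chunk_size = max(div(n, spawners), 1)

    1..n
    |> Enum.chunk_every(chunk_size)
    |> Task.async_stream(
      fn chunk ->
        Enum.each(chunk, fn x -> spawn(fn -> x + 1 end) end)
        # Report how many processes this spawner started.
        length(chunk)
      end,
      max_concurrency: spawners,
      timeout: :infinity
    )
    |> Enum.reduce(0, fn {:ok, count}, acc -> acc + count end)
  end
end
```

`ParallelSpawn.run(1_000_000)` returns the total number of processes spawned. Wrapping the call in `:timer.tc/1` gives a figure to compare with the single-spawner version.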

I don’t really know what you guys are doing. I ran the following code:

-module(s).
-export([n/1]).
n(0) -> ok;
n(N) ->
    spawn(fun()-> ok end),
    n(N-1).

and get:

~$ erl -P 1200000
Erlang/OTP 19 [erts-8.3] [source-d5c06c6] [64-bit] [smp:8:8] [async-threads:10] [kernel-poll:false]

Eshell V8.3  (abort with ^G)
1> timer:tc(s,n,[1000000]).
{5347543,ok}

which is about 5 usec per spawn. Some of those processes are also dying in that time.

The original code example was counting the number of results, which you’re not doing. I wonder if that explains the difference?

Nope, @michalmuskala already described why it’s so slow (compared to @rvirding’s examples and mine).

The major difference is between running the code from the shell and running compiled code. One should never benchmark in the shell. Running from the shell takes about 20s for me and running from a compiled module about 4s, which seems consistent with your results.

@rvirding’s example is also entirely sequential; it is just running compiled code rather than being interpreted, as code typed into the shell is.
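
To take the shell out of the measurement, the loop can be placed inside a module, since module bodies are compiled even when they are defined in a script. This is a minimal sketch under that assumption. The module name `SpawnBench`, the function names, and the file name `spawn_bench.exs` are made up for illustration.

```elixir
# spawn_bench.exs (hypothetical file name), run with: elixir spawn_bench.exs
defmodule SpawnBench do
  # Spawns `n` trivial processes sequentially from the calling process.
  def spawn_n(0), do: :ok

  def spawn_n(n) when n > 0 do
    spawn(fn -> :ok end)
    spawn_n(n - 1)
  end

  # Returns the average wall-clock cost per spawn in microseconds.
  def per_spawn_us(n) when n > 0 do
    {total_us, :ok} = :timer.tc(fn -> spawn_n(n) end)
    total_us / n
  end
end

IO.puts("#{SpawnBench.per_spawn_us(100_000)} us per spawn")
```

The count is kept at 100_000 here so the sketch stays well under the default process limit even if the spawned processes do not exit right away.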

Here’s the Elixir equivalent of @rvirding’s example:

iex(49)> defmodule S, do: (def n(0), do: :ok; def n(a), do: (spawn(fn -> :ok end); n(a - 1)))
iex(50)> :timer.tc(S, :n, [1000000])
{4447952, :ok}

4.4 microseconds per spawn. Not too shabby :)

I ran @gregvaughn’s code entirely in the console 100 times, and the average of all results is 1.11810393 seconds. Compared to my earlier result for the original code (27.079564 seconds), that is still about 24.2 times faster.

As suggested, I also compiled the module from the same example and ran it 100 more times from the iex shell. The average was 1.15066154 seconds, which is even slightly slower than defining everything in iex.

Finally, I ran @rvirding’s code (only 5 manual runs, because I don’t use Erlang that often). The result was 1.1473615 seconds, which is close to the other results.

So compiling the module does not seem to reduce the time that much, or, to put it better: not in this case.

To be fair, my example does compile the module, and the bytecode is stored in memory. On the other hand, if you’ve compiled it as an external beam file, then you have the IO of reading that file into memory as an extra step.
