Recursion performance: why so slow?

Hello all,

I wanted to implement a simple function that calculates the “hash rate” of my computer. Basically, how many times per second my computer can compute a sha256 hash. So I came up with this simple function:

defmodule HashRate do
  def estimate_hash_rate(interval \\ 20) do
    start = System.system_time(:second)
    n = estimate_hash_rate(interval, start, 0)
    n / (interval * 1000) # result in kH/s
  end

  def estimate_hash_rate(interval, start, n \\ 0) do
    if start + interval < System.system_time(:second) do
      n
    else
      :crypto.hash(:sha256, "data#{n}")
      estimate_hash_rate(interval, start, n + 1)
    end
  end
end

It runs for 20 seconds and counts how many times it performs the hash. When I run this function multiple times, I get results like 378.59365 and 337.1613.

I implemented a similar algorithm in Go and got much faster results: 5236.6924, 5468.282. Here is the code:

package main

import (
	"crypto/sha256"
	"fmt"
	"strconv"
	"time"
)

func main() {
	interval := int32(20)
	start := int32(time.Now().Unix())
	n := 0
	for {
		if start+interval < int32(time.Now().Unix()) {
			break
		}
		h := sha256.New()
		h.Write([]byte("data" + strconv.Itoa(n)))
		n += 1
	}

	fmt.Println(float32(n) / float32(interval*1000))
}

I know Elixir/Erlang may be slow for computation, so I commented out the sha256 hash in my code (to just count the number of iterations per second) and got these kinds of results:

Go: 90172.62 (k iterations / s)
Elixir: 5476.6122 (k iterations / s)

Again, the difference is HUGE. I don’t think I’m doing anything wrong in my tests. I’m aware of tail recursion (and I believe the function I wrote is tail recursive). I tried to run the code with MIX_ENV=prod but got the same kind of results.

Any ideas what is going on? What makes Elixir so slow compared to Go in this naive test? Maybe System.system_time(:second)? Or maybe I’m doing something wrong?

A couple things:

If TCO takes place it will indeed prevent the stack from growing, but it does this by destroying everything currently on the stack (basically re-using it), which is a ‘tiny’ bit more expensive than just growing the stack (trivially so in most cases).

The BEAM VM counts a ‘reduction’ on every single function call, so it can pre-empt an actor when it has used up ‘too many reductions’ and allow others to run.

The BEAM VM also runs very safely: even if you write an absolute horror of code in one actor, it will not crash other actors, and that requires a lot of state-keeping. It is designed to be an ‘Always Running’ system.

The BEAM VM is optimized for IO events, not raw function-call throughput, and it focuses on handling those in an entirely safe way.

Compare that to Go, where yes, it is faster, but a single badly written line of code in some library you use, one that may only get called randomly in production, can crash your entire system. That cannot happen on the BEAM VM (and if it could, then it is a bug and should be reported).

The BEAM VM is also fantastic at offloading expensive work to native code (even Go) while keeping everything running, and even restarting native code that crashes.

In general, programming on the BEAM VM (whether via Elixir, Erlang, etc.) is similar to Python: you first implement the base system on the BEAM VM, trusting it to never die, and if you find something worth speeding up, you hand that off to a native process via a Port or the like, fully managed by the BEAM’s supervisors just like any other process, to do the hard and heavy work.
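As a rough illustration (my sketch, not anything from the code above; the ./hash_worker binary and its line-based stdin/stdout protocol are hypothetical), such a Port looks like this, and the BEAM treats it like any other process:

defmodule NativeHasher do
  # Spawn the (hypothetical) native executable; the BEAM delivers its
  # stdout to the owning process as messages and reports when it exits.
  def start do
    Port.open({:spawn_executable, "./hash_worker"},
              [:binary, :exit_status, {:line, 1024}])
  end

  # Send one line of input and wait for one line of output.
  def hash(port, data) do
    Port.command(port, data <> "\n")
    receive do
      {^port, {:data, {:eol, result}}} -> {:ok, result}
      {^port, {:exit_status, status}} -> {:error, {:exited, status}}
    after
      5_000 -> {:error, :timeout}
    end
  end
end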

And as an aside: in Go you are not calling a function recursively there anyway; it is a plain loop that gets compiled down to simple jumps in machine code, so you are not testing apples-to-apples.

I have not benchmarked it myself, nor did I properly profile this code, but fetching the system time is expensive. You are basically measuring that…

Instead you should do a fixed number of iterations and measure the elapsed time.
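For example (a minimal sketch of that approach, mine rather than anything from this thread), timing a fixed number of hash calls with :timer.tc/1:

defmodule FixedCount do
  @iterations 1_000_000

  # :timer.tc/1 returns {elapsed_microseconds, result}.
  def hash_rate do
    {micros, :ok} = :timer.tc(fn -> loop(@iterations) end)
    @iterations / (micros / 1_000_000) # hashes per second
  end

  defp loop(0), do: :ok
  defp loop(n) do
    :crypto.hash(:sha256, "data#{n}")
    loop(n - 1)
  end
end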

And instead of trying to measure those times and iterations yourself, you should use tools like go test -bench and Benchee to get properly benchmarked values.
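On the Elixir side that can be as small as this (assuming benchee is added to your mix.exs deps):

Benchee.run(%{
  "sha256" => fn -> :crypto.hash(:sha256, "data") end
})

Benchee then handles warmup, timing, and statistics for you, instead of you counting wall-clock seconds.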

Also, I have to say that you do not finish the calculation of the hash in the Go version; to make it roughly equivalent to your Elixir version you need to call Sum on it:

	for {
		if start+interval < int32(time.Now().Unix()) {
			break
		}
		h := sha256.New()
		h.Write([]byte("data" + strconv.Itoa(n)))
		h.Sum(nil) // actually finish computing the digest
		n += 1
	}

In addition, your System.system_time call is not cheap, as it causes the VM to call down into the OS in ways that Go’s is not doing (it needs to get a synchronized time).
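If you want to see that cost for yourself, here is a rough sketch (mine, not from the thread) that just times the time call itself:

# Average cost of one million System.system_time/1 calls.
{micros, :ok} = :timer.tc(fn ->
  Enum.each(1..1_000_000, fn _ -> System.system_time(:second) end)
end)
IO.puts("~#{micros / 1_000_000} µs per call")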

Hmm, also I’m getting very different results from you?

╰─➤  echo -n "\nElixir: "; elixir -e 'IO.puts(Tester.test())'; echo -n "\nGo: "; ./test_go; echo -n "\nOCaml: "; ./test_ml; echo

Elixir: 6335.006

Go: 12091.954

OCaml: 28248.52685

╰─➤  cat test.ex
defmodule Tester do
  @interval 20

  def test() do
    # System.system_time/0 returns :native units, assumed here to be nanoseconds
    start = System.system_time()
    test(start + @interval * 1_000_000_000, 0)
  end

  defp test(en, n) do
    if System.system_time() >= en do
      n / (@interval * 1000)
    else
      test(en, n + 1)
    end
  end
end

╰─➤  cat test.go
package main

import (
        "time"
        "fmt"
)

func main() {
        interval := int32(20)
        start := int32(time.Now().Unix())
        n := 0
        for {
                if start + interval < int32(time.Now().Unix()) {
                        break
                }
                n += 1
        }

        fmt.Println(float32(n) / float32(interval * 1000))
}

╰─➤  cat test.ml
let rec looper en n =
  if Unix.time () >= en
  then n
  else looper en (n + 1)

let _ =
  let count = looper (Unix.time () +. 20.0) 0 in
  print_float ((float_of_int count) /. (20.0 *. 1000.0))

The Go version does not time as accurately either (only second resolution, which could skew its results a bit against it, I think, but not enough to matter much here). But really, why does Go look so slow compared to an immutable functional language like OCaml? Is there some way to compile it in a ‘release’ mode or something? o.O

(/me would choose OCaml over Go for non-multithreaded work any day, and MC-OCaml over Go for multi-threaded work any day; Go is a poor language, especially in its verbosity)

Also because I was curious, how about Rust: ^.^

╰─➤  echo -n "\nElixir: "; elixir -e 'IO.puts(Tester.test())'; echo -n "\nGo: "; ./test_go; echo -n "\nOCaml: "; ./test_ml; echo -n "\n\nRust: "; ./test_rs; echo

Elixir: 6381.8058

Go: 12281.936

OCaml: 28238.0583

Rust: 18588

╰─➤  cat test.rs
use std::time::{Duration, SystemTime};

fn main() {
  let interval = Duration::from_secs(20);
  let start = SystemTime::now();
  let mut n = 0;
  while start.elapsed().unwrap() < interval {
    n += 1;
  }
  // integer division, hence the whole-number result above
  println!("{}", n / 20000);
}

And the takeaway here is not how fast function calls are, but rather that time-getting functions are slow regardless of the language. I.e., use a real benchmarking tool; don’t count seconds, because then the only thing you are benchmarking is your time-getting functions. :wink:

But hey, OCaml has the fastest time acquiring function! ^.^;

And I got a free moment while waiting for something, so hey, a C++ version using high_resolution_clock (generally the slowest clock, yay)!

Here is just C++'s since I don’t want to wait for the rest to run:

╰─➤  ./test_cpp 
7628.14
╰─➤  cat test.cpp 
#include <iostream>
#include <chrono>

int main() {
  using namespace std::chrono;
  int interval = 20;
  high_resolution_clock::time_point end = high_resolution_clock::now() + seconds(interval);
  int n = 0;
  while((end - high_resolution_clock::now()).count() > 0) n += 1;
  std::cout << (n / (interval * 1000.0)) << std::endl;
  return 0;
}

Ooo lookie here, C++ is almost as slow as Elixir! The BEAM VM uses a very high resolution timer too. :wink:

Also this is fun, output binary file sizes:

╰─➤  ls -lah test_*
-rwxrwxr-x. 1 <user> <user> 8.9K Oct 24 12:02 test_cpp
-rwxrwxr-x. 1 <user> <user> 1.9M Oct 24 11:22 test_go
-rwxrwxr-x. 1 <user> <user>  99K Oct 24 11:25 test_ml
-rwxrwxr-x. 1 <user> <user> 3.9M Oct 24 11:39 test_rs

C++’s is utterly tiny at 8.9K (which is funny because I’m using iostreams and other such large, inefficient things; though of course it dynamically links libstdc++ rather than bundling it).
OCaml’s is quite decent at a respectable 99K.
Go is massive at 1.9M (unsure why? Probably includes its stdlib like Rust does; I did compile it optimized).
Rust is of course massive at 3.9M because it statically links its stdlib, including pthreads and its high-performance memory allocator, blehg (though it could be shrunk a bit by messing with optimization levels and LTO).

/me loves OCaml, such an amazing language and compiler both

Wow!! Thank you @OvermindDL1!! Amazing answer. I know my comparison was kind of stupid (not a real benchmark, based on wall-clock time, etc.), but it was just a quick check, and I was really surprised by the results. I don’t know about Rust, but for Go the binary is fat because it includes the entire runtime (run your binary on another machine that knows nothing about Go and it will run :upside_down_face:, as long as it’s the same OS and arch).

Heh, no problem. But yeah, most people do not know just how slow timing functions really are, and they just end up benchmarking their timing functions instead of the code they really intended to. ^.^

You can usually get around it by, say, calling a function in a loop X times and timing how long ‘that’ took to run (if it takes under 1s, run it again with more loops until it takes at least 1s; this is what Elixir’s Benchee benchmark library does, for example, as do most others). :slight_smile:
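Roughly, that doubling loop looks like this (my own illustration, not Benchee’s actual implementation):

defmodule Autoscale do
  # Keep doubling the iteration count until one run takes at least a
  # second, then report the average time per call from that run.
  def time_per_call(fun, iterations \\ 1) do
    {micros, :ok} = :timer.tc(fn -> repeat(fun, iterations) end)
    if micros < 1_000_000 do
      time_per_call(fun, iterations * 2)
    else
      micros / iterations # microseconds per call
    end
  end

  defp repeat(_fun, 0), do: :ok
  defp repeat(fun, n) do
    fun.()
    repeat(fun, n - 1)
  end
end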

That is because Go builds standalone binaries, as do Rust, OCaml, and C++, which is why I only showed their sizes. ^.^