Elixir NIF performance compared to native C++

I have a library which is written in C++, and I wrote a NIF “bridge” for it.

When I benchmarked the library in C++, it reached ~400-600 ips.

Then I benchmarked it in Elixir, and the result surprised me: it reached at least 4k ips.

I couldn’t believe my eyes, so I spun up a Drogon HTTP server to compare against Phoenix.

The results were much the same: ~400 rps for Drogon vs ~3k-6k rps for Phoenix.

How come a native C++ call is MUCH slower than the same call made through an Elixir NIF?

Are you able to share the code? It would be easier this way to say something sensible about it.
But from my own experience it’s quite difficult to compare measurements across two platforms correctly.
(I’m using Rustler myself and performance seems to be better on the 100% Rust side. I have never seen the Elixir/NIF part being faster, because of the (sometimes small) overhead of crossing the NIF bridge.)


Could be that your benchmarks are parallelized for BEAM but not when testing native C++?
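
One rough way to check this is to divide the aggregate throughput by the number of parallel workers before comparing it with a single-threaded run. A minimal C++ sketch, assuming ideal scaling across workers; `per_worker_ips` is a made-up helper name, not part of any benchmarking tool:

```cpp
#include <cassert>

// Divide an aggregate throughput by the number of parallel workers to get a
// per-worker figure that is comparable to a single-threaded run.
// Assumes throughput scales linearly with workers, which is only a rough model.
double per_worker_ips(double aggregate_ips, int workers) {
    assert(workers > 0);
    return aggregate_ips / workers;
}
```

For example, if the 4k ips had come from a run with 8 parallel workers (a hypothetical number, the thread does not say), `per_worker_ips(4000.0, 8)` would be 500, which is inside the ~400-600 ips range reported for the C++ run.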


Is it a normal or a dirty NIF? If it is dirty, is the workload IO or CPU bound, and is the appropriate flag used in the NIF declaration? If it is dirty, does the BEAM have enough dirty threads set in its config (the defaults are rather low)?

The problem is not that the BEAM is too “slow”; it is too “fast” compared with the pure C++ benchmark.

It gets about 5 to 10 times as many IPS on the BEAM when called as a NIF as the pure C++ benchmark does.

As @rjk and @yurko said, it can depend on many things. If you could provide an MCVE so we could check this out, then maybe we could give you a more accurate answer.

I have made a simplified version of the code.

After accounting for parallel execution and tweaking the C++ compilation, Benchee with parallel set to 1 is around 2.3k IPS and native C++ is about 2k or so, so they are similar now.

But I still feel the Elixir NIF is faster.

Pure C++ is 174.96 ns per iteration while the Elixir NIF is 423.47 μs per iteration, so pure C++ is about 2420x faster than the Elixir NIF.
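
To put both numbers in the same unit, the time per iteration can be converted to iterations per second. A minimal C++ sketch of that arithmetic, using the two figures quoted above; the helper names are made up for illustration and do not come from Benchee or Google Benchmark:

```cpp
#include <cassert>

// Convert a time per iteration (in nanoseconds) to iterations per second.
// Hypothetical helper for illustration only.
double ips_from_ns(double ns_per_iteration) {
    assert(ns_per_iteration > 0.0);
    return 1e9 / ns_per_iteration;
}

// How many times faster the first measurement is than the second.
double speedup(double fast_ns, double slow_ns) {
    assert(fast_ns > 0.0);
    return slow_ns / fast_ns;
}

// Figures from this thread, both expressed in nanoseconds.
const double kCppNs = 174.96;           // pure C++: 174.96 ns per iteration
const double kNifNs = 423.47 * 1000.0;  // Elixir NIF: 423.47 μs per iteration
```

`ips_from_ns(kCppNs)` comes out at roughly 5.7 million iterations per second, `ips_from_ns(kNifNs)` at roughly 2.36k, and `speedup(kCppNs, kNifNs)` at roughly 2420.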

What about the IPS? Elixir has a higher value.

20’000 iterations in 20 seconds means less throughput than 2’000 iterations in 1 second. And from what I see, Google Benchmark gives you the number of iterations, not IPS.
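
Throughput is the iteration count divided by the elapsed time, so a raw iteration count says nothing until it is paired with a duration. A minimal C++ sketch of that calculation, plus a small timing loop built on `std::chrono::steady_clock`; `throughput` and `measure_ips` are hypothetical names, and `work` stands in for whatever library call is being measured:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Throughput in iterations per second: iteration count over elapsed time.
double throughput(std::uint64_t iterations, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return static_cast<double>(iterations) / elapsed_seconds;
}

// Run `work` a fixed number of times, time the whole loop with a monotonic
// clock, and report iterations per second.
template <typename F>
double measure_ips(std::uint64_t iterations, F&& work) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return throughput(iterations, elapsed.count());
}
```

With the numbers above, `throughput(20000, 20.0)` is 1000 IPS and `throughput(2000, 1.0)` is 2000 IPS, so the run with ten times as many iterations still has half the throughput.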