RandNif: an experiment to speed up random number generation with Rust


I noticed that the :rand module (standard library) is a bit slow when the algorithm I was using relies heavily on random number generation. I checked the source code and realized that it’s written in Erlang instead of native code (e.g. as BIFs).

Then a question came up:

If the :rand module were written in native code, how much performance could it gain?

Therefore, I wrote a NIF in Rust (via rustler) for :rand.uniform and ran a benchmark comparing the :rand module against the newly created RandNif module. The benchmark result is as follows:

```
Compiling NIF crate :randnif (native/randnif)...
    Finished release [optimized] target(s) in 0.06s
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.8.1
Erlang 21.2.4

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 5 s
parallel: 1
inputs: none specified
Estimated total run time: 1.13 min

Benchmarking :rand.uniform/0...
Benchmarking :rand.uniform/1...
Benchmarking RandNif.uniform/0...
Benchmarking RandNif.uniform/1...

Name                        ips        average  deviation         median         99th %
RandNif.uniform/0      145.35 K        6.88 μs   ±192.62%        5.96 μs       14.30 μs
RandNif.uniform/1       71.77 K       13.93 μs    ±38.32%       12.55 μs       22.28 μs
:rand.uniform/0         44.99 K       22.23 μs    ±42.47%       20.53 μs       42.35 μs
:rand.uniform/1         37.81 K       26.45 μs    ±34.67%       24.31 μs       43.62 μs

RandNif.uniform/0      145.35 K
RandNif.uniform/1       71.77 K - 2.03x slower
:rand.uniform/0         44.99 K - 3.23x slower
:rand.uniform/1         37.81 K - 3.84x slower

Memory usage statistics:

Name                 Memory usage
RandNif.uniform/0         3.11 KB
RandNif.uniform/1         1.56 KB - 0.50x memory usage
:rand.uniform/0          11.97 KB - 3.85x memory usage
:rand.uniform/1          10.41 KB - 3.35x memory usage

**All measurements for memory usage were the same**
```

As we can see, :rand.uniform/0 is 3.23x slower than RandNif.uniform/0.
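For context on what each uniform call does per number: roughly speaking, both libraries generate a word of random bits and scale it into a float in [0, 1). A minimal Rust sketch of that mapping, using a hypothetical xorshift64 step as the PRNG (not the actual algorithm either :rand or the rand crate uses):

```rust
// Illustrative only: xorshift64 stands in for the real generator.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Map the top 53 random bits to a double in [0.0, 1.0),
/// the usual bits-to-float trick.
fn uniform(state: &mut u64) -> f64 {
    (xorshift64(state) >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
}

fn main() {
    let mut state = 0x9E37_79B9_7F4A_7C15; // any non-zero seed
    for _ in 0..5 {
        let v = uniform(&mut state);
        assert!((0.0..1.0).contains(&v));
        println!("{v}");
    }
}
```

So the arithmetic per value is tiny; most of the measured time is bookkeeping around it, which is where native code wins.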

(I am not very confident about this part; please correct me if I am wrong.) But a NIF call is not free either; it has some overhead of its own. So I ran another simple experiment: by comparing a no-op NIF function with RandNif.uniform/0, I measured the NIF-call overhead at about 67% of RandNif.uniform/0’s total time, which may imply it would be about 3x faster as a BIF instead of a NIF.

Therefore, if :rand.uniform/0 were written in native code (as a BIF), then by my non-scientific estimate it would be about 9.69x faster (3.23 × 3).

Then I had a brief look at the source code of Erlang’s :rand module and Rust’s rand crate (the one used in the NIF), and I believe native code would have the following advantages:

  1. Each random number generator needs to initialize a seed and save it somewhere for the next call. Erlang’s :rand module saves it in the process dictionary, but native code could save it at the thread level, which means faster access/updates, smaller memory usage, and better cache locality. Native code’s advantage would grow further when there are lots of Erlang processes.
  2. Native code is simply faster at the computation itself.
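Point 1 can be sketched in plain Rust (std only): keep the generator state in a `thread_local!`, so each thread reads and updates its own seed with a plain memory access instead of a per-process dictionary lookup. The xorshift64 generator is again just an illustrative stand-in:

```rust
use std::cell::Cell;

// Sketch: thread-level PRNG state. Each OS thread owns its seed in a
// thread_local!, so no locking and no dictionary lookup is needed.
// xorshift64 is an illustrative PRNG, not the real generator.
thread_local! {
    static RNG_STATE: Cell<u64> = Cell::new(0x9E37_79B9_7F4A_7C15);
}

fn next_u64() -> u64 {
    RNG_STATE.with(|s| {
        let mut x = s.get();
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        s.set(x); // save the seed for the next call
        x
    })
}

fn main() {
    // Two threads advance independent states concurrently.
    let handles: Vec<_> = (0..2)
        .map(|_| std::thread::spawn(|| (0..3).map(|_| next_u64()).collect::<Vec<_>>()))
        .collect();
    for h in handles {
        println!("{:?}", h.join().unwrap());
    }
}
```

A real BIF would have to be more careful (e.g. a process can migrate between scheduler threads mid-sequence), but this shows why thread-level state is cheap to touch.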

Given that :rand is a fairly commonly used module, maybe one day the BEAM core team would consider creating BIFs for it?

All code / benchmark / scripts are available at https://github.com/gyson/rand_nif.

Thanks for reading :blush:


Could be quite an interesting conversation to bring up on the Erlang issue tracker.

I don’t think BIFs/NIFs have ‘that’ large a runtime call difference; their difference comes in garbage handling and in handling work that spans more than one reduction, where NIFs are a touch slower due to fewer internal interactions, so it shouldn’t matter much for this case, I’d think.

To reduce the ‘call’ time you can always return a list of values as well: if the user needs 1000 random values, provide a function that returns N random values in one call. If you go this route you probably want to add reduction handling when N is too large (or, for a NIF, run it on a dirty CPU scheduler thread, which I think is automatic in OTP 21 now or so?).
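The batching idea can be sketched in Rust (again with an illustrative xorshift64 rather than the NIF’s actual generator): one call fills a whole `Vec`, so the per-call overhead is paid once per N values instead of once per value.

```rust
// Illustrative PRNG step; not the real generator.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Produce `n` uniform floats in [0.0, 1.0) in a single call,
/// amortizing the NIF-call overhead across the whole batch.
fn uniform_batch(state: &mut u64, n: usize) -> Vec<f64> {
    (0..n)
        .map(|_| (xorshift64(state) >> 11) as f64 * (1.0 / (1u64 << 53) as f64))
        .collect()
}

fn main() {
    let mut state = 42;
    let batch = uniform_batch(&mut state, 1000);
    assert_eq!(batch.len(), 1000);
    assert!(batch.iter().all(|v| (0.0..1.0).contains(v)));
    println!("first 3: {:?}", &batch[..3]);
}
```

In a real NIF, a large N is exactly where the reduction accounting (or a dirty scheduler) mentioned above would matter, since the call no longer finishes in microseconds.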
