Rust NIFs are epic! Shoutout to the Rustler folks. Version 0.22.2 is amazing!

I just wanted to post about my remarkable experience with Rustler today.

Unbelievably good.

Note: this example is an updated version of an article I found online. The main differences are returning binary data from Rust and getting it to work on a MacBook M1.

Short version to get up and running:

mix new base64
cd base64

Edit mix.exs and add {:rustler, "~> 0.22.2"} to deps.

mix deps.get
mix rustler.new

In my case, the Elixir module name was Base64 and the Rust crate name was base64_nif.

  • note: I used base64_nif to avoid name collision with rust crate base64

Add base64 = "0.13.0" at the end of native/base64_nif/Cargo.toml

Edit native/base64_nif/src/lib.rs as follows:

use base64;
use rustler::Binary;
use rustler::OwnedBinary;

#[rustler::nif]
pub fn encode(binary: Binary) -> String {
    base64::encode(binary.as_slice())
}

#[rustler::nif]
pub fn decode(b64: Binary) -> OwnedBinary {
    let bytes = base64::decode(b64.as_slice()).expect("decode failed: invalid base64");
    let mut binary: OwnedBinary = OwnedBinary::new(bytes.len()).unwrap();
    binary.as_mut_slice().copy_from_slice(&bytes);

    binary
}

rustler::init!("Elixir.Base64", [encode, decode]);

Note on the code above! It took me a while to figure out how to return a binary from Rust. This does not seem to be well documented.
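
For anyone puzzling over the allocate-then-copy step in decode, here is a minimal std-only sketch of the same pattern. It uses a plain Vec<u8> as a stand-in for OwnedBinary, and to_fixed_buffer is a made-up name for illustration, so treat it as an analogy rather than Rustler's API. The point is that the destination has to be allocated with exactly the source length, because copy_from_slice panics when the lengths differ.

```rust
/// Copies `bytes` into a freshly allocated buffer of exactly the same length.
/// The Vec<u8> is only a stand-in for a fixed-size binary (assumption for illustration).
fn to_fixed_buffer(bytes: &[u8]) -> Vec<u8> {
    // Allocate exactly as many bytes as will be copied.
    let mut buffer = vec![0u8; bytes.len()];
    // copy_from_slice panics if source and destination lengths differ,
    // which is why the allocation above uses bytes.len().
    buffer.as_mut_slice().copy_from_slice(bytes);
    buffer
}

fn main() {
    let decoded = to_fixed_buffer(b"decoded bytes");
    println!("{} bytes copied", decoded.len());
}
```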

Edit lib/base64.ex as follows:

defmodule Base64 do
  use Rustler, otp_app: :base64, crate: "base64_nif"

  @spec decode(String.t()) :: binary
  def decode(_base64), do: :erlang.nif_error(:nif_not_loaded)

  @spec encode(binary) :: binary
  def encode(_binary), do: :erlang.nif_error(:nif_not_loaded)
end

If you are on a MacBook M1:

Add this to native/base64_nif/.cargo/config:

[target.aarch64-apple-darwin]
rustflags = [
    "-C", "link-arg=-undefined",
    "-C", "link-arg=dynamic_lookup",
]

(This is fixed on GitHub, but not yet published to Hex as of 0.22.2.)

That’s it! Amazing!

Results for encoding small data:

iex(3)> file = "this is a really small string that we are going to encode to base 64"
"this is a really small string that we are going to encode to base 64"
iex(4)> Benchee.run(
...(4)>   %{
...(4)>     "elixir_base64" => fn -> Base.encode64(file) end,
...(4)>     "rust_base64" => fn -> Base64.encode(file) end
...(4)>   },
...(4)>   time: 2,
...(4)>   memory_time: 2
...(4)> )

...

Benchmarking elixir_base64...
Benchmarking rust_base64...

Name                    ips        average  deviation         median         99th %
rust_base64          7.75 M      129.10 ns  ±9810.57%           0 ns           0 ns
elixir_base64        1.70 M      587.28 ns  ±1714.81%           0 ns        2000 ns

Comparison: 
rust_base64          7.75 M
elixir_base64        1.70 M - 4.55x slower +458.18 ns

Memory usage statistics:

Name             Memory usage
rust_base64             560 B
elixir_base64           784 B - 1.40x memory usage +224 B

Rust is 4.55x faster

Results on larger binary data:

Benchmarking elixir_base64...
Benchmarking rust_base64...

Name                    ips        average  deviation         median         99th %
rust_base64          2.33 K        0.43 ms     ±3.23%        0.43 ms        0.47 ms
elixir_base64       0.100 K        9.97 ms     ±0.67%        9.95 ms       10.18 ms

Comparison: 
rust_base64          2.33 K
elixir_base64       0.100 K - 23.19x slower +9.54 ms

Memory usage statistics:

Name             Memory usage
rust_base64             560 B
elixir_base64           784 B - 1.40x memory usage +224 B

Rust is 23.19x faster

P.S. Don’t forget to run with MIX_ENV=prod iex -S mix, otherwise Elixir will be 2x faster! (Presumably because the Rust crate is otherwise built in debug mode.)

Wow. Thanks so much to the folks that worked on rustler!

It’s a game changer!

[target.'cfg(target_os = "macos")']
rustflags = [
    "-C", "link-arg=-undefined",
    "-C", "link-arg=dynamic_lookup",
]

This should also work, and it makes the library support both x86 and ARM chips, since these flags are required on all Macs, not just ARM ones.

Also note that, at least right now, most Rust libraries may not enable SIMD acceleration on arm64 Macs by default. It has to be enabled explicitly (if the library supports or uses it), which can yield a 2x or greater performance improvement. You’d need Rust nightly for that.
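
As a rough way to see what the compiler was told about SIMD for the current build, something like the following std-only sketch can help. simd_status is a made-up helper name, and this only reports compile-time target features. It says nothing about whether a particular crate actually has SIMD code paths.

```rust
/// Reports which SIMD-related target features this binary was compiled with.
/// Hypothetical helper for illustration; it does not inspect any library.
fn simd_status() -> &'static str {
    if cfg!(all(target_arch = "aarch64", target_feature = "neon")) {
        "aarch64, NEON enabled at compile time"
    } else if cfg!(target_arch = "aarch64") {
        "aarch64, NEON not enabled at compile time"
    } else if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        "x86_64, AVX2 enabled at compile time"
    } else if cfg!(target_arch = "x86_64") {
        "x86_64, AVX2 not enabled at compile time"
    } else {
        "other architecture"
    }
}

fn main() {
    println!("{}", simd_status());
}
```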

Would be interesting to see fast C-based NIF comparison too.

Yeah, this is the exact code in rustler now. This works, too! :wink:

One callout on this - running a NIF on the regular scheduler for longer than 1ms is strongly warned against in the BEAM documentation.

Something like this can be used for long-running nifs if using rustler:

#[rustler::nif(schedule = "DirtyCpu")]
pub fn encode(binary: Binary) -> String {
    // ...
}

Awesome. I updated my code! For some reason, I can’t edit my post above.

Just a note: #[rustler::nif(schedule = "DirtyCpu")] might be unnecessary for base64 encoding/decoding unless you need to support large binaries.
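
One way to decide is to time the work on the Rust side for the input sizes you expect and compare against the roughly 1 ms guideline mentioned above. Below is a small std-only sketch of that idea. The names time_once, needs_dirty_scheduler, and checksum are all invented for illustration, and checksum is just a placeholder workload, not base64.

```rust
use std::time::{Duration, Instant};

/// Rough budget for a NIF on a regular scheduler (about 1 ms, per the post above).
const NIF_BUDGET: Duration = Duration::from_millis(1);

/// Runs `work` once and returns how long it took.
fn time_once<F: FnOnce()>(work: F) -> Duration {
    let start = Instant::now();
    work();
    start.elapsed()
}

/// True when a measured duration exceeds the budget.
fn needs_dirty_scheduler(elapsed: Duration) -> bool {
    elapsed > NIF_BUDGET
}

/// Placeholder workload (assumption): touches every byte once.
fn checksum(data: &[u8]) -> u64 {
    data.iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

fn main() {
    for size in [1_000usize, 100_000, 10_000_000] {
        let data = vec![0xABu8; size];
        let mut sum = 0u64;
        let elapsed = time_once(|| sum = checksum(&data));
        println!(
            "{} bytes took {:?} (checksum {}), over budget: {}",
            size,
            elapsed,
            sum,
            needs_dirty_scheduler(elapsed)
        );
    }
}
```

A single run is noisy, so for a real decision it would make more sense to repeat the measurement and look at the slowest runs.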

Sooner or later I need to learn Rust; hopefully some 10 years with C will speed things up.
Thanks for the great writeup!

Have you tried Zigler? It might be easier for some people to get into than Rust, with similar end goals.

Yeah, I created Niffler some time ago as a proof of concept and was curious as well. Niffler uses TinyCC as a JIT compiler to compile C code at runtime into a NIF (in memory). It can’t optimize like GCC or clang, but this is just for fun:

base64.exs

defmodule Base64 do
  use Niffler

  def encode(bin) do
    {:ok, [ret]} = encode_nif(bin)
    ret
  end

  defnif :encode_nif, [input: :binary], ret: :binary do
    """
    static char encoding_table[] = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
    'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
    'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
    'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
    'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
    'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
    'w', 'x', 'y', 'z', '0', '1', '2', '3',
    '4', '5', '6', '7', '8', '9', '+', '/'};
    static int mod_table[] = {0, 2, 1};

    size_t output_length = 4 * (($input.size + 2) / 3);
    $ret.data = $alloc(output_length);
    $ret.size = output_length;
    size_t input_length = $input.size;
    unsigned char* data = $input.data;
    char* encoded_data = $ret.data;
    for (int i = 0; i < $input.size;) {
      uint32_t octet_a = i < input_length ? data[i++] : 0;
      uint32_t octet_b = i < input_length ? data[i++] : 0;
      uint32_t octet_c = i < input_length ? data[i++] : 0;

      uint32_t triple = (octet_a << 0x10) + (octet_b << 0x08) + octet_c;

      *encoded_data++ = encoding_table[(triple >> 3 * 6) & 0x3F];
      *encoded_data++ = encoding_table[(triple >> 2 * 6) & 0x3F];
      *encoded_data++ = encoding_table[(triple >> 1 * 6) & 0x3F];
      *encoded_data++ = encoding_table[(triple >> 0 * 6) & 0x3F];
    }

    for (int i = 0; i < mod_table[$input.size % 3]; i++)
      $ret.data[output_length - 1 - i] = '=';
    """
  end
end

# file = "this is a really small string that we are going to encode to base 64"
file = :rand.bytes(300_000)

result = Base64.encode(file)
^result = Base.encode64(file)

Benchee.run(
  %{
    "elixir_base64" => fn -> Base.encode64(file) end,
    "niffler_base64" => fn -> Base64.encode(file) end
  }
)

And the results from mix run base64.exs:

Benchmarking elixir_base64...
Benchmarking niffler_base64...

Name                     ips        average  deviation         median         99th %
niffler_base64        879.64        1.14 ms    ±10.09%        1.11 ms        1.54 ms
elixir_base64         115.28        8.67 ms     ±9.63%        8.45 ms       13.46 ms

Comparison: 
niffler_base64        879.64
elixir_base64         115.28 - 7.63x slower +7.54 ms

So Rust wins this one. A “real” C NIF compiled with GCC or clang would surely perform better, but for that, someone would have to write all the boilerplate…
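
For readers who want to compare approaches, the C loop above translates fairly directly to plain Rust without any crates. This is a sketch of the same three-bytes-in, four-characters-out scheme with '=' padding. It is not the code the base64 crate uses, and it has not been benchmarked here.

```rust
const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes `input` as standard padded base64, three input bytes at a time.
fn encode(input: &[u8]) -> String {
    let mut out = Vec::with_capacity(4 * ((input.len() + 2) / 3));
    for chunk in input.chunks(3) {
        // Bytes missing from the last chunk are treated as zero.
        let a = chunk[0] as u32;
        let b = chunk.get(1).copied().unwrap_or(0) as u32;
        let c = chunk.get(2).copied().unwrap_or(0) as u32;
        let triple = (a << 16) | (b << 8) | c;
        // Emit four 6-bit groups, most significant first.
        for shift in [18, 12, 6, 0] {
            out.push(TABLE[((triple >> shift) & 0x3F) as usize]);
        }
    }
    // Overwrite the trailing characters with '=' padding: 0, 2, or 1 of them.
    let pad = [0, 2, 1][input.len() % 3];
    let len = out.len();
    for slot in &mut out[len - pad..] {
        *slot = b'=';
    }
    String::from_utf8(out).expect("base64 output is always ASCII")
}

fn main() {
    println!("{}", encode(b"this is a really small string"));
}
```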

And indeed awesome work on Rustler!

Cheers

Hi everyone! Dropping in since this thread touches on a topic I’ve experimented with but never really dug into seriously: has anyone measured the difference between the “pure Rust” implementation and the same code wrapped in NIF boilerplate?

For the sake of learning Rust, I created a NIF that does some cryptographic work on small binary data (128/256 bits), work that is usually done in native Erlang in a work-related project. While the Rust library itself benchmarks on the order of a few microseconds per iteration, I experienced a big slowdown when integrating it into a NIF, to the extent that the NIF is “only” one third faster than the Erlang code.

This may be related to serialization/deserialization (no serde involved, though) or to the fact that the bitstrings are very small, as the difference between the small and large benchmarks above suggests.
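
To make the small-input hunch concrete, here is a toy cost model: if every NIF call pays a fixed overhead, that overhead dominates on tiny inputs and fades on large ones. The function name and all the numbers are invented for illustration, not measurements, and the model ignores any fixed cost on the BEAM side.

```rust
/// Estimated speedup of a NIF over a pure BEAM implementation for an n-byte input.
/// All cost parameters are hypothetical and would have to be measured.
fn estimated_speedup(
    n: f64,
    call_overhead_ns: f64,
    native_ns_per_byte: f64,
    beam_ns_per_byte: f64,
) -> f64 {
    let beam = beam_ns_per_byte * n;
    let nif = call_overhead_ns + native_ns_per_byte * n;
    beam / nif
}

fn main() {
    // Made-up parameters: 1000 ns per call, 1 ns/byte native, 20 ns/byte on the BEAM.
    for n in [16.0, 256.0, 4096.0, 1_000_000.0] {
        println!(
            "{} bytes: estimated speedup {:.2}x",
            n,
            estimated_speedup(n, 1000.0, 1.0, 20.0)
        );
    }
}
```

Under a model like this, measuring the per-call overhead separately (for example with a NIF that does nothing) would show how much of the slowdown is the boundary crossing itself.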

Since this is “protected” work, I never took the time to put together an equivalent PoC to show the problem, but maybe someone else has encountered it and has some hints for passing data between Rust and the BEAM in a performant way!

Thanks rustler folks for allowing me to use two of my favorite languages!