Elixir/Erlang is Faster than Optimized Rust(tokio) in Message Passing

Rustixir · December 8, 2021, 2:13pm

Hi everyone, I'm working on finding the best language/framework/system for
high concurrency, high performance, and stable performance.

After working with Erlang/BEAM for 3 years, I decided to look for something better than Erlang, because
I had read in many places that Erlang/BEAM is slow!

Then I started learning Rust and worked on several real-world projects with it,
and then I found Tokio
(from the Tokio website):

Tokio is an asynchronous runtime for the Rust programming language. It provides the building blocks needed for writing network applications. It gives the flexibility to target a wide range of systems, from large servers with dozens of cores to small embedded devices.

After working with it I thought it was great, because it does message passing with backpressure over channels, Go style.
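
For readers unfamiliar with the term, here is a minimal sketch of what backpressure on a bounded channel means. It deliberately uses the standard library's `sync_channel` rather than Tokio so that it has no dependencies; the function name `accepted_before_full` is made up for this illustration. With Tokio's bounded `mpsc::channel(n)`, the sender would wait in `send(...).await` instead of getting an error.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to push `attempts` items into a bounded channel of size `capacity`
/// while nobody is receiving, and returns how many items were accepted.
/// (Hypothetical helper, for illustration only.)
pub fn accepted_before_full(capacity: usize, attempts: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected, but never read.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    for i in 0..attempts {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // The buffer is full: this is the point where a blocking
            // `send` would make the producer wait for the consumer.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

Calling `accepted_before_full(4, 10)` accepts 4 items and then stops, because the channel refuses more work until the consumer catches up.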

After writing a distributed real-time database in Rust,
I saw unstable performance; even heavily optimized, it was not great.
I stopped, did a lot of research on further optimization, and then
wrote a very low-cost message broker with persistence for our back end, but again I saw unstable performance.

Then I switched to Elixir, because I had heard about the JIT and had already used Elixir in a real-world project.

I started prototyping 2 message-passing scenarios, and
the difference was huge.

BEAM was the winner in both scenarios (sequential/concurrent client requests to a server):

==> Request call 1000
==> Request call 1000000

Scenario 1 result (Rust/Tokio: 65,798 ns, Elixir: 15,385 ns)
Scenario 2 result (Rust/Tokio: 4,256,639 ns, Elixir: 299,920 ns)

And in the end:

after a lot of research on distributed real-time databases
and comparing many cases, I found that

Mnesia is the best distributed real-time database ever made, and
if Elixir developers are smart, many of them will use it in the near future for all OLTP real-time applications.

We now use Mnesia for a read-heavy application
under heavy load.

It has many great features that we did not find in other databases;
it is really complete.

dimitarvp · December 9, 2021, 12:14am

I really like to believe that. But do you have repo(s) that demonstrate the difference? I would love to take a look!

I know Erlang has been getting pretty fast lately but I have my doubts; I’ve been going back to writing a Rust<->Elixir bridge again and in release mode plain async code (without orchestration except just proper yield points) executes in like 50 nanoseconds. But that obviously depends on what you call – I am talking about very plain Rust code in this case.

So, can you show us something more?

cmkarlsson · December 9, 2021, 12:54am

Here is a message passing thread from the erlang forum. Don’t know if it is related but at least in the ring benchmark rust comes out quite a bit faster.

Rustixir · December 9, 2021, 1:08am

OK sure, I will make a repo tomorrow morning.

Actually, for the Rust side of this test
I used a wrapper around tokio::sync::mpsc, with tokio::sync::oneshot for the response.
The wrapper is very simple
and is available on my GitHub.

And I don't know why, but after running some benchmarks
myself, and from what I have seen from other developers,
Tonic (a Rust gRPC framework built on Tokio) is slow!

Compared with Go,
Go was faster than Rust.

I mention this so the result does not seem so strange:
in some benchmarks Rust/Tokio performance
is not stable.

Rustixir · December 9, 2021, 1:10am

Thanks. I looked at the Rust code; it uses a channel for sending and does not await a response.

Here we are talking about
client/server (request/response).

cmkarlsson · December 9, 2021, 1:17am

I know not much about rust, I just saw a similar conversation on the erlang forum recently and thought it might add something to the conversation.

The initial rust benchmark does not wait for response, which makes the comparison not valid, however that code is corrected in subsequent posts and new results are posted.

The erlang thread is just about message passing. In other context the performance may of course differ.

Rustixir · December 9, 2021, 1:29am

Thanks again for giving me a clearer idea for the next benchmark.
I will write a better benchmark tomorrow.

Tomorrow morning everything will be clear.

Rustixir · December 9, 2021, 10:07am

Hi everyone.
I prototyped another scenario.
In it I did not use any wrapper over Rust + Tokio,
just the built-in features.

Scenario 1.
In this scenario I used message passing plus a calculation.
First some workers are spawned (number == my CPU cores),
then a request is sent to each worker and its result is awaited. Each worker does a simple sum over N and then responds with Result::Ok in Rust, or :ok in Elixir.

Rust Client


use std::time::Instant;


mod server;
use server::server::{start, MReq, MResp, Request};
use tokio::sync::oneshot::{self, Receiver};


#[tokio::main]
async fn main() {

    // Three independent server tasks, each with its own request channel.
    let server1 = start().await;
    let server2 = start().await;
    let server3 = start().await;

    let started = Instant::now();
    // ===================================================

    for n in 0..1000 {

        let (recv_resp1, req1) = request_factory(n);
        let (recv_resp2, req2) = request_factory(n);
        let (recv_resp3, req3) = request_factory(n);

        // Sequential request/response: each reply is awaited
        // before the next request is sent.
        let _ = server1.send(req1).await;
        let _ = recv_resp1.await;

        let _ = server2.send(req2).await;
        let _ = recv_resp2.await;

        let _ = server3.send(req3).await;
        let _ = recv_resp3.await;

    }

    // ===================================================
    let tm = started.elapsed().as_micros();
    println!("==> {} microseconds", tm);

}


fn request_factory(n: i32) -> (Receiver<MResp>, Request<MReq, MResp>) {
    let (resp, recv) = oneshot::channel::<MResp>();
    let req = Request::<MReq, MResp> {
        msg: MReq::Event(n),
        resp
    };

    (recv, req)
}

Rust Server

pub mod server {

    use tokio::sync::mpsc::{self};
    use tokio::sync::oneshot::{self};

    pub struct Request<MReq, MResp> {
        pub msg : MReq,
        pub resp: oneshot::Sender<MResp>
    }

    pub enum MReq {
        Event(i32)
    }
    pub enum MResp {
        Event(Result<(), ()>)
    }



    pub async fn start() -> mpsc::Sender<Request<MReq, MResp>> {
        let (client, mut server) = 
            mpsc::channel::<Request<MReq, MResp>>(16);

        tokio::spawn(async move {
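            // Handle requests one at a time until every sender is dropped.
            while let Some(req) = server.recv().await {
                let MReq::Event(n)  = req.msg;
                {
                    // Simulated work. The sum is never read, so an
                    // optimizing build may remove this loop entirely.
                    let mut temp = 1;
                    for num in 1..n {
                        temp += num;
                    }
                }
                // Reply through the per-request oneshot channel.
                let _ = req.resp.send(MResp::Event(Ok(())));
            }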
        });
        
        client
    }
    
}
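
One caveat about the summing loop in the server above: its result is never used, so at higher optimization levels the compiler is free to delete it, and the Rust side may then measure only the message passing. A minimal sketch of one way to keep the work alive, assuming a toolchain where `std::hint::black_box` is stable (Rust 1.66 or later); the function name `sum_below` is hypothetical and not part of the original benchmark.

```rust
use std::hint::black_box;

/// Same arithmetic as the server loop: start at 1 and add 1..n.
/// (Hypothetical helper; `i64` is used here only to rule out overflow.)
pub fn sum_below(n: i32) -> i64 {
    let mut temp: i64 = 1;
    // `black_box` hides the bound from the optimizer, so the loop
    // cannot be folded into a constant at compile time.
    for num in 1..black_box(n) {
        temp += i64::from(num);
    }
    // Marking the result as "used" discourages removal of the loop.
    black_box(temp)
}
```

The server would call `sum_below(n)` in place of the inline block. The Elixir side discards its sum too, so a fair comparison would need both sides to actually perform the work.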

Elixir Server

defmodule Todo.Server do
  use GenServer

  def start(args) do
    GenServer.start(__MODULE__, args)
  end

  def init(init_arg) do
    {:ok, init_arg}
  end

  # ========================================

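  def handle_call({:event, n}, _from, state) do
    # The accumulated sum is discarded; the reply is always :ok.
    execute(n, 1)
    {:reply, :ok, state}
  end

  # Counts num down to 1 while accumulating a running sum in temp.
  def execute(1, _) do
    :ok
  end
  def execute(num, temp) do
    execute(num-1, temp + num)
  end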

end

Elixir Client

defmodule Todo.Client do

  def start(n) do
    {:ok, server1} = Todo.Server.start([])
    {:ok, server2} = Todo.Server.start([])
    {:ok, server3} = Todo.Server.start([])
    # :timer.tc/1 returns the elapsed time in microseconds.
    {tm, _} = :timer.tc(fn ->
        Todo.Client.loop({n, server1, server2, server3})
    end)

    IO.puts("#{tm} microseconds")
  end


  def loop({0, _, _, _}) do
    :done
  end
  def loop({n, server1, server2, server3}) do
    GenServer.call(server1, {:event, n})
    GenServer.call(server2, {:event, n})
    GenServer.call(server3, {:event, n})
    loop({n-1, server1, server2, server3})
  end


end

The N in the sum operation mentioned above is the iteration number in the results below.

The result is amazing:

Scenario over 1,000 iterations
Rust+Tokio : 144,792 microseconds
Elixir/Beam : 23,393 microseconds

Scenario over 10,000 iterations
Rust+Tokio : 5,503,831 microseconds
Elixir/Beam : 812,285 microseconds

Even with the arithmetic operation, BEAM was the winner.
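
A rough way to read these totals, taking the 3 servers and the `0..iterations` loop from the Rust client above as given: divide by the number of calls to get a mean time per request/response, and note that the summing work grows with the square of the iteration count, so going from 1,000 to 10,000 iterations is not a 10x workload. The helper names below are made up for this sketch.

```rust
/// Mean time per request/response in microseconds, given the total elapsed
/// time and the number of loop iterations and servers called per iteration.
pub fn micros_per_call(total_micros: u64, iterations: u64, servers: u64) -> f64 {
    total_micros as f64 / (iterations * servers) as f64
}

/// Number of additions one Rust server performs over the whole run:
/// for each n in 0..iterations, the loop `1..n` runs max(n - 1, 0) times.
pub fn additions_per_server(iterations: u64) -> u64 {
    (0..iterations).map(|n| n.saturating_sub(1)).sum()
}
```

With the numbers above this gives about 48.3 vs 7.8 microseconds per call at 1,000 iterations and about 183.5 vs 27.1 at 10,000, while the additions per server go from 498,501 to 49,985,001. So the larger run mixes message-passing cost with roughly 100x more arithmetic.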

Rustixir · December 9, 2021, 10:08am

What do you think ?

Rustixir · December 9, 2021, 11:55am

Yet another benchmark,

to prove it to myself and everyone else, because this is a very strange topic,
but it is REAL.

Scenario:
in this benchmark, 3 very simple workers are spawned; they
generate (key, value) pairs and
send them to a channel (in Rust)
or to a mailbox (in Elixir),
then one task/process stores them (in a HashMap in Rust, in ETS in Elixir).

Again BEAM is the winner.

I think this was a dream for Joe Armstrong, and it has come true …

Rust


use std::collections::HashMap;
use std::time::Instant;
use tokio::sync::mpsc;


#[tokio::main]
async fn main() {
    let mut kv = HashMap::<i32, String>::new();
    let (sender, mut recv) =
        mpsc::channel::<(i32, String)>(100);


    let started = Instant::now();
    // ===================================================
    worker_factory(1000, sender.clone());
    worker_factory(1000, sender.clone());
    // The last worker takes the original sender, so the channel closes
    // (and the loop below ends) once all three workers have finished.
    worker_factory(1000, sender);


    while let Some((key, val)) = recv.recv().await {
        kv.insert(key, val);
    }
    // ===================================================
    let tm = started.elapsed().as_micros();
    println!("==> {} microseconds", tm);

}


fn worker_factory(counter: i32, chan: mpsc::Sender<(i32, String)>) {
    tokio::spawn(async move {
        for elem in 0..counter {
            let _ = chan.send((elem, elem.to_string())).await;
        }
    });
}

Elixir

defmodule Todo.Main do

  def start(counter) do
    tid = :ets.new(__MODULE__, [])

    :timer.tc(fn ->
      worker_factory(counter, self())
      worker_factory(counter, self())
      worker_factory(counter, self())
      receiver(0, tid)

    end)
  end

  def receiver(finished, tid) do
    receive do
      {key, val} ->
        :ets.insert(tid, {key, val})
        receiver(finished, tid)
      finish ->
        finished = finish + finished
        case finished do
          3 ->
            :done
          _ ->
            receiver(finished, tid)
        end
    end
  end

  def worker_factory(counter, kvserver) do
    Task.start(fn() ->
      Todo.Main.loop(counter, kvserver)
    end)
  end


  def loop(0, kvserver) do
    send(kvserver, 1)
  end
  def loop(n, kvserver) do
    send(kvserver, {n, "#{n}"})
    loop(n-1, kvserver)
  end


end

**The result was amazing**

Scenario over 100 iterations:
Rust + Tokio : 837 ~ 1,300 microseconds
Elixir/Beam : 270 ~ 1,100 microseconds

Scenario over 1,000 iterations:
Rust + Tokio : 9,769 ~ 14,300 microseconds
Elixir/Beam : 2,202 ~ 11,200 microseconds

But this is not really realistic usage, because in the real world
storage is almost always read-heavy or write-heavy.
If a benchmark were run for that,
I PROMISE BEAM would be the winner, because it and ETS are very full-featured.

Rustixir · December 9, 2021, 12:37pm

I'm so sorry: the Rust benchmark was built at compiler optimization level 1. After changing it to 3, Rust is 2x faster in the scenario above.

Rust is faster for raw performance,

but when message passing is involved,
BEAM really is a beast.

dimitarvp · December 9, 2021, 12:45pm

Here’s my Mac ~/.cargo/config:

[build]
# Use CPU family specific instructions for faster machine code.
rustflags=["-C", "target-cpu=native"]

[profile.dev]
split-debuginfo = "unpacked" # macOS-specific debug build acceleration.

[profile.release]
lto = true # Turn on link time optimization (more optimizations)
codegen-units = 1 # Reduce LTO units to 1 for maximum binary size reduction.
opt-level = 3 # Optimize for speed (not binary size).

# If you are on a M1 Mac
[target.x86_64-apple-darwin]
rustflags = [
    "-C", "link-arg=-undefined",
    "-C", "link-arg=dynamic_lookup",
]

This definitely makes the final linking phase much slower, but I’ve run benchmarks in the past and the code is indeed a little bit faster (and the binary smaller).

dmarko484 · December 9, 2021, 12:46pm

Does that mean Rust is 2x faster than before, or than Elixir?

Rustixir · December 9, 2021, 12:49pm

2x faster than BEAM in raw processing,
but 2x slower than BEAM in message passing.
Actually, I think BEAM already has great performance.

Rustixir · December 9, 2021, 12:50pm

thanks

Rustixir · December 9, 2021, 12:55pm

But after changing to opt-level = 3,
BEAM was again faster in this scenario:

Scenario 1.
In this scenario I used message passing plus a calculation.
First some workers are spawned (number == my CPU cores),
then a request is sent to each worker and its result is awaited. Each worker does a simple sum over N and then responds with Result::Ok in Rust, or :ok in Elixir.

Rust+Tokio: 61,942 ~ 79,300 microseconds
Elixir/Beam: 23,393 microseconds

After setting the Rust compiler to opt-level = 3,

Rust + Tokio performance improved
from: 144,792 microseconds
to: 60,000 ~ 70,000 microseconds

I think Rust+Tokio is not yet as optimized as BEAM.
Even Tonic (Rust gRPC) does not perform well, because
it uses channels internally; many benchmarks exist.

zeroexcuses · December 10, 2021, 7:56am

Have you posted this to where Rust experts hang out? I.e. Rust forums / Tokio github issues ?

Rustixir · December 10, 2021, 10:04am

Yes, but the system sent this message:

Our automated spam filter, Akismet, has temporarily hidden your post in Benchmark (Elixir is faster than Rust+Tokio) when involved Message passing for review.

A staff member will review your post soon, and it should appear shortly.

We apologize for the inconvenience.

Rustixir · December 10, 2021, 10:17am

Actually, I think the difference comes down to optimization: with Rust+Tokio the suggested pattern is to
make a oneshot channel (which sends/receives just one message) for each response,

but BEAM uses one mailbox for both sending and receiving.
It has one channel/mailbox per process, not one channel per response, and it has recently had more optimization work done on mailbox fetching.
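
To make the contrast concrete, here is a minimal sketch of the "one mailbox per client" shape: the client creates a single long-lived reply channel and hands the worker a clone of its sender with every request, instead of allocating a new channel per request. It uses standard-library threads and channels rather than Tokio, the name `call_with_shared_mailbox` is hypothetical, and it says nothing about how either pattern performs; it only shows the structure.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Sends `n` requests to a worker thread and reads every reply from one
/// long-lived reply channel (the client's "mailbox").
pub fn call_with_shared_mailbox(n: u32) -> Vec<u32> {
    // Each request carries its payload plus the address to reply to.
    let (req_tx, req_rx) = channel::<(u32, Sender<u32>)>();
    // Created once and reused for every reply.
    let (reply_tx, reply_rx) = channel::<u32>();

    let worker = thread::spawn(move || {
        // Runs until the request sender is dropped.
        for (value, reply_to) in req_rx {
            let _ = reply_to.send(value * 2);
        }
    });

    let mut replies = Vec::new();
    for i in 0..n {
        req_tx.send((i, reply_tx.clone())).expect("worker stopped");
        // Request/response: wait for the reply before sending again.
        replies.push(reply_rx.recv().expect("worker dropped the reply"));
    }

    drop(req_tx); // closes the request channel so the worker exits
    worker.join().expect("worker panicked");
    replies
}
```

For example, `call_with_shared_mailbox(3)` yields `[0, 2, 4]`. Cloning the sender stands in for passing a process id as the reply address in `GenServer.call`.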

Rustixir · December 10, 2021, 1:58pm

The answer to this question from the Rust community was:
switch to the single-threaded scheduler in Rust.

When I did that, Rust was very fast:
before: ~60,000 microseconds
after switching to single-threaded: 2,000 microseconds

They said this is the overhead of passing messages between threads,
not a problem with Tokio.

And in the end:
when you need a lot of message passing in a multithreaded system, BEAM is the winner.