There are a couple of questions in here. For the first, compiling Rust to WebAssembly is a fairly straightforward process.
Elixir running wasm modules is possible but there are a couple of things that make it not quite so smooth yet. The friction level is almost high enough that I considered writing my own Elixir (Gleam, actually) WebAssembly interpreter.
As far as I can tell, the most popular Hex package right now for running/interpreting WebAssembly modules is wasmex. One thing that’s missing from it (as far as I can tell), and from the others I’ve seen for Elixir, is support for imports: calling a host Elixir function from inside the guest WebAssembly module.
I think it might be easier to provide better recommendations if I knew what exactly you were trying to do. If all you need to do is invoke functions on the module, and the module doesn’t need to import anything or support WASI, then you should be fine with wasmex.
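For reference, invoking an exported function through wasmex looks roughly like the sketch below. This follows the wasmex README of the time; the exact API may differ between versions, and `my_module.wasm` and the `sum` export are placeholders:

```elixir
# Load a wasm binary and call one of its exported functions via wasmex.
# Placeholder file name and export; API per the wasmex README, may vary by version.
{:ok, bytes} = File.read("my_module.wasm")
{:ok, instance} = Wasmex.start_link(%{bytes: bytes})

# Call the exported `sum` function with two i32 arguments.
{:ok, [result]} = Wasmex.call_function(instance, "sum", [40, 2])
IO.puts("sum returned #{result}")
```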
p.s. The Rust+WebAssembly ecosystem has a large amount of activity going on right now, including my own project wasmCloud, which is an OTP-inspired actor framework where the actors are built in wasm.
Per your original comment about Rustler, it’s worth pointing out that wasmex uses Rustler under the hood, so it’s likely just providing a thin veneer around a Rust-based WebAssembly executor/interpreter.
Thank you for the informative reply. Taking a step back (and possibly presenting some irrelevant details), here is my situation:
I currently have a ~20K LOC Rust application that compiles and runs on both WebGL/wasm32 and OpenGL/x86_64. It is primarily intended to run in the browser, and as it turns out, WebGL Rust code runs fairly easily as OpenGL Rust code.
I’m now splitting this application and adding concurrency/multi-user support, so the server portion is going to be Elixir + Rust on multiple EC2 instances. Elixir will largely function as a “router” while Rust does the “heavy lifting.”
So now, my options are:
elixir + rust run separately; talk over tcp / sockets [basically ruling this out]
elixir + rust/x86_64 via NIF / Rustler
elixir + rust/wasm32
So the core of the issue is:
I have some blob of Rust code that compiles to both x86_64 and wasm. I want this code to talk to Elixir. Does it make more sense to do elixir + rust/x86_64 or elixir + rust/wasm32?
I have a slight preference for rust/wasm32 because I’m not 100% sure my Rust code is free of undefined behaviour or that I’ve handled every unsafe properly, so in the case of my screwup, I would prefer it crash just the wasm runtime rather than possibly take down the entire machine.
I think the deciding factor here revolves around aspects of WebAssembly that you may or may not need. First, wasm is portable - do you need the code you’re thinking of targeting at wasm to be swapped out or loaded at runtime like a plugin? If you only need something that’s statically linked, then wasm’s advantages might not be worth the difficulties.
Second is the call pattern. What do you expect to be the frequency and size of calls into and out of the wasm module? If you plan on a very “chatty” interface, or one with particularly large payloads, or both, then crossing that wasm-host boundary is going to incur a latency penalty, and if you’re doing graphics work like generating/writing frames to a buffer, this could impact your output.
If you wrap a GenServer around a NIF-held instance and the Rust code only ever executes for the briefest periods of time, then you’ll get better latency than you do with wasm invocation (because you “can’t” do AOT compilation for Elixir+wasm). Wrapping a GenServer around this NIF also isolates the blast radius if the Rust code panics underneath (though hopefully Rust doesn’t panic all that often).
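A minimal sketch of that wrapping, assuming a hypothetical Rustler-backed module `MyApp.Native` exposing a `step/1` NIF (the names are placeholders, not anything from this thread):

```elixir
defmodule SimWorker do
  # Keep the NIF call inside one supervised process: a Rust panic surfaces
  # through Rustler as an Elixir exception and only kills this process.
  use GenServer

  def start_link(initial_state),
    do: GenServer.start_link(__MODULE__, initial_state, name: __MODULE__)

  def step, do: GenServer.call(__MODULE__, :step)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:step, _from, state) do
    # MyApp.Native.step/1 is a placeholder for the Rustler NIF doing the heavy lifting.
    {:reply, :ok, MyApp.Native.step(state)}
  end
end

# Supervised so a crashed worker is restarted rather than taking the app down.
Supervisor.start_link([{SimWorker, %{}}], strategy: :one_for_one)
```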
tl;dr if your usage of the current multi-targetable code is chatty and requires low latency, you’ll want to skip webassembly (especially with Elixir as the host language). If you’re going to make infrequent calls, or async calls that don’t require low latency, and you want your logic to be loadable at runtime versus statically linked, then wasm’s your huckleberry.
If I am understanding you correctly, you are claiming that the “wasm boundary” causes non-trivial latency. This goes against my intuition. I wonder if we are measuring different types of latency (perhaps you are thinking HFT and I am thinking gaming).
So the current model I have in mind is:
1. client is running Rust/wasm code in a browser
2. client, over commercial residential network, connects to EC2
3. EC2 machine runs elixir/rust_x86_64 or elixir/rust_wasm
4. server sends data back over commercial residential network
5. client (rust/wasm browser) gets data back
I believe that the choice of elixir/rust_x86_64 vs elixir/rust_wasm, in the worst case, costs a few memcpy’s for every function call. That ‘latency’ seems to be dwarfed by the lag in steps 2 and 4.
To me right now, the main disadvantages of rust_wasm vs rust_x86_64 seem to be (1) rust_wasm is limited to a 4 GB address space and (2) I’m not sure how mature calling wasm from Elixir is.
What do you think? I currently do not know enough about elixir/wasm to handle the unknown unknowns.
You’re right. What qualifies as non-trivial latency is in the eye of the beholder. If you needed to make 60 calls into the module per second (e.g. 60 fps), that’s one thing.
If the client waiting for a reply to your wasm module is on the other side of an internet connection, this latency is negligible.
As for maturity, that’s basically up to us as a community. Give wasmex a try and see if it meets your needs.
The cost is more than a few memcopys for every function call. Since Elixir is interpreting the wasm and the wasm isn’t compiled, every time you call a wasm function, Elixir will start a read, decode, execute loop on the instructions in that function. So the cost is more like a multiple of the number of instructions in the wasm function.
But, the other statements on latency still hold true - if you’re making one wasm function call in response to a remote call over public internet, this latency is not the bottleneck.
I was under the impression that wasmex uses either wasmer or wasmtime under the hood – and that both were close to native x86_64 speed due to some JIT black magic. However, I have not verified this myself – how confident are you in your claim of ‘multiple of the number of instructions’?
I stand corrected then. I checked wasmex’s Cargo.toml file and it is using wasmer, which means it is indeed doing JIT compilation of the wasm instructions into native. Since that’s the case, you’re correct in that it’s just a couple of memory operations before and after the wasm call.
The claim of multiple of the number of instructions applies only to an “Elixir native” interpreter, where the Elixir code is reading each instruction out of the wasm bytes and doing the execution on demand.
Thanks for doing the research. I think we’re converging towards ‘truth’ of:
rust-wasm cons:
memcopy overhead (might be trivial outside of HFT)
JIT overhead
4GB per-instance memory limit
rust-wasm pros:
easier hot swapping (as you stated)
crashes only wasm runtime
Given the inevitable memcopy overhead, I wonder if it makes sense to do something like the following (rough sketch below the list):
rust/wasm on wasmer in a separate process
rust/wasm implements the Erlang “port” (or whatever distributed erlang uses to send terms)
elixir talks to rust/wasm just like any other distributed node
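A rough sketch of that port idea, assuming a hypothetical external executable (`./wasm_host`) that embeds the wasm runtime and speaks 4-byte length-prefixed Erlang term binaries over stdin/stdout:

```elixir
defmodule WasmPort do
  # Talk to an external OS process hosting the wasm runtime over an Erlang port.
  # "./wasm_host" is a placeholder executable; {:packet, 4} gives length-prefixed framing.
  def open do
    Port.open({:spawn_executable, "./wasm_host"}, [:binary, {:packet, 4}, :exit_status])
  end

  def request(port, term) do
    send(port, {self(), {:command, :erlang.term_to_binary(term)}})

    receive do
      {^port, {:data, data}} -> {:ok, :erlang.binary_to_term(data)}
      {^port, {:exit_status, status}} -> {:error, {:wasm_host_exited, status}}
    after
      5_000 -> {:error, :timeout}
    end
  end
end
```

The external process would need to decode/encode the Erlang external term format (or you could swap in any other serialization you prefer); either way, a crash there never touches the BEAM.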
As an aside, I think the 4GB limit might be a much bigger problem. I don’t know whether the typical server with 512 GB of RAM also has 128 hardware threads – because if not, there might be quite a bit of OS context switching between the wasm runtimes (unless a single OS process can host multiple wasm runtimes).
JIT overhead is essentially your “cold start” penalty. In my experience using wasmer, it’s typically less than a second for anything but the fattest of wasm files.
I’m not sure what you mean by the 4GB limit… but a single wasm module shouldn’t be maintaining much state at all, let alone 4GB (this is where I’d recommend state be managed outside the wasm module).
I would recommend that you put the wasmex-executing code inside something like a GenServer so that requests to it are queued up in single-threaded fashion (since the wasm module is internally single-threaded), and so that it can die without hurting your system. You can have literally millions of OTP processes without exceeding your OS thread limit.
Put another way, you have a single Elixir OTP application that can host millions of OTP processes, some of which can be wrappers around a wasmex instance (the module is basically the “state token” for that process). Your module-wrapping GenServer would then handle an incoming message by extracting parameters, invoking an exported function on the module, and replying accordingly.
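As a sketch of that shape, assuming the wasmex API from its README (the `WasmWorker` name is a placeholder):

```elixir
defmodule WasmWorker do
  # One wasm instance per process: calls are serialized through the GenServer
  # mailbox, and a crash only takes down this process.
  use GenServer

  def start_link(wasm_path, opts \\ []) do
    GenServer.start_link(__MODULE__, wasm_path, opts)
  end

  def invoke(pid, function, args) do
    GenServer.call(pid, {:invoke, function, args})
  end

  @impl true
  def init(wasm_path) do
    bytes = File.read!(wasm_path)
    # Wasmex.start_link/1 and Wasmex.call_function/3 as documented in the
    # wasmex README; the exact API may differ between versions.
    {:ok, instance} = Wasmex.start_link(%{bytes: bytes})
    {:ok, instance}
  end

  @impl true
  def handle_call({:invoke, function, args}, _from, instance) do
    {:reply, Wasmex.call_function(instance, function, args), instance}
  end
end
```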
I don’t know the right answer to the following problem because I haven’t solved it yet, and I haven’t found any good articles on it either (but I’m now starting to see why you brought up the ‘memcopy’ bottleneck earlier).
Suppose you are building a distributed sharded game server, the server side of something like Minecraft / Fortnite / Quake / …
Would you:
store ‘truth’ in Elixir, then, on every tick, have rust/wasm code grab the current world state, run one step of simulation, and write world data back out to elixir OR
store ‘truth’ in Rust, and Elixir merely serves as a ‘router’ routing user input to the rust/wasm code and routing state (user location, health, …) back to the client ?
In model 1, we’re going to have memcpy’s everywhere (far more than what I originally anticipated, but I now also understand your concern).
In model 2, it’s not hard for a single shard to hit 4GB very quickly.
Storing the truth in Elixir as some data structure, and then invoking the tick of a game loop in wasm, means Elixir holds the memory, which means you control the sharing of it and have a higher limit.
If the truth is stateful inside Rust then it becomes stateful inside wasm as part of the compilation process, and now your wasm module could easily run afoul of the page limit of the host runtime.
When you go with option 1, you basically run into the following rule: you must be able to serialize your input, invoke your function, and deserialize the output in less than your maximum frame time budget (the inverse of your frame rate). The good news is that a server-side frame rate can be lower than the client-side one, because theoretically you’re not modeling ultra-low-latency things like particle fountains and projectiles. The bad news is you still have that per-frame budget you can’t exceed without causing lag.
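In code form, that rule is just a per-tick timing check. A sketch, where `run_tick/1` and `world` are placeholders standing in for serialize → copy across the wasm boundary → invoke the exported tick function → deserialize:

```elixir
# The server-side tick rate can be lower than the client's frame rate.
tick_rate = 20                        # ticks per second (placeholder)
budget_us = div(1_000_000, tick_rate) # per-tick budget in microseconds

# run_tick/1 is a placeholder for: serialize world state, invoke the wasm
# tick export, and deserialize the result.
{elapsed_us, _new_world} = :timer.tc(fn -> run_tick(world) end)

if elapsed_us > budget_us do
  IO.warn("tick took #{elapsed_us}µs, over the #{budget_us}µs budget")
end
```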
One more thought on option 2: this is where people typically decide they need to employ sharding techniques. If the Rust/wasm module doesn’t take a copy of the entire world/universe, and is instead treated like a pure function that only acts on the subset of the world it needs, then you can run thousands of instances of that wasm module and let OTP do the load balancing/distribution for you.
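A sketch of what that could look like on the OTP side, reusing the hypothetical `WasmWorker` wrapper from earlier plus a `Registry` and `DynamicSupervisor` for per-shard instances (`shard.wasm`, the shard count, and the `tick` export are placeholders):

```elixir
# One wasm instance per world shard, registered by shard id.
{:ok, _} = Registry.start_link(keys: :unique, name: ShardRegistry)
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: ShardSup)

for shard_id <- 0..999 do
  name = {:via, Registry, {ShardRegistry, shard_id}}

  DynamicSupervisor.start_child(ShardSup, %{
    id: shard_id,
    start: {WasmWorker, :start_link, ["shard.wasm", [name: name]]}
  })
end

# Route a request to whichever process owns that part of the world.
[{pid, _}] = Registry.lookup(ShardRegistry, 42)
WasmWorker.invoke(pid, "tick", [16])
```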
If you’re actually describing running a game loop at n FPS on the server-side in the cloud where the game logic is managed inside a WebAssembly module, then I feel like I should let you know that I’m working on a distributed ECS that uses my wasmCloud WebAssembly actor framework, where the actors you write are systems and then the host runtime takes care of components and entities. This is still mostly on paper, but I’ve built a prototype of it once before. Fun retrospective on my earlier prototype here.
This is really interesting. Obvious in retrospect, but I never considered that it is perfectly fine to have server side ‘frame rate’ != client side frame rate.
It is interesting that for your ‘radar problem’, you solved it via an algorithmic change rather than an O(1) ‘faster language’ change.
Hi, author of wasmex (the wasmer wrapper you discussed) here.
I just stumbled over your discussion. I am not sure how I can best help, but if you have any specific questions, feel free to ask. And, of course, if you find there are missing features, please open a ticket on the repo.
I haven’t personally benchmarked the speed of wasmex yet, but that is one of the things I really wish to do (or see someone else do).
For your data-passing-bottleneck problem: you could also store your world state in the wasm instance’s memory (readable from both Elixir and Rust). It is still faster for Rust to access, but I imagine that Elixir updates to an existing world view in Rust-owned memory are smaller than passing the whole state in on every call.
This really depends on the actual data you want to pass around. Tooling for memory manipulation is far from being as good as with wasm-bindgen. But I’m very open to PRs in that direction.
In short, wasmex is what you describe (in an earlier post) as “elixir + rust/x86_64 via NIF / Rustler”.
Both frameworks are using WebAssembly to try to solve the problem of making it easy to build distributed applications. wasmCloud does this with an aim toward stripping away boilerplate, loosely coupling capability providers (non-functional requirements), and providing cryptographically secure modules so that you can control what the actors can and cannot access. From what I can tell of lunatic, it is more “OTP-like,” with explicit use of channel senders and receivers. wasmCloud is designed for extensibility and polyglot development, supporting actors in TinyGo, Rust, and AssemblyScript, while it looks like lunatic’s SDK is specific to Rust.
Would it be correct to say, stated another way: if you had full control over (1) choice of language and (2) choice of libraries for your applications, then wasmCloud serves as a “docker replacement” of sorts – instead of packaging an entire Linux image to throw on AWS EKS/ECS/Fargate, you can just build a wasm binary (because you chose a language + libraries that can target wasm), and you provide a tiny wasm module instead of a hundreds-of-MBs docker image?
PS: For anyone reading this thread and unaware (like myself until a few minutes ago), @autodidaddict, as stated in his public Elixir Forum profile, is the author of the Rust/wasm book “Programming WebAssembly in Rust”.
No wonder you are pushing the boundaries of wasm.
Really appreciate all the time you took to help me work through this design space.