Nx, EXLA and Torchx 0.12 have just been published!
These releases introduce the new Nx.block abstraction and the EXLA.CustomCall protocol, which together allow users to provide native C or CUDA implementations for specific sections of Nx code.
EMLX 0.3 (the official MLX backend for Nx) is also out, with:
- Support for the more recent concurrency model provided by MLX
- EMLX.Fast for a few fused operations (used in the upcoming emlx_axon library, pending Axon and Bumblebee releases in the next few days)
By using it (see this benchmark for an example), we can accelerate certain Bumblebee models on EMLX without hand-rewriting the model code for performance! For example, the Qwen3-0.6B-4bit model goes from ~10 tokens/s to ~42 tokens/s on my machine just by applying the provided Axon node rewrites!
@manhvu, @GenericJam was able to run EMLX on-device, which means all of Nx can run there! We’re working to upstream the changes he needed so that no patches are required, just configuration!
EMLX makes sense for iOS because it can use Core ML. I’m not sure about Vulkan yet; it doesn’t seem stable enough. I’m still looking for a way to use the mobile GPU on both iOS and Android for on-device training in edge AI.
Nice, you’ve done it! I looked for a single solution that covers both, but it sounds like I need to handle the two platforms separately. Currently, only WebGPU runs on both platforms, so I’ll try to adapt Nx to it.
I’m also trying to bring in CubeCL as a backend for Nx. Rust is quite strong for ML/AI and big data these days, so I want to work with a combination of Elixir and Rust.