Emily - A new MLX-based backend for Nx

Emily is an Elixir library that runs Nx computations on Apple’s MLX. Install it as the default Nx backend and Nx, defn, Axon, Nx.Serving, and Bumblebee all dispatch to the Metal GPU on Apple Silicon with no further integration work.

Highlights:

  • Nx.Backend with transparent fallback for uncovered ops
  • Nx.Defn.Compiler that threads Bumblebee and Nx.Serving through MLX
  • Fused transformer kernels (rms_norm, layer_norm, rope, scaled-dot-product attention), optionally applied to any Bumblebee model via a shim
  • Affine group-wise int2 / int4 / int8 quantisation
  • bf16 mixed-precision training
  • Per-process Metal command queues for concurrent inference on a shared model
  • Runnable Livebooks for DistilBERT QA, Qwen3 quantized generation, MNIST training, Whisper transcription,
    and the fused kernels

MLX is vendored (for now) because its recent command-queue threading features have not yet been released. The first build compiles MLX from source, which can be a bit slow (> 1 min).

macOS / Apple Silicon. Requires Xcode with the Metal toolchain.


Amazing work, that’s such a great contribution and I can’t wait to give it a spin.


How does this compare to EMLX (elixir-nx/emlx, the MLX backend for Nx)?


Roughly:

  • emily exposes the concurrency model now available in mlx. I assume emlx can’t do this because of the mlx version it uses.
  • emily has features I don’t believe emlx advertises: fused kernels, quantisation, mixed-precision training (I might be wrong about this one), and maybe also zero-copy to_binary/1.
  • emily vendors mlx at a specific commit (for the reasons outlined above and in more detail in the docs), whereas emlx uses a relatively old binary.
  • There are quite a few conformance tests in emily that use real open-source models, but I mostly did that for my own sanity.

Can’t wait to get my hands on Apple M hardware to integrate this with my eXMC.

Also, Emily lets you control whether MLX’s kernels are compiled AOT or JIT. JIT significantly reduces the size of the final application, at a small cost in start-up time. The CI tests, including the conformance tests, run in both modes.


Nice! Does this mean there’s now an Nx backend that Just Works™️ in Livebooks on macOS and is actually GPU-accelerated?

Yes, there are a few Livebooks in the docs that demonstrate usage. It’s really just a matter of setting the default backend in the Mix.install config.
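
As a rough sketch of what such a setup cell could look like: the package name :emily, the version requirement, and the backend module name Emily.Backend are assumptions made for illustration here, so check the Emily docs and Livebooks for the exact names.

```elixir
# Livebook setup cell (sketch, not taken from the Emily docs).
# Assumptions: the Hex package is :emily, "~> 0.3" matches the releases
# mentioned in this thread, and the backend module is Emily.Backend.
Mix.install(
  [
    {:emily, "~> 0.3"}
  ],
  config: [
    # Application config for :nx, so every process starts with this backend.
    nx: [default_backend: Emily.Backend]
  ]
)

# Returns the backend that tensors created in this process will use
# (assumes Nx is pulled in as a dependency of the package above).
Nx.default_backend()
```

Setting the backend through the config: option of Mix.install applies it application-wide, rather than only to the process that happens to evaluate the cell.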


We’re open to bringing in any features to EMLX you feel are missing (and also you as a maintainer) if you wanna join efforts!


I will say it would have been nice to acknowledge EMLX in the README as a source of inspiration, since the PLAN.md committed to main references it quite a bit.


Yes of course, thx for pointing this out. I’ll address this today.

I released 0.3.2 of Emily.

The majority of the changes are in the packaging. If you’re consuming Emily as a hex package, the NIF is downloaded from Emily’s GitHub Releases, validated against the checksum and then cached. If you’re consuming Emily as a repo, the mlx repo is fetched as a git dependency and built in-tree.

Additionally, the mlx team have tagged a release that contains the thread-safety improvements, and Emily now builds against it.

For the number crunchers, the changes are modest:

  • Native axis reversal via mx::slice, which improves the performance of Nx.sort, Nx.argsort and Nx.reverse.
  • Additional quantization modes are now supported (the default remains affine).

Emily 0.3.5 is released.

The only significant change in this release is support for ‘thin’ SVD (for rank-2 inputs), something the native mlx libraries don’t currently support. Without it, mlx always materialises the full matrices, even if you don’t need them, which can result in OOMs.

Until mlx addresses this natively, Emily sidesteps it by using the Gram matrix formulation, which computes the thin version from operations mlx supports natively.
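
For readers unfamiliar with the trick, the Gram matrix route to a thin SVD goes roughly as follows. This is the textbook derivation, not necessarily the exact sequence of operations Emily runs, and it assumes a real m × n input A with m ≥ n and full column rank.

```latex
\begin{aligned}
A &= U \Sigma V^{\top}, \qquad U \in \mathbb{R}^{m \times n},\quad \Sigma, V \in \mathbb{R}^{n \times n} \\
G &= A^{\top} A = V \Sigma^{2} V^{\top} \qquad \text{(an } n \times n \text{ symmetric eigenproblem)} \\
\sigma_i &= \sqrt{\lambda_i(G)}, \qquad U = A V \Sigma^{-1}
\end{aligned}
```

Only n × n and m × n arrays are ever formed. As an illustration, for a 100,000 × 64 float32 input the thin U takes 100,000 × 64 × 4 bytes ≈ 25.6 MB, whereas a full 100,000 × 100,000 U would need about 40 GB. The usual caveat is that forming AᵀA squares the condition number, so very small singular values lose accuracy.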
