Emily is an Elixir library that runs Nx computations on Apple’s MLX. Install it as the default Nx backend and Nx, defn, Axon, Nx.Serving, and Bumblebee all dispatch to the Metal GPU on Apple Silicon with no further integration work.
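As a rough illustration of what "install it as the default Nx backend" could look like, here is a minimal config sketch. The `:default_backend` key is Nx's standard application config; the module name `Emily.Backend` is an assumption made for this example, so check Emily's docs for the real one.

```elixir
# config/config.exs
import Config

# Route every Nx tensor operation through Emily's backend by default.
# NOTE: `Emily.Backend` is an assumed (hypothetical) module name.
config :nx, default_backend: Emily.Backend
```

The same choice can also be made at runtime with `Nx.global_default_backend/1`, which takes the backend module as its argument.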
Highlights:
Nx.Backend with transparent fallback for uncovered ops
Nx.Defn.Compiler that threads Bumblebee and Nx.Serving through MLX
Fused transformer kernels (rms_norm, layer_norm, rope, scaled-dot-product attention), optionally applied to any Bumblebee model via a shim
Affine group-wise int2 / int4 / int8 quantisation
bf16 mixed-precision training
Per-process Metal command queues for concurrent inference on a shared model
Runnable Livebooks for DistilBERT QA, Qwen3 quantized generation, MNIST training, Whisper transcription, and the fused kernels
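For anyone unfamiliar with the term, affine group-wise quantisation can be summarised as follows. This is a sketch of the general scheme as I understand it from MLX's description of its affine mode; the symbols ($g$ for group size, $b$ for bit width) are my own notation, and Emily's exact parameters may differ. Each group of $g$ consecutive weights $w_1, \dots, w_g$ gets its own scale and bias:

```latex
\begin{aligned}
\alpha &= \max_i w_i, \qquad \beta = \min_i w_i, \qquad s = \frac{\alpha - \beta}{2^{b} - 1} \\
q_i &= \operatorname{round}\!\left(\frac{w_i - \beta}{s}\right) \in \{0, \dots, 2^{b} - 1\} \\
\hat{w}_i &= s\, q_i + \beta
\end{aligned}
```

With $b = 4$, each weight is stored as an integer from 0 to 15, and each group additionally stores one scale $s$ and one bias $\beta$, so smaller groups trade memory for accuracy.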
MLX is vendored (for now) because its recent command-queue threading features have not yet been released. The first build compiles MLX from source, which can be a bit slow (> 1 min).
macOS / Apple Silicon. Requires Xcode with the Metal toolchain.
Emily exposes the concurrency model now available in MLX. I assume EMLX can’t do this because of the MLX version it uses.
Emily has features I don’t believe EMLX advertises: fused kernels, quantisation, mixed-precision training (I might be wrong about this one), and maybe also zero-copy to_binary/1.
Emily vendors MLX at a specific commit (for the reasons outlined above and in more detail in the docs), whereas EMLX uses a relatively old binary.
There are quite a few conformance tests in Emily that use real open-source models, but I mostly did that for my own sanity.
Also, Emily lets you control whether MLX’s kernels are compiled AOT or JIT. The latter significantly reduces the size of the final application, at the cost of a slightly longer start-up time. The CI tests, including conformance, run in both modes.
I will say it would have been nice to acknowledge EMLX in the README as a source of inspiration, since the PLAN.md committed to main references it quite a bit.
The majority of the changes are in the packaging. If you’re consuming Emily as a Hex package, the NIF is downloaded from Emily’s GitHub Releases, validated against its checksum and then cached. If you’re consuming Emily as a repo, the MLX repo is fetched as a git dependency and built in-tree.
Additionally, the MLX team have tagged a release that contains the thread-safety improvements, and Emily now builds against it.
For the number crunchers, the changes are modest:
Native axis reversal via mx::slice, which improves the performance of Nx.sort, Nx.argsort and Nx.reverse.
Additional quantization modes are now supported (default remains affine)
The only significant change in this release is support for ‘thin’ SVD (for rank-2 inputs), something the native MLX libraries don’t currently support. Without this, MLX always materialises the full matrices, even if you don’t need them, which can result in OOMs (see below).
Until MLX addresses this natively, Emily sidesteps the problem by using the Gram matrix formulation, which computes the thin SVD using only operations MLX already supports.
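For readers who haven’t met the Gram matrix route before, here is the standard textbook derivation. It is a sketch of the general technique, not necessarily the exact sequence of operations Emily runs. It assumes $A$ is $m \times n$ with $m \ge n$ and full column rank (for $m < n$ the same steps apply to $A^{\top}$); the symbols are my own.

```latex
\begin{aligned}
A &= U \Sigma V^{\top}, \qquad U \in \mathbb{R}^{m \times n}, \quad \Sigma, V \in \mathbb{R}^{n \times n} \\
G &= A^{\top} A = V \Sigma^{2} V^{\top} \qquad \text{(symmetric eigendecomposition of the } n \times n \text{ Gram matrix)} \\
\sigma_i &= \sqrt{\lambda_i(G)}, \qquad U = A V \Sigma^{-1}
\end{aligned}
```

Only $n \times n$ and $m \times n$ matrices are ever formed, whereas a full SVD also builds an $m \times m$ left factor. As an illustrative case, with $m = 100{,}000$, $n = 64$ and 4-byte floats, the full $U$ needs $10^{10} \times 4$ bytes = 40 GB, while the thin $U$ needs $6.4 \times 10^{6} \times 4$ bytes = 25.6 MB. The known trade-off is that forming $A^{\top} A$ squares the condition number, so very small singular values are recovered less accurately than with a direct SVD.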