I am new to Nx/Bumblebee and am trying to use the out-of-the-box Mistral support. I’m trying to use a “Small” model, and my computer is running out of RAM (31 GiB) and swap (8 GiB) and crashing. Key specs:
OS: Fedora Linux 42 (KDE Plasma Desktop Edition) x86_64
CPU: AMD Ryzen 7 5800X (16) @ 4.85 GHz
GPU: AMD Radeon RX 5700 XT [Discrete]
Memory: 6.35 GiB / 31.28 GiB (20%)
Swap: 2.09 GiB / 8.00 GiB (26%)
Is this normal and I just need a better computer, or is something in my setup killing me? Most of what’s below is blindly copied from examples I’ve found online. Here’s how I have the setup configured:
def setup_llm() do
  # token = File.read!("token.txt")
  repo = {:hf, "mistralai/Mistral-Small-3.2-24B-Instruct-2506"}

  {:ok, model_info} =
    Bumblebee.load_model(repo,
      backend: EXLA.Backend,
      module: Bumblebee.Text.Mistral,
      architecture: :base
    )

  {:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
  {:ok, generation_config} = Bumblebee.load_generation_config(repo)

  generation_config =
    Bumblebee.configure(generation_config,
      max_new_tokens: 256,
      strategy: %{type: :multinomial_sampling, top_p: 0.6}
    )

  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    compile: [batch_size: 10, sequence_length: 512],
    # stream: true,
    defn_options: [compiler: EXLA]
  )
end
If the answer is that I need a better computer - got it - what should I be looking at to figure out what my computer can handle?
I think you can usually estimate RAM at roughly 4 bytes per parameter, so in this case it’s 4 × 24B ≈ 96 GB. It’s not a precise formula, but that won’t fit on your machine. As you can see, “small” is relative.
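To make that arithmetic concrete, here is a quick back-of-the-envelope sketch in Elixir (the real footprint also includes activations, the KV cache, and compilation overhead, so treat it as a lower bound):

# Rough rule of thumb: ~4 bytes per parameter for f32 weights.
params = 24.0e9
bytes_per_param = 4
weights_gib = params * bytes_per_param / 1024 ** 3
# => roughly 89 GiB for the weights alone, well beyond 31 GiB RAM + 8 GiB swap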
You can try smollm2 instead: HuggingFaceTB/SmolLM2-1.7B-Instruct · Hugging Face
Or if you want to go with Mistral, one of their older smaller models should work: mistralai/Mistral-7B-Instruct-v0.3 · Hugging Face
Or other models at or below roughly 8B params. The output quality of older and smaller models will usually be worse than that of newer and larger ones.
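For example, here is a rough (untested) sketch of the same setup using the 7B model linked above; note that the mistralai repos on Hugging Face may require an auth token, which you can pass in the repo tuple:

def setup_small_llm() do
  # If the repo is gated, pass an auth token:
  # repo = {:hf, "mistralai/Mistral-7B-Instruct-v0.3",
  #         auth_token: File.read!("token.txt") |> String.trim()}
  repo = {:hf, "mistralai/Mistral-7B-Instruct-v0.3"}

  {:ok, model_info} = Bumblebee.load_model(repo, backend: EXLA.Backend)
  {:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
  {:ok, generation_config} = Bumblebee.load_generation_config(repo)

  generation_config = Bumblebee.configure(generation_config, max_new_tokens: 256)

  # batch_size: 1 keeps the compiled graph, and therefore memory use, much smaller
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    compile: [batch_size: 1, sequence_length: 512],
    defn_options: [compiler: EXLA]
  )
end

The returned serving can then be run with Nx.Serving.run(serving, prompt).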
Thanks - this is a very helpful metric.
Is there a guide for how to write adapters for models that Bumblebee does not natively support? Or should I be looking into Axon directly for that?
Basically, you need to implement the model in Bumblebee if it’s not supported yet. We wrote about that on the bitcrowd blog a while ago.
Often it’s just small changes to existing implementations, so once you understand how your model is different, there actually isn’t a lot of code you have to write.
It takes a while to get into because everything is based on Nx, which also means your usual Elixir debugging techniques won’t work (since you’re building a computational graph with the Elixir code).
There are other ways to debug, and there are also some blog posts about Nx, Axon, and Bumblebee on the DockYard blog, e.g. Nx for Absolute Beginners - DockYard.
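For example, a minimal sketch: print_value/2 (from Nx.Defn.Kernel, auto-imported inside defn) prints the actual tensor values at run time, whereas IO.inspect/1 would only show the symbolic expression being built:

defmodule DebugExample do
  import Nx.Defn

  defn scale(x) do
    x
    |> Nx.multiply(2)
    # prints the computed tensor when the compiled graph actually runs
    |> print_value(label: "after multiply")
  end
end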
And finally, you can also try to throw an LLM at the problem. It might not get you 100% there, but it can give you an idea of what’s missing.
Here is a recent PR that I think was first written mainly by an LLM: https://github.com/elixir-nx/bumblebee/pull/423
Here’s another (also a first pass by an LLM; I then rewrote most of it): https://github.com/elixir-nx/bumblebee/pull/422
For the new Mistral models specifically, I’m not sure but I think there are two main obstacles:
- I think they use a different tokenizer (tekken?), so that could cause some trouble if it’s not supported in Bumblebee yet
- I think these are Mixture of Experts (MoE) models, and I don’t think there is an MoE model implementation in Bumblebee yet, so I guess that would be a welcome contribution
This is an incredibly useful response - and that blog post is excellent.
Thank you - I will look into attempting to build the adapter, and if it works, I’ll pay it forward upstream.