I’ve recently been interested in trying out Nx with Bumblebee, so I have been following along with the fantastic DockYard article on creating a basic RAG.
However, I’m seemingly falling at the first hurdle: just getting the predict call, which uses Nx.Serving, to return.
I have the following:
defmodule Rag.Embedding do
  def serving() do
    {:ok, model} = Bumblebee.load_model({:hf, "intfloat/e5-large-v2"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "intfloat/e5-large-v2"})

    Bumblebee.Text.text_embedding(model, tokenizer,
      embedding_processor: :l2_norm,
      defn_options: [compiler: EMLX]
    )
  end

  def predict(text) do
    Nx.Serving.batched_run(__MODULE__, text)
  end
end
and starting the serving with:
defmodule Rag.Application do
  use Application

  @impl true
  def start(_type, _args) do
    Nx.Defn.default_options(compiler: EMLX)
    Nx.default_backend(EMLX.Backend)

    children = [
      {Nx.Serving, serving: Rag.Embedding.serving(), name: Rag.Embedding}
    ]

    opts = [strategy: :one_for_one, name: Rag.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
The model downloads successfully; however, the predict call seems to hang.
rag on main [!?] is 📦 v0.1.0 via 💧 v1.18.3 (OTP 27) via ❄️ impure (nix-shell-env) took 47s
λ iex -S mix
Erlang/OTP 27 [erts-15.2.6] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
Interactive Elixir (1.18.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Process.whereis(Rag.Embedding)
#PID<0.220.0>
iex(2)> Rag.Embedding.predict("Hello, world") # stalls here.
I’m coming from a Python machine learning background, so it may be something completely obvious that I’m missing.
Any help here would be appreciated!
Does the behavior change if you let the shell run for a while before requesting a prediction? Checking for the process will tell you whether it has started, but loading the models etc. happens after that.
Also observe the machine’s CPU/GPU load while waiting: is there work being done?
If you open a shell with iex -S mix run --no-start, you can load the Bumblebee models manually and the downloads will be cached for future runs.
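For example, a minimal sketch (using the same model repo as in your code) of what you could run in that --no-start shell:

# Pre-fetch the model artifacts; the downloads are cached locally,
# so subsequent application starts read from the cache instead of the network.
{:ok, _model} = Bumblebee.load_model({:hf, "intfloat/e5-large-v2"})
{:ok, _tokenizer} = Bumblebee.load_tokenizer({:hf, "intfloat/e5-large-v2"})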
EMLX is also not currently stable for use as a compiler, so I recommend you use it only as a backend, or perhaps switch to EXLA with both the backend and the compiler set.
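A rough sketch of both options, assuming the exla package is added to your mix.exs deps for the second one:

# Option 1: keep EMLX, but only as a backend.
# In Rag.Application.start/2, keep the backend and drop the defn compiler:
Nx.default_backend(EMLX.Backend)
# ...and in Rag.Embedding.serving/0, omit defn_options: [compiler: EMLX].

# Option 2: use EXLA as both backend and defn compiler.
# The global variants apply to all processes, which suits Application.start/2:
Nx.global_default_backend(EXLA.Backend)
Nx.Defn.global_default_options(compiler: EXLA)

# ...and in Rag.Embedding.serving/0:
Bumblebee.Text.text_embedding(model, tokenizer,
  embedding_processor: :l2_norm,
  defn_options: [compiler: EXLA]
)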
This Nx issue aims to help in debugging why EMLX is not on par with EXLA w.r.t. results. I have someone working on this now. Hopefully by next week we’ll have something out.
And there are a few issues on the EMLX tracker related to user reports.