Trouble with batched Whisper - doesn’t work with Supervisor and batched_run

I’ve tried to use batched_run with Whisper and it runs forever:

children = [
  {Nx.Serving,
   serving: serving,
   name: ServingWhisper,
   batch_size: 5,
   batch_timeout: 3000}
]

{:ok, _pid} = Supervisor.start_link(children, strategy: :one_for_one)

# output = Nx.Serving.batched_run(ServingWhisper, [tensor])
output = Nx.Serving.batched_run(ServingWhisper, {:file, "downloaded.wav"})

It works well with a plain Nx.Serving.run, but not with a Supervisor and batched_run. I’ve tried files, tensors, and batches, different batch_size values, and with and without chunk_num_seconds.

As I’ve read in the docs: "This serving always accepts a single input. A list of tensors is interpreted as continuous chunks. To transcribe multiple inputs concurrently use Nx.Serving.batched_run/2." So it should work with a list of tensors, but maybe I’m doing something wrong.


Hey @dailydaniel! So it finishes with Nx.Serving.run but runs forever with Nx.Serving.batched_run? Can you provide a full snippet to reproduce?

As I’ve read in the docs: "This serving always accepts a single input. A list of tensors is interpreted as continuous chunks. To transcribe multiple inputs concurrently use Nx.Serving.batched_run/2."

This means that if you do batched_run with [tensor1, tensor2], they are effectively concatenated into a single input. This is because we allow the input to be any enumerable, including a Stream. Similarly, you cannot call it with [{:file, f1}, {:file, f2}]; instead you can use Task.async_stream or similar and call batched_run with each file individually.
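For example, something along these lines (a rough sketch using the ServingWhisper name from your snippet; the file paths are placeholders):

files = ["/path/to/audio1.wav", "/path/to/audio2.wav", "/path/to/audio3.wav"]

results =
  files
  |> Task.async_stream(
    fn path -> Nx.Serving.batched_run(ServingWhisper, {:file, path}) end,
    # transcription can take a while, so don't time the tasks out
    timeout: :infinity
  )
  |> Enum.map(fn {:ok, output} -> output end)

Each call joins the same batching queue, so the serving can still batch them together.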

@jonatanklosko of course, here is the full code:

I start Docker with Livebook on my VM and connect to it from my local machine.

Mix.install([
  {:bumblebee, github: "elixir-nx/bumblebee"},
  {:exla, "~> 0.4"},
  {:nx, "~> 0.9.2"}
])

This code works well (just like in the docs):

{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text_whisper(whisper, featurizer, tokenizer, generation_config)

output = Nx.Serving.run(serving, {:file, "/path/to/audio.wav"})

This code runs forever:

children = [
  {Nx.Serving,
   serving: serving,
   name: WhisperServing,
   batch_size: 5,
   batch_timeout: 3000}
]

{:ok, _pid} = Supervisor.start_link(children, strategy: :one_for_one)

# this runs forever; reading the file into a tensor first also runs forever
Nx.Serving.batched_run(WhisperServing, {:file, "/path/to/audio.wav"})

# running them async also runs forever
file1 = {:file, "/path/to/audio1.wav"}
file2 = {:file, "/path/to/audio2.wav"}
file3 = {:file, "/path/to/audio3.wav"}

tasks = [
  Task.async(fn -> Nx.Serving.batched_run(WhisperServing, file1) end),
  Task.async(fn -> Nx.Serving.batched_run(WhisperServing, file2) end),
  Task.async(fn -> Nx.Serving.batched_run(WhisperServing, file3) end)
]

results = Enum.map(tasks, &Task.await(&1, 5000))

All of these work for me. Are you configuring EXLA? To set the default backend, do Nx.global_default_backend(EXLA.Backend) after Mix.install. It’s also a good idea to pass defn_options: [compiler: EXLA] when building the serving, so the whole model is compiled.
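For reference, that line goes right after the Mix.install cell:

# make EXLA the default backend for all processes
Nx.global_default_backend(EXLA.Backend)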

As a sidenote, I would do this:

serving =
  Bumblebee.Audio.speech_to_text_whisper(whisper, featurizer, tokenizer, generation_config,
    chunk_num_seconds: 30,
    compile: [batch_size: 5],
    defn_options: [compiler: EXLA]
  )

And then remove batch_size: 5 from the supervised definition. Doing compile: [batch_size: 5] will automatically set the batch_size, and it will also compile the model on serving startup, so that it is already cached for the first batched_run call.
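For example, a sketch of the supervised definition with that change (keeping the WhisperServing name and batch_timeout from your snippet):

children = [
  {Nx.Serving,
   serving: serving,
   name: WhisperServing,
   # batch_size now comes from compile: [batch_size: 5] on the serving itself
   batch_timeout: 3000}
]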

(This should not have an impact on the above, but it’s just the preferable configuration in general : ))


Thanks! Maybe this is my problem, because after setting the default backend the Livebook cell output shows the binary backend instead of EXLA.

Fixed it by providing the client:
Nx.global_default_backend({EXLA.Backend, client: :host}), thanks :)

:host should be the default client, unless you also have a GPU.

My guess is that you were setting Nx.default_backend instead of Nx.global_default_backend, which would fit the results above. Nx.default_backend sets the backend only in the current process, and Nx.Serving.run runs in the current process, while Nx.global_default_backend sets it for all processes, including the serving one : )
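To illustrate the difference (a rough sketch; the client option is the one from your fix):

# per-process: only the calling process uses EXLA, which is why Nx.Serving.run worked
Nx.default_backend(EXLA.Backend)

# global: every process uses EXLA, including the serving process under the Supervisor
Nx.global_default_backend({EXLA.Backend, client: :host})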


You’re right, thanks 😅