Hi, I’m trying to load Whisper models into Elixir using Nx, EXLA, and Bumblebee.
I’ve encountered the following issue:
When using the whisper-large-v3 model to transcribe audio files under 200 KB, the process consumes over 15 GB of RAM.
This is the script I created:
defmodule WhisperLarge do
  require Logger

  def run do
    # Keep tensors on the EXLA backend
    Nx.global_default_backend(EXLA.Backend)

    {:ok, model} = Bumblebee.load_model({:hf, "openai/whisper-large-v3"})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-large-v3"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-large-v3"})
    {:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-large-v3"})

    # Cap the number of tokens generated per chunk
    generation_config = Bumblebee.configure(generation_config, max_new_tokens: 448)

    serving =
      Bumblebee.Audio.speech_to_text_whisper(
        model,
        featurizer,
        tokenizer,
        generation_config,
        compile: [batch_size: 1],
        # Split the audio into 3-second windows
        chunk_num_seconds: 3,
        timestamps: :segments,
        language: "es",
        stream: false
      )

    result = Nx.Serving.run(serving, {:file, "file.wav"})

    text =
      result.chunks
      |> Enum.map(& &1.text)
      |> Enum.join(" ")

    IO.puts("Transcription: #{text}")
  end
end
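For reference, I call it from iex like this (file.wav is just a placeholder for the actual audio path):

iex> WhisperLarge.run()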
Dependencies:
{:bumblebee, "~> 0.6.0"},
{:nx, "~> 0.9.0"},
{:exla, "~> 0.9.0"}
I’ve also tested other models such as whisper-tiny, which use far less memory but are not accurate enough for my use case.
Someone on the Elixir Slack suggested using whisper.cpp or Python, both of which use significantly less memory.
However, I was hoping to keep the entire transcription pipeline in Elixir.
I’d really appreciate any advice or suggestions on reducing memory usage with large models in Nx/Bumblebee.
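In case it helps frame the question: based on what I think I saw in the Bumblebee docs, the variation I was planning to try next loads the parameters in half precision via the :type option and compiles the serving with EXLA (defn_options and preallocate_params). I haven't verified that these are the right levers, so this is only a sketch; featurizer, tokenizer, and generation_config are loaded exactly as in the script above:

# Cast parameters to bf16 on load, which should roughly halve their in-memory size
{:ok, model} = Bumblebee.load_model({:hf, "openai/whisper-large-v3"}, type: :bf16)

serving =
  Bumblebee.Audio.speech_to_text_whisper(
    model,
    featurizer,
    tokenizer,
    generation_config,
    compile: [batch_size: 1],
    chunk_num_seconds: 3,
    timestamps: :segments,
    language: "es",
    # Compile with EXLA instead of going through the default defn evaluator
    defn_options: [compiler: EXLA],
    # Copy the parameters to the compiler backend up front
    preallocate_params: true
  )

If anyone can confirm whether these options actually move the needle for a model this size, or knows of other ways to keep memory down, I'd love to hear it.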