Hi, I am trying to use Bumblebee to transcribe audio to text.
I have multiple *.webm files with audio that sounds like ‘text audio with Bumblebee’, ‘one, two, three, four, five’, etc.
I can play them back in a media player, and every file contains different audio.
However, it looks like Bumblebee always generates the same prediction, with the text ‘ you’:
    predicitons: %{
      chunks: [
        %{text: " you", start_timestamp_seconds: nil, end_timestamp_seconds: nil}
      ]
    }
    # ..x5+ times
    predicitons: %{
      chunks: [
        %{text: " Thank you.", start_timestamp_seconds: nil, end_timestamp_seconds: nil}
      ]
    }
    predicitons: %{
      chunks: [
        %{text: " Bye.", start_timestamp_seconds: nil, end_timestamp_seconds: nil}
      ]
    }
(predicitons is the IO.inspect label here)
My code:
In the Application module (application.ex), the child spec:
    children = [
      ........
      {Nx.Serving,
       serving: create_audio_serving(),
       name: Recognizer.AudioServing,
       batch_size: 4,
       batch_timeout: 100}
      .....
    ]

    defp create_audio_serving() do
      # Load the pre-trained model
      {:ok, model_info} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
      {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
      {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
      {:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

      Bumblebee.Audio.speech_to_text_whisper(model_info, featurizer, tokenizer, generation_config,
        compile: [batch_size: 4],
        defn_options: [
          compiler: EXLA
        ]
      )
    end
In the worker process I have this code:
    defmodule Recognizer.Room do
      @moduledoc false
      use GenServer, restart: :temporary
      ...

      @impl true
      def handle_cast({:receive_audio_msg, audio_base64}, state) do
        file = "/tmp/audio-#{state.id}.webm"
        audio_data = Base.decode64!(audio_base64)
        File.write!(file, audio_data)
        reader = Xav.Reader.new!(file, read: :audio)

        case Xav.Reader.next_frame(reader) do
          {:ok, frame} ->
            tensor = Xav.Frame.to_nx(frame)
            Task.async(fn -> Nx.Serving.batched_run(Recognizer.AudioServing, tensor) end)

          {:error, :no_keyframe} ->
            Logger.warning("Couldn't decode audio frame - missing keyframe!")
        end

        {:noreply, state}
      end

      @impl true
      def handle_info({_ref, predicitons}, state) do
        predicitons |> IO.inspect(label: :predicitons)
        {:noreply, state}
      end
I have set EXLA as the default backend, as suggested in a very similar question, but the predictions are still nonsense.
What am I doing wrong, and why am I getting these random predictions?