Heya! Hope this is the right place for this question, if not I’m happy to move it.
I was trying to play around with speech-to-text using the OpenAI Whisper model in Livebook, loosely following Chris McCord’s YouTube video and this post from DockYard: Audio Speech Recognition in Elixir with Whisper Bumblebee. However, I’m getting this error:
** (KeyError) key :decoder_start_token_id not found in: [max_new_tokens: 100, defn_options: [compiler: EXLA]]. If you are using the dot syntax, such as map.field, make sure the left-hand side of the dot is a map
    (bumblebee 0.2.0) lib/bumblebee/text/generation.ex:135: Bumblebee.Text.Generation.build_generate/3
    (bumblebee 0.2.0) lib/bumblebee/audio/speech_to_text.ex:23: Bumblebee.Audio.SpeechToText.speech_to_text/5
    #cell:e4zqu3usyi6eu7maylexdftfnarnp432:6: (file)
After a little poking around, I think this is related to specifying the language. I’ve got audio in en/US, but I’m not sure how to supply :decoder_start_token_id in this case (I’ve found some Python examples, but haven’t been able to get them to work in Elixir). I can’t find any examples of providing :decoder_start_token_id with Bumblebee.Audio.speech_to_text/4. Any help is greatly appreciated. For reference, here’s the .livebook source (in case I’m doing something stupid):
Mix.install([
  {:bumblebee, github: "elixir-nx/bumblebee"},
  {:exla, "~> 0.4"},
  {:nx, github: "elixir-nx/nx", sparse: "nx", override: true}
])
Section
Nx.default_backend(EXLA.Backend)

{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA]
  )