"key :decoder_start_token_id not found" trying to use Whisper with Bumblebee

Heya! Hope this is the right place for this question, if not I’m happy to move it.

I was trying to play around with speech-to-text using the OpenAI Whisper model in Livebook by kinda following along with Chris McCord’s YouTube video and this post from DockYard: Audio Speech Recognition in Elixir with Whisper Bumblebee - DockYard. However, I’m getting the error:

** (KeyError) key :decoder_start_token_id not found in: [max_new_tokens: 100, defn_options: [compiler: EXLA]]. If you are using the dot syntax, such as map.field, make sure the left-hand side of the dot is a map
    (bumblebee 0.2.0) lib/bumblebee/text/generation.ex:135: Bumblebee.Text.Generation.build_generate/3
    (bumblebee 0.2.0) lib/bumblebee/audio/speech_to_text.ex:23: Bumblebee.Audio.SpeechToText.speech_to_text/5
    #cell:e4zqu3usyi6eu7maylexdftfnarnp432:6: (file)

Doing a little poking around, I think this is related to specifying the language. I’ve got audio in en/US, but I’m not sure how to supply :decoder_start_token_id in this case (I’ve found some Python examples, but haven’t been able to get them working in Elixir). I can’t find any examples of providing :decoder_start_token_id to Bumblebee.Audio.speech_to_text/4. Any help is greatly appreciated. For reference, here’s the .livemd source (in case I’m doing something stupid):


Mix.install([
  {:bumblebee, github: "elixir-nx/bumblebee"},
  {:exla, "~> 0.4"},
  {:nx, github: "elixir-nx/nx", sparse: "nx", override: true}
])

Section

Nx.default_backend(EXLA.Backend)

{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA]
  )

I believe you need exla 0.5+. I’m not at a place where I can check the Livebooks I’ve used, but that all looks like a normal setup process. At some point you’ll also want to pin the bumblebee and nx versions, because those declarations are both pulling from the main branch on GitHub, which can be unstable at times.

I’d honestly copy the Mix.install parts from this recent example: bumblebee/speech_to_text.exs at main · elixir-nx/bumblebee · GitHub

Bumblebee 0.2, exla 0.5.1+ and Nx 0.5.1+. All of these are moving quickly so I’ll be upgrading them very often for my experiments.
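For reference, a pinned Mix.install would look roughly like this (just a sketch; the exact version requirements here are mine, so check Hex for the current releases):

# Pinned to Hex releases instead of GitHub main
Mix.install([
  {:bumblebee, "~> 0.2"},
  {:exla, "~> 0.5"},
  {:nx, "~> 0.5", override: true}
])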

Hey, that got me a bit closer!

Now I’m seeing libc++abi: terminating with uncaught exception of type std::out_of_range: Span::at failed bounds check, which I think is coming from exla and is possibly related to this thread: I had Exla working at one point, but now it keeps crashing, and I can't figure out why - #13 by hbko

When I get a bit more time I’ll try recompiling exla with the instructions from that post and see if I can get it to work. I understand that all of this is moving quickly; I just thought it’d be fun to poke around. :smile:
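In a Livebook, I think the blunt way to force everything (exla included) to be refetched and recompiled is the :force option on Mix.install, so I may try something like this first:

Mix.install(
  [
    {:bumblebee, github: "elixir-nx/bumblebee"},
    {:exla, "~> 0.4"},
    {:nx, github: "elixir-nx/nx", sparse: "nx", override: true}
  ],
  # :force runs with an empty install cache, so all deps are rebuilt, not just exla
  force: true
)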

Thanks!

EDIT: for reference, I’m on an M1 Max Pro. I just looked at the live_beats project to see if Chris McCord checked that stuff in (from Embed and broadcast Whisper speech-to-text in your Phoenix app in 15 minutes - YouTube), but I don’t see any reference to bumblebee or nx in the deps in mix.exs in trunk. It looks like he did that video from a Mac. :man_shrugging:

Hey @ohnoimdead, you are installing Bumblebee directly from GitHub and we recently changed the API. The speech_to_text function accepts an additional argument, see this example. I will add a guard to make this transition less confusing : )

Awesome, thanks @jonatanklosko! For those playing along and trying to use the latest Bumblebee in Livebook with the Whisper model, here’s what currently works for me:


Untitled notebook

Mix.install([
  {:bumblebee, github: "elixir-nx/bumblebee"},
  {:exla, "~> 0.4"},
  {:nx, github: "elixir-nx/nx", sparse: "nx", override: true}
])

Section

Nx.default_backend(EXLA.Backend)

{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer, generation_config,
    defn_options: [compiler: EXLA]
  )

Nx.Serving.run(serving, {:file, "/Users/tres/Desktop/foo.mp3"})
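For anyone curious, that last Nx.Serving.run call returns the transcription in a map; pulling the text out looks roughly like this (the result shape here is from memory and may differ between Bumblebee versions):

# Pattern match the transcription out of the serving result
# (this shape is from memory and may vary between Bumblebee versions)
%{results: [%{text: text}]} =
  Nx.Serving.run(serving, {:file, "/Users/tres/Desktop/foo.mp3"})

IO.puts(text)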