Initial prompt with Whisper

Hi!

We are using Bumblebee.Audio with whisper. It works pretty well and we are enjoying it.

We would like to improve the transcription of some domain-specific words, though. We saw that Whisper can take an extra textual prompt for this (see the Whisper prompting guide).

We did not find a way to leverage this prompting facility with Bumblebee.Audio. Is that correct?

If Whisper prompting is not yet available within Bumblebee, what would it take to implement it? With some guidance, maybe we could help?

Thanks!

Rodrigue

I didn’t dig very far, but it looks like the prompt tokens just get fed into the decoder at the start, much like priming an LLM. I don’t see an explicit way to do this in the Bumblebee.Audio API, which seems to offer only a high-level streaming interface, but it probably wouldn’t be too hard to add.
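To make that concrete, here is a minimal sketch of how the reference implementation lays out the decoder input when a prompt is supplied: the prompt's tokens are prepended after a special `<|startofprev|>` token, truncated to roughly half the decoder context window (keeping the *last* tokens), and followed by the usual start-of-transcript sequence. This is pure illustrative Python, not Bumblebee code; the token strings and the 224-token limit are assumptions based on the multilingual Whisper vocabulary, so double-check them against the actual tokenizer before relying on them.

```python
# Illustrative sketch of Whisper's decoder input layout with a prompt.
# Special-token names and the truncation limit are assumptions; verify
# against the actual Whisper tokenizer/config you are using.

SOP = "<|startofprev|>"        # marks the start of the prompt context
SOT = "<|startoftranscript|>"  # normal start-of-decoding token
LANG = "<|en|>"                # language token
TASK = "<|transcribe|>"        # task token


def build_decoder_input(prompt_tokens, max_prompt=224):
    # Keep only the last `max_prompt` prompt tokens (roughly half of the
    # 448-token decoder context), then prepend them before the usual
    # SOT/LANG/TASK sequence that decoding normally starts from.
    kept = prompt_tokens[-max_prompt:]
    return [SOP, *kept, SOT, LANG, TASK]


tokens = build_decoder_input(["Bumblebee", ",", " Nx", ",", " Axon"])
print(tokens)
```

So "adding prompting" would mostly mean exposing a way to tokenize a user-supplied string and splice it into the decoder's initial token sequence like this, before generation starts.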

Have you seen reliable results from prompting Whisper with the official Python SDK? We have tried the method described in the linked blog post and found that the results are extremely sensitive to the contents of the prompt, to the point where changing a single `,` to a `.` yielded very different results.

As such, we weren’t able to find a reliable way to actually affect the output. Curious to hear about your experience!


This is just what LLMs are like when they’re not instruction-tuned (time flies). Now you see why ChatGPT was so popular :slight_smile:

I know multimodal instruction-tuned LLMs exist. I don’t know anything about using them for speech-to-text, but I imagine it has been done.