Definitely, send me a pvt message I guess?
Hello Christian, I'm curious about your findings as well. My gist so far:
- Whisper is great at speech recognition, but it has timing issues, and I'm not yet sure how to deal with silences in speech (for example, I want to hide subs when nobody is talking… maybe use VAD, Voice Activity Detection, on top? Dunno). I ran it myself and also on replicate.com, which is pretty fast (1h file in 90 secs).
- Sonix feels like the goat to me. About as good as Whisper at speech recognition, but with way better timings, speaker diarization, etc. BUT: expensive!
- rev.ai - is what I'm currently using for my Video CMS. It's actually pretty good and the timings are very accurate. It's much cheaper than Sonix and I feel it gives me the best bang for the buck right now (I'm currently torn between Sonix and rev).
- aws transcribe - feels "ok". Timings are good, speech recognition in my examples is maybe a little less accurate than rev.ai.
- fireflies - Cool tool, didn't play with it that much yet. I think they might use Whisper under the hood, as I see a bunch of weird words that Whisper was able to catch successfully (none of the others did), and they caught them too. They also have timing issues.
- deepgram - feels very similar to what aws and rev give me on my first few tries. Pretty cheap though, with lots of features, so I will play with it some more. I give them props for good DX as well.
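On the Whisper silence question, here is a rough sketch of what "VAD on top" could look like. It is not tied to any particular VAD library: it assumes you already have subtitle cues as `(start, end, text)` tuples and a list of speech intervals in seconds from some VAD, and the function name `clip_cues_to_speech` and the `min_len` threshold are made up for illustration. It trims each cue to the speech it overlaps and drops cues that sit entirely in silence.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # (start_sec, end_sec) of detected speech
Cue = Tuple[float, float, str]   # (start_sec, end_sec, subtitle text)


def clip_cues_to_speech(
    cues: List[Cue],
    speech: List[Interval],
    min_len: float = 0.2,  # hypothetical threshold: drop cues shorter than this
) -> List[Cue]:
    """Trim each cue to the span of speech it overlaps; drop cues with no speech."""
    result: List[Cue] = []
    for start, end, text in cues:
        # Collect the parts of this cue that overlap a speech interval.
        overlaps = [
            (max(start, s), min(end, e))
            for s, e in speech
            if min(end, e) > max(start, s)
        ]
        if not overlaps:
            continue  # nobody is talking during this cue, so hide it
        new_start = min(o[0] for o in overlaps)
        new_end = max(o[1] for o in overlaps)
        if new_end - new_start >= min_len:
            result.append((new_start, new_end, text))
    return result


cues = [(0.0, 5.0, "hello"), (5.0, 9.0, "[music]"), (9.0, 12.0, "bye")]
speech = [(1.0, 3.5), (9.5, 11.0)]
clipped = clip_cues_to_speech(cues, speech)
```

With the sample data, the first and last cues get tightened to the speech regions and the middle cue disappears. Short pauses inside a sentence are kept on screen, since a cue spans from the first to the last speech region it touches; whether that is the right behaviour probably depends on how aggressive the VAD is.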
Anyway, that's quickly from me. Would love a DM from you to see what your experiences are. Thank you!