Cool-whisper-server - OpenAI-compatible Whisper server

I created my first pet project in Elixir after many years of writing in Python.

It’s an inference server for Whisper built on Nx, EXLA, and Bumblebee, with support for dynamic batching (which Nx provides out of the box).

It is designed primarily to be used through the OpenAI library in Python.

Whisper Inference Server is an OpenAI-compatible, Elixir-based HTTP server for running inference on audio files using OpenAI’s Whisper model. The server supports batching for efficient inference and CPU/GPU execution via the EXLA backend, and it can be configured at runtime through command-line parameters.
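
For example, transcribing a file through the official OpenAI Python client might look like the sketch below. The base URL, port, API key placeholder, and model name are assumptions for illustration, not necessarily the server’s actual defaults:

```python
from openai import OpenAI

# Point the official OpenAI client at the local server instead of
# api.openai.com. Host, port, and API key handling are assumptions.
client = OpenAI(base_url="http://localhost:4000/v1", api_key="not-needed")

# Upload an audio file to the OpenAI-compatible transcription endpoint.
with open("sample.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # illustrative model name
        file=audio_file,
    )

print(transcript.text)
```

Since only the `base_url` changes, existing code written against the OpenAI API can be pointed at the server without other modifications.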

Features

  • Batching: Process multiple audio files simultaneously to improve inference throughput (see the client-side sketch after this list).
  • CPU/GPU support: Choose between the host (CPU) and cuda (GPU) backends for inference.
  • Dynamic configuration: Configure model, batch size, timeout, and other parameters at runtime.
  • Modular design: Clean architecture for easy extension and maintenance.
  • OpenAI-compatible API.
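
Because batching happens on the server, clients benefit simply by sending requests concurrently: requests that arrive within the batch window can be grouped into a single batch. Here is a minimal sketch under the same assumptions as above (the file names and the transcribe helper are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="not-needed")

def transcribe(path: str) -> str:
    # Each call is an independent HTTP request; requests that arrive
    # concurrently can be grouped by the server into one batch.
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",  # illustrative model name
            file=audio_file,
        )
    return result.text

# Fire requests in parallel so the server has something to batch.
paths = ["first.wav", "second.wav", "third.wav"]
with ThreadPoolExecutor(max_workers=len(paths)) as pool:
    for path, text in zip(paths, pool.map(transcribe, paths)):
        print(f"{path}: {text}")
```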