Xav - media library built on top of FFmpeg with an easy integration with Nx

Xav is a media library built on top of FFmpeg that can be used for reading audio and video streams.

At the moment, it supports reading from a file or a video camera.

It's still experimental, but a couple of features already seem to be usable.

Documentation is available here.

Examples

Read from a file

r = Xav.new_reader!("./some_mp4_file.mp4")
{:ok, %Xav.Frame{} = frame} = Xav.next_frame(r)
tensor = Xav.Frame.to_nx(frame)
Kino.Image.new(tensor)
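
The snippet above reads a single frame. To walk through a whole file, one option is to wrap the reader in a lazy stream. This is only a rough sketch: FrameStream is a hypothetical helper, not part of Xav, and it assumes that Xav.next_frame/1 returns {:error, :eof} once the stream is exhausted. The frame-fetching function is passed in as an argument, so the module itself does not depend on Xav.

```elixir
defmodule FrameStream do
  @moduledoc """
  Hypothetical helper (not part of Xav): wraps a stateful reader in a lazy stream.
  """

  # `next_frame` is any 1-arity function returning {:ok, frame} or
  # {:error, :eof}; the :eof shape is an assumption about Xav.next_frame/1.
  def stream(reader, next_frame) when is_function(next_frame, 1) do
    Stream.unfold(reader, fn r ->
      case next_frame.(r) do
        # Emit the frame and keep the same reader as the accumulator.
        {:ok, frame} -> {frame, r}
        # End the stream instead of raising a MatchError.
        {:error, :eof} -> nil
      end
    end)
  end
end
```

With a reader from the example above, this could be used as FrameStream.stream(r, &Xav.next_frame/1) |> Stream.map(&Xav.Frame.to_nx/1) |> Enum.take(10). Because the stream is lazy, only the frames that are actually consumed get decoded.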

Read from a camera (tested on Linux; requires the v4l2 driver to be installed)

r = Xav.new_reader!("/dev/video0", device?: true)
{:ok, %Xav.Frame{} = frame} = Xav.next_frame(r)
tensor = Xav.Frame.to_nx(frame)
Kino.Image.new(tensor)

Speech to text

r = Xav.new_reader!("../sample.mp3", read: :audio)

{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text(whisper, featurizer, tokenizer,
    max_new_tokens: 100,
    defn_options: [compiler: EXLA]
  )

# read the first 201 audio frames (the match below raises if the file ends sooner)
frames =
  for _i <- 0..200 do
    {:ok, frame} = Xav.next_frame(r)
    Xav.Frame.to_nx(frame)
  end

batch = Nx.Batch.concatenate(frames)
batch = Nx.Defn.jit_apply(&Function.identity/1, [batch])
Nx.Serving.run(serving, batch) 