ExMemVid - Using videos as a search db


A couple weeks ago I saw the MemVid project and thought, “Hey, this looks fun”, so I decided to implement something similar in Elixir. Here’s ExMemVid.


ExMemVid is a proof-of-concept library for storing and retrieving large amounts of text data by encoding it into a video file composed of QR code frames. It leverages modern Elixir libraries for machine learning, video processing, and vector search to provide a unique solution for data storage and semantic retrieval.

How it Works

The core idea is to treat video frames as a data storage medium. Each frame in the video contains a QR code that holds a chunk of text. A separate search index is created using text embeddings to allow for fast, semantic searching of the content stored in the video.

Encoding Process

  1. Text Chunking: The input text is divided into smaller, manageable chunks.
  2. Embedding: A sentence transformer model from Hugging Face (via Bumblebee) generates a vector embedding for each text chunk.
  3. QR Code Generation: Each text chunk is serialized (optionally with Gzip compression) and encoded into a QR code image.
  4. Video Encoding: The QR code images are compiled into a video file, where each image becomes a single frame. The library uses Xav and Evision (OpenCV bindings) for this.
  5. Index Creation: The vector embeddings are stored in an HNSWLib (Hierarchical Navigable Small World) index for efficient similarity search. This index maps the embeddings to their corresponding frame numbers in the video.
  6. Saving: The final video file and the search index are saved to disk.
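For illustration, here’s a minimal sketch of that pipeline using Bumblebee for the embeddings and the hnswlib bindings for the index. The QR and video-writing steps are shown as hypothetical helpers (`chunk_text/2`, `encode_qr_frame/1`, `write_video/2`), since ExMemVid’s actual module names and options may differ:

```elixir
# Minimal sketch of the encoding pipeline (not ExMemVid's actual API).
# chunk_text/2, encode_qr_frame/1 and write_video/2 are hypothetical helpers.

# 1. Chunk the input text
chunks = chunk_text(text, 512)

# 2. Embed each chunk with a sentence-transformer via Bumblebee
{:ok, model_info} = Bumblebee.load_model({:hf, "sentence-transformers/all-MiniLM-L6-v2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "sentence-transformers/all-MiniLM-L6-v2"})
serving = Bumblebee.Text.text_embedding(model_info, tokenizer)

embeddings =
  chunks
  |> Task.async_stream(fn chunk -> Nx.Serving.run(serving, chunk).embedding end)
  |> Enum.map(fn {:ok, emb} -> emb end)

# 3-4. Turn each (gzipped) chunk into a QR frame and write the frames to a video file
frames = Enum.map(chunks, &encode_qr_frame(:zlib.gzip(&1)))
:ok = write_video(frames, "corpus.mkv")

# 5. Store the embeddings in an HNSW index, keyed by frame number
dim = Nx.size(hd(embeddings))
{:ok, index} = HNSWLib.Index.new(:cosine, dim, length(chunks))

embeddings
|> Enum.with_index()
|> Enum.each(fn {emb, frame_no} ->
  HNSWLib.Index.add_items(index, Nx.new_axis(emb, 0), ids: Nx.tensor([frame_no]))
end)

# 6. Persist the index alongside the video
HNSWLib.Index.save_index(index, "corpus.hnsw")
```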

Retrieval Process

  1. Search Query: The user provides a text query.
  2. Query Embedding: The query is converted into a vector embedding using the same model as the encoding process.
  3. Semantic Search: The HNSWLib index is queried to find the text chunks with embeddings most similar to the query’s embedding.
  4. Frame Identification: The search results from the index provide the frame numbers where the relevant text chunks are stored.
  5. Frame Decoding: The Retriever seeks to the specific frames in the video file, reads the QR codes, and decodes them to retrieve the original text chunks.
  6. Result Aggregation: The retrieved text chunks are returned to the user.
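And a corresponding sketch of retrieval, continuing from the encoding sketch above (reusing its `serving` and `index`), with hypothetical helpers `read_frame/2` and `decode_qr/1` standing in for the Retriever’s frame seeking and QR decoding (e.g. via Xav and Evision):

```elixir
# Minimal sketch of retrieval (not ExMemVid's actual API).
# read_frame/2 and decode_qr/1 are hypothetical helpers.

# 1-2. Embed the query with the same serving used during encoding
query = "how does the indexing work?"
query_emb = Nx.Serving.run(serving, query).embedding

# 3-4. Find the nearest chunks; the returned labels are frame numbers
{:ok, labels, _distances} =
  HNSWLib.Index.knn_query(index, Nx.new_axis(query_emb, 0), k: 5)

# 5-6. Seek to each frame, decode its QR code, and aggregate the chunks
labels
|> Nx.to_flat_list()
|> Enum.map(fn frame_no ->
  "corpus.mkv"
  |> read_frame(frame_no)
  |> decode_qr()
  |> :zlib.gunzip()
end)
```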

Features

  • Data Archiving: Store large text corpora in a compressed video format.
  • Semantic Search: Go beyond keyword matching with state-of-the-art text embeddings.
  • Configurable: Easily configure everything from the video codec and QR code version to the embedding model.
  • Concurrent: Utilizes Elixir’s concurrency to parallelize embedding and frame decoding tasks.
  • Extensible: The Embedding behaviour allows for swapping out the embedding implementation.
  • Supervised: Built-in supervisors for managing encoder and retriever processes.
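As a rough illustration of what that configuration surface might look like (the option names here are hypothetical, not ExMemVid’s actual keys):

```elixir
# Hypothetical configuration sketch; the real option names may differ.
opts = [
  chunk_size: 512,                 # characters per text chunk
  gzip: true,                      # compress chunks before QR encoding
  qr_error_correction: :medium,    # QR code robustness level
  video_codec: :mp4v,              # fourcc passed to the video writer
  fps: 30,
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2"
]
```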

Hi there, is it only for fun? Asking because I can’t understand how it makes sense, honestly :stuck_out_tongue:

Glad to see that Xav was useful!

2 Likes

It is more of a toy library, yes. Its memory-usage scalability is pretty interesting, though; it’s way more efficient compared with some common JSON-embeddings offline storage. I’ll build some benchmarks later.

I’m no expert on LLMs, but from a compression perspective, I don’t see how it can be efficient. Somebody explained it well in an issue and provided benchmarks too.

2 Likes

I’m pretty sure it’s just a joke.

The amount of code and the extensive documentation make it feel like way too much work to put into a joke. But it’s quite easy to generate this much stuff with an LLM, with… this prompt.

1 Like

Yep, it’s a pretty “eccentric” idea; I’m curious about the “v2”, haha.

Add some steganography and you will get a hit! :smiley: