A couple of weeks ago I saw the MemVid project and thought, “Hey, this looks fun,” so I decided to implement something similar in Elixir. Here’s ExMemVid.
ExMemvid is a proof-of-concept library for storing and retrieving large amounts of text data by encoding it into a video file composed of QR code frames. It leverages modern Elixir libraries for machine learning, video processing, and vector search to provide a unique solution for data storage and semantic retrieval.
How it Works
The core idea is to treat video frames as a data storage medium. Each frame in the video contains a QR code that holds a chunk of text. A separate search index is created using text embeddings to allow for fast, semantic searching of the content stored in the video.
Encoding Process
- Text Chunking: The input text is divided into smaller, manageable chunks.
- Embedding: A sentence transformer model from Hugging Face (via Bumblebee) generates a vector embedding for each text chunk.
- QR Code Generation: Each text chunk is serialized (optionally with Gzip compression) and encoded into a QR code image.
- Video Encoding: The QR code images are compiled into a video file, where each image becomes a single frame. The library uses Xav and Evision (OpenCV bindings) for this.
- Index Creation: The vector embeddings are stored in an HNSWLib (Hierarchical Navigable Small World) index for efficient similarity search. This index maps each embedding to its corresponding frame number in the video.
- Saving: The final video file and the search index are saved to disk.
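Put together, an encoding session might look something like the sketch below. This is a hypothetical usage example, not the library’s documented API: the module name `ExMemvid.Encoder`, the function names, and the options are all assumptions made for illustration.

```elixir
# Hypothetical usage sketch — function names and options are
# illustrative, not necessarily ExMemvid's real API.
alias ExMemvid.Encoder

text = File.read!("corpus.txt")

{:ok, encoder} =
  Encoder.start_link(
    chunk_size: 512,   # characters per text chunk
    compress: true,    # gzip each chunk before QR encoding
    codec: :h264       # video codec for the output file
  )

# Chunks the text, embeds each chunk, and renders one QR frame per chunk.
:ok = Encoder.add_text(encoder, text)

# Writes the video and the HNSW search index side by side.
:ok = Encoder.save(encoder, "corpus.mp4", "corpus.hnsw")
```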
Retrieval Process
- Search Query: The user provides a text query.
- Query Embedding: The query is converted into a vector embedding using the same model as the encoding process.
- Semantic Search: The HNSWLib index is queried to find the text chunks with embeddings most similar to the query’s embedding.
- Frame Identification: The search results from the index provide the frame numbers where the relevant text chunks are stored.
- Frame Decoding: The Retriever seeks to the specific frames in the video file, reads the QR codes, and decodes them to recover the original text chunks.
- Result Aggregation: The retrieved text chunks are returned to the user.
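The retrieval side could be sketched like this. Again, the `Retriever` API shown here is an assumption for illustration; only the module’s existence is implied by the text above.

```elixir
# Hypothetical usage sketch — the Retriever API shown here is assumed.
alias ExMemvid.Retriever

{:ok, retriever} =
  Retriever.start_link(video: "corpus.mp4", index: "corpus.hnsw")

# Embeds the query, searches the HNSW index for the nearest chunks,
# then seeks to the matching frames and decodes their QR codes.
{:ok, chunks} =
  Retriever.search(retriever, "how does the encoding pipeline work?", top_k: 3)
```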
Features
- Data Archiving: Store large text corpora in a compressed video format.
- Semantic Search: Go beyond keyword matching with state-of-the-art text embeddings.
- Configurable: Easily configure everything from the video codec and QR code version to the embedding model.
- Concurrent: Utilizes Elixir’s concurrency to parallelize embedding and frame decoding tasks.
- Extensible: The Embedding behaviour allows for swapping out the embedding implementation.
- Supervised: Built-in supervisors for managing encoder and retriever processes.