Torus is a plug-and-play Elixir library that seamlessly integrates PostgreSQL’s search into Ecto, streamlining the construction of advanced search queries.
The goal is to help developers create Ecto queries that return relevant results with optimal performance. Torus supports the following search types, some of them shown with an example:
Pattern matching: Searches for a specific pattern in a string.
Similarity: Searches for items that are closely alike based on attributes, often using measures like cosine similarity or Euclidean distance. It is great for fuzzy searching and ignoring typos in short texts.
Text Search Vectors: Uses term-document matrix vectors for full-text search, enabling efficient querying and ranking based on term frequency (see PostgreSQL: Full Text Search). It is great for quickly returning relevant results from large datasets.
iex> insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
...> insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
...> insert_post!(title: "Completely unrelated", body: "No magic here!")
...> Post
...> |> Torus.full_text([p], [p.title, p.body], "uncov hogwar")
...> |> select([p], p.title)
...> |> Repo.all()
["Diagon Bombshell"]
Semantic Search: Understands the contextual meaning of queries to match and retrieve related content, utilizing natural language processing. Read more about semantic search in the Semantic search with Torus guide.
insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
insert_post!(title: "Completely unrelated", body: "No magic here!")
embedding_vector = Torus.to_vector("A magic school in the UK")
Post
|> Torus.semantic([p], p.embedding, embedding_vector)
|> select([p], p.title)
|> Repo.all()
["Diagon Bombshell"]
The above macros accept a list of options to customize their behavior. See function docs for examples. Most functions have an optimization section that might help you boost the performance of these search queries.
Upcoming plans include support for highlighting search results and extending search with hybrid search. Please let me know what you think of it; I’ll gladly hear any suggestions on how to make it better!
Looks interesting! I saw that all the examples were using “function-style queries.” Does Torus support “macro-style” queries? If not, are you planning to add support for them?
Yes, you can also use keyword-based queries! It’s not as native as other keywords, since afaik you can’t expand custom macros in the from clause, but you can do something like this:
from(p in Post, select: p.title)
|> Torus.like([p], p.title, "pinned%")
|> Repo.all()
Great question! Torus.like/5 and Torus.ilike/5 are using Ecto’s like/2 and ilike/2 under the hood, but they allow us to search across multiple columns (just like the rest of the search functions). The main reason I’ve added them was to cover more of the search types so we won’t need to switch contexts and reach for a different tool.
So, summing up, there are two reasons:
Torus versions allow us to search across multiple columns
Torus docs and API are more complete with these two pattern-matching options
One (somewhat) lesser-known detail about these LIKE operators is that you have to be careful to escape the match characters in user input, because it is possible to craft pathological matches/regexes and use them as a DoS vector.
It would be nice if there were a way to do that automatically, perhaps at compile-time with a sigil/macro.
Just to make sure we’re on the same page, do you mean LIKE injections? If yes, then I was considering adding an option to sanitize the input term, but I thought that adding a warning to the docs and delegating this to the caller might be the best/most customizable option. Is this what you meant?
Yes, this is what I meant. It’s funny - I was looking for this exact blog post to use as a reference and I couldn’t find it!
I’m not sure of the best path, but you could potentially do something like a sigil where the interpolated strings are escaped. Either way, a warning in the docs is a good idea.
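For anyone landing here later, a minimal sketch of what caller-side escaping could look like. `LikeEscape` is a hypothetical helper, not part of Torus or Ecto, and it assumes PostgreSQL's default LIKE escape character (the backslash):

```elixir
defmodule LikeEscape do
  @moduledoc "Hypothetical helper for escaping user input before LIKE/ILIKE."

  # Prefixes the LIKE wildcards (% and _) and the escape character itself
  # with a backslash, so they are matched literally.
  # Assumption: the query uses PostgreSQL's default escape character.
  def escape(term) when is_binary(term) do
    String.replace(term, ["\\", "%", "_"], fn char -> "\\" <> char end)
  end
end
```

You would escape the user input first and only then add your own wildcards, e.g. build the term as `LikeEscape.escape(user_input) <> "%"` before passing it to the search function.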
Semantic search is finally here! Read more in the Semantic search with Torus guide.
In short, it allows you to generate embeddings using configurable (and chainable) adapters and compare them against the ones stored in your database.
Supported adapters (for now):
Torus.Embeddings.OpenAI - uses OpenAI’s API to generate embeddings.
Torus.Embeddings.HuggingFace - uses HuggingFace’s API to generate embeddings.
Torus.Embeddings.LocalNxServing - generates embeddings on your local machine using a variety of models available on Hugging Face.
Torus.Embeddings.PostgresML - uses PostgreSQL’s PostgresML extension to generate embeddings.
Torus.Embeddings.Batcher - a long‑running GenServer that collects individual embedding calls, groups them into a single batch, and forwards the batch to the configured embedding_module (any from the above or your custom one).
Torus.Embeddings.NebulexCache - a wrapper around Nebulex cache, allowing you to cache the embedding calls in memory, so you save the resources/cost of calling the embedding module multiple times for the same input.
And you can easily create your own adapter by implementing the Torus.Embedding behaviour.
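To illustrate the batching idea described for the Batcher adapter above, here is a rough sketch of a GenServer that collects single calls and forwards them as one batch. This is not Torus's implementation: `BatcherSketch`, `embed/2`, the `flush_after_ms` option, and the `embed_fun` argument (a function that takes a list of terms and returns a list of vectors in the same order) are all made up for the example.

```elixir
defmodule BatcherSketch do
  use GenServer

  # embed_fun is a hypothetical stand-in for a real embedding module:
  # it receives a list of terms and returns one vector per term, in order.
  def start_link(embed_fun, opts \\ []) do
    flush_after_ms = Keyword.get(opts, :flush_after_ms, 50)
    GenServer.start_link(__MODULE__, {embed_fun, flush_after_ms})
  end

  # Blocks the caller until the batch containing its term has been processed.
  def embed(server, term), do: GenServer.call(server, {:embed, term})

  @impl true
  def init({embed_fun, flush_after_ms}) do
    {:ok, %{embed_fun: embed_fun, flush_after_ms: flush_after_ms, pending: []}}
  end

  # The first call of a batch schedules the flush; the reply is deferred.
  @impl true
  def handle_call({:embed, term}, from, %{pending: []} = state) do
    Process.send_after(self(), :flush, state.flush_after_ms)
    {:noreply, %{state | pending: [{from, term}]}}
  end

  # Later calls just join the pending batch.
  def handle_call({:embed, term}, from, state) do
    {:noreply, %{state | pending: [{from, term} | state.pending]}}
  end

  # Sends all pending terms in a single call and replies to each caller.
  @impl true
  def handle_info(:flush, state) do
    {callers, terms} = state.pending |> Enum.reverse() |> Enum.unzip()
    vectors = state.embed_fun.(terms)

    callers
    |> Enum.zip(vectors)
    |> Enum.each(fn {from, vector} -> GenServer.reply(from, vector) end)

    {:noreply, %{state | pending: []}}
  end
end
```

The trade-off is a small added latency per call (the flush window) in exchange for fewer round trips to the embedding backend.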
Once you’ve picked your favorite embedding adapter, you can add semantic search:
def search(term) do
  search_vector = Torus.to_vector(term)

  Post
  |> Torus.semantic([p], p.embedding, search_vector, distance: :l2_distance, pre_filter: 0.7)
  |> Repo.all()
end
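For intuition about what comparing embeddings means, here is a small pure-Elixir sketch of Euclidean (L2) distance between two vectors. I'm assuming `:l2_distance` above corresponds to this measure, as it does in pgvector. `DistanceSketch` is a hypothetical module for illustration only; in a real query the database computes the distance.

```elixir
defmodule DistanceSketch do
  # Euclidean (L2) distance: the square root of the summed squared
  # differences between corresponding vector components.
  # Smaller values mean the two embeddings are closer together.
  def l2_distance(a, b) when length(a) == length(b) do
    a
    |> Enum.zip(b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + (x - y) * (x - y) end)
    |> :math.sqrt()
  end
end
```

For example, `DistanceSketch.l2_distance([0.0, 0.0], [3.0, 4.0])` returns `5.0`, and identical vectors have a distance of zero.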
Future plans:
Release a torus_example app that would allow experimenting with options and different search types to pick the best one.
Add support for highlighting search results (based on the ts_headline function).
Extend similarity search to support the distance options of the fuzzystrmatch extension.