Torus - Integrate PostgreSQL's search into Ecto queries

dimamik · March 18, 2025, 1:49pm

Torus is a plug-and-play Elixir library that seamlessly integrates PostgreSQL’s search into Ecto, streamlining the construction of advanced search queries.

The goal is to help developers create Ecto queries that return relevant results with optimal performance. It’s easiest to show how with a few examples:

Pattern matching: Searches for a specific pattern in a string.

iex> insert_posts!(["Wand", "Magic wand", "Owl"])
...> Post
...> |> Torus.ilike([p], [p.title], "wan%")
...> |> select([p], p.title)
...> |> Repo.all()
["Wand"]

See like/5, ilike/5, and similar_to/5 for more details.

Similarity: Searches for items that are closely alike based on attributes, often using measures like cosine similarity or Euclidean distance. It is great for fuzzy searching and for ignoring typos in short texts.

iex> insert_posts!(["Hogwarts Secrets", "Quidditch Fever", "Hogwart’s Secret"])
...> Post
...> |> Torus.similarity([p], [p.title], "hoggwarrds")
...> |> limit(2)
...> |> select([p], p.title)
...> |> Repo.all()
["Hogwarts Secrets", "Hogwart’s Secret"]

See similarity/5 for more details.
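To give a feel for why "hoggwarrds" still finds the Hogwarts titles, here is a minimal pure-Elixir sketch of trigram similarity. It assumes trigram matching in the style of PostgreSQL's pg_trgm extension, whose threshold setting comes up later in this thread. The module and function names (TrigramSketch, trigrams/1, similarity/2) are made up for illustration and are not part of Torus, and the real extension differs in details.

```elixir
defmodule TrigramSketch do
  @moduledoc "Illustrative only: a rough approximation of trigram similarity."

  # Split the text into lowercase words, pad each word with two leading
  # spaces and one trailing space, and collect every run of 3 characters.
  def trigrams(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^[:alnum:]]+/u, trim: true)
    |> Enum.flat_map(fn word ->
      ("  " <> word <> " ")
      |> String.graphemes()
      |> Enum.chunk_every(3, 1, :discard)
      |> Enum.map(&Enum.join/1)
    end)
    |> MapSet.new()
  end

  # Similarity is the share of trigrams the two texts have in common:
  # size of the intersection divided by size of the union.
  def similarity(a, b) do
    ta = trigrams(a)
    tb = trigrams(b)
    union = MapSet.size(MapSet.union(ta, tb))

    if union == 0 do
      0.0
    else
      MapSet.size(MapSet.intersection(ta, tb)) / union
    end
  end
end
```

With this sketch, "hoggwarrds" shares several trigrams with "Hogwarts Secrets" and none with "Quidditch Fever", so the first title scores higher even though the search term is misspelled.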

Text Search Vectors: Uses term-document matrix vectors for full-text search, enabling efficient querying and ranking based on term frequency (see PostgreSQL: Full Text Search). It is great for quickly returning relevant results from large datasets.

iex> insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
...> insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
...> insert_post!(title: "Completely unrelated", body: "No magic here!")
...> Post
...> |> Torus.full_text([p], [p.title, p.body], "uncov hogwar")
...> |> select([p], p.title)
...> |> Repo.all()
["Diagon Bombshell"]

See full_text/5 for more details.

Semantic Search: Understands the contextual meaning of queries to match and retrieve related content using natural language processing. Read more in the Semantic search with Torus guide.

insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
insert_post!(title: "Completely unrelated", body: "No magic here!")

embedding_vector = Torus.to_vector("A magic school in the UK")

Post
|> Torus.semantic([p], p.embedding, embedding_vector)
|> select([p], p.title)
|> Repo.all()
["Diagon Bombshell"]

See semantic/5 for more details.

The above macros accept a list of options to customize their behavior. See function docs for examples. Most functions have an optimization section that might help you boost the performance of these search queries.

Upcoming plans include support for highlighting search results and extending the library with hybrid search. Please let me know what you think of it. I’ll gladly hear any suggestions on how to make it better!

Links

Torus — Torus v0.4.0
GitHub:

gushonorato · March 18, 2025, 10:03pm

Looks interesting! I saw that all the examples were using “function-style queries.” Does Torus support “macro-style” queries? If not, are you planning to add support for them?

dimamik · March 19, 2025, 9:37am

Yes, you can also use keyword-based queries! It’s not as native as the built-in keywords, since afaik you can’t expand custom macros in the from clause, but you can do something like this:

from(p in Post, select: p.title)
|> Torus.like([p], p.title, "pinned%")
|> Repo.all()

D4no0 · March 19, 2025, 4:28pm

What is the difference between the ecto like/ilike and the one provided by your library?

dimamik · March 20, 2025, 9:21am

Great question! Torus.like/5 and Torus.ilike/5 are using Ecto’s like/2 and ilike/2 under the hood, but they allow us to search across multiple columns (just like the rest of the search functions). The main reason I’ve added them was to cover more of the search types so we won’t need to switch contexts and reach for a different tool.

So summing up, for two reasons:

Torus versions allow us to search across multiple columns
Torus docs and API are more complete with these two similarity options

garrison · March 20, 2025, 5:58pm

One (somewhat) lesser-known detail about these LIKE operators is that you have to be careful to escape the match characters in user input, because it is possible to craft pathological matches/regexes and use them as a DoS vector.

It would be nice if there were a way to do that automatically, perhaps at compile-time with a sigil/macro.

dimamik · March 21, 2025, 1:15pm

Just to make sure we’re on the same page, do you mean LIKE injections? If yes, then I was considering adding an option to sanitize the input term, but I thought that adding a warning to the docs and delegating this to the caller might be the best/most customizable option. Is this what you meant?

ademenev · March 21, 2025, 3:20pm

Does/can it support pgroonga ?

garrison · March 21, 2025, 6:13pm

Yes, this is what I meant. It’s funny - I was looking for this exact blog post to use as a reference and I couldn’t find it!

I’m not sure the best path, but you could potentially do something like a sigil where the interpolated strings are escaped. Either way a warning in the docs is a good idea.
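For reference, the escaping step itself is small. Below is a minimal sketch, assuming PostgreSQL's default LIKE escape character (backslash) and the two LIKE wildcards, `%` and `_`. The module name LikeEscape is hypothetical and not part of Torus. It covers only LIKE/ILIKE wildcards, not the richer SIMILAR TO syntax.

```elixir
defmodule LikeEscape do
  @moduledoc "Illustrative helper, not part of Torus."

  # Prefix every backslash, percent sign, and underscore with a backslash
  # so they are matched literally instead of acting as LIKE wildcards.
  # The replacement happens in a single pass, so the backslashes added
  # here are never escaped a second time.
  def escape(term) when is_binary(term) do
    String.replace(term, ~r/[\\%_]/, fn match -> "\\" <> match end)
  end
end
```

The caller then appends its own wildcard after escaping, for example `LikeEscape.escape(user_input) <> "%"` for a prefix search, so that only the trailing `%` keeps its special meaning.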

dimamik · March 21, 2025, 7:00pm

It doesn’t currently, but I think it could. Can you elaborate on how you’d see this working?

ademenev · March 21, 2025, 7:32pm

I did not give much thought to this yet. Probably it could be an extension that utilises pgroonga’s search operators

dimamik · April 19, 2025, 11:09am

Hey!

Semantic search is finally here! Read more in the Semantic search with Torus guide.
In short, it lets you generate embeddings using configurable (and chainable) adapters and compare them against the ones stored in your database.

Supported adapters (for now):

Torus.Embeddings.OpenAI - uses OpenAI’s API to generate embeddings.
Torus.Embeddings.HuggingFace - uses HuggingFace’s API to generate embeddings.
Torus.Embeddings.LocalNxServing - generates embeddings on your local machine using a variety of models available on Hugging Face.
Torus.Embeddings.PostgresML - uses the PostgreSQL PostgresML extension to generate embeddings.
Torus.Embeddings.Batcher - a long‑running GenServer that collects individual embedding calls, groups them into a single batch, and forwards the batch to the configured embedding_module (any from the above or your custom one).
Torus.Embeddings.NebulexCache - a wrapper around Nebulex cache, allowing you to cache the embedding calls in memory, so you save the resources/cost of calling the embedding module multiple times for the same input.

And you can easily create your own adapter by implementing the Torus.Embedding behaviour.

Once you’ve picked your favorite embedding adapter, you can add semantic search:

def search(term) do
  search_vector = Torus.to_vector(term)

  Post
  |> Torus.semantic([p], p.embedding, search_vector, distance: :l2_distance, pre_filter: 0.7)
  |> Repo.all()
end

Future plans:

Release torus_example app that would allow experimenting with options and different search types to pick the best one
Add support for highlighting search results (based on the ts_headline function).
Extend similarity search to support fuzzystrmatch extension distance options.


jam · June 13, 2025, 8:33pm

Hey @dimamik, thanks for making this! Looks very useful. I’m putting it in play on a project. Is it possible to stack ilike and similarity or will it be? Or is there a better approach than this?

query =
  if String.length(input) < 2 do
    base_query |> Torus.ilike([u, _m], [u.username], Utils.sanitize_ilike(input) <> "%")
  else
    base_query |> Torus.similarity([u, _m], [u.username], input, pre_filter: true)
  end
...

Fwiw, I think it’d be nice if it also auto-sanitized ilike queries.

dimamik · June 13, 2025, 8:55pm

Hey! Thanks a lot! So we’ve been using it this way:

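  # length/1 only works on lists, so a guard using it never matches a string.
  # byte_size/1 is allowed in guards (it counts bytes, not graphemes).
  def search(query, term) when byte_size(term) < 3 do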
    term = Torus.sanitize(term)

    Torus.ilike(query, [u], [u.username], term <> "%")
  end

  def search(query, term) do
    Torus.similarity(query, [u], [u.username], term)
  end

But yes, there are plans to make it composable so that you can use different search types in a single query!
As for sanitization, I’ve opted to expose Torus.sanitize/1 instead of building it into the handling of the term itself. I should probably improve its visibility, though.

jam · June 13, 2025, 10:45pm

there are plans to make it composable

Nice! And thanks for pointing me to the sanitize function.

One other quick question: where should I set this SET pg_trgm.word_similarity_threshold = 0.3?

dimamik · June 14, 2025, 10:09am

There are a few options, either in postgresql.conf or per-database in a migration:

defmodule YourApp.Repo.Migrations.SetWordSimilarityThreshold do
  use Ecto.Migration

  @database Application.compile_env!(:your_app_name, YourApp.Repo)[:database]

  def up do
    execute("""
    ALTER DATABASE #{@database} SET pg_trgm.word_similarity_threshold = 0.5;
    """)
  end

  def down do
    execute("""
    ALTER DATABASE #{@database} RESET pg_trgm.word_similarity_threshold;
    """)
  end
end
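A third possibility, if you would rather not alter the database itself, is to set the threshold per connection. The fragment below is a sketch that uses the `:after_connect` repo option to run a statement on every new connection. It reuses the placeholder names `:your_app_name` and `YourApp.Repo` from the migration above and the 0.3 value from the question. Treat it as an assumption to verify against your own setup rather than a recommendation from the Torus docs.

```elixir
# config/runtime.exs (or the config file where the repo is configured)
import Config

config :your_app_name, YourApp.Repo,
  # Runs once for each new database connection in the pool, so every
  # connection starts with the same session-level threshold.
  after_connect:
    {Postgrex, :query!, ["SET pg_trgm.word_similarity_threshold = 0.3", []]}
```

Unlike ALTER DATABASE, this affects only connections opened by this application, which can be handy when several apps share one database.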

dimamik · June 19, 2025, 12:25pm

Torus v0.5.2 is released!

New

New demo page where you can explore different search types and their options. It also includes semantic search, so if you’re hesitant - go check it out!
Other documentation improvements

Fixes

Correctly handles order: :none in Torus.semantic/5 search.
Updates Torus.Embeddings.HuggingFace to point to the updated feature extraction endpoint.
Suppresses warnings for missing ecto_sql dependency by adding it to the required dependencies. Most of us already had it, but now it’ll be explicit.
Correctly parses an array of integers in Torus.QueryInspector.substituted_sql/3 and Torus.QueryInspector.tap_substituted_sql/3. Now we should be able to handle all possible query variations.