Torus - Integrate PostgreSQL's search into Ecto queries

Torus is a plug-and-play Elixir library that seamlessly integrates PostgreSQL’s search into Ecto, streamlining the construction of advanced search queries.

The goal is to help developers create Ecto queries that return relevant results with optimal performance, and the best way to show how is by example:

  1. Pattern matching: Searches for a specific pattern in a string.

    iex> insert_posts!(["Wand", "Magic wand", "Owl"])
    ...> Post
    ...> |> Torus.ilike([p], [p.title], "wan%")
    ...> |> select([p], p.title)
    ...> |> Repo.all()
    ["Wand"]
    

    See like/5, ilike/5, and similar_to/5 for more details.

  2. Similarity: Searches for items that are closely alike based on attributes, often using measures like cosine similarity or Euclidean distance. It is great for fuzzy searching and ignoring typos in short texts.

    iex> insert_posts!(["Hogwarts Secrets", "Quidditch Fever", "Hogwart’s Secret"])
    ...> Post
    ...> |> Torus.similarity([p], [p.title], "hoggwarrds")
    ...> |> limit(2)
    ...> |> select([p], p.title)
    ...> |> Repo.all()
    ["Hogwarts Secrets", "Hogwart’s Secret"]
    

    See similarity/5 for more details.

  3. Text Search Vectors: Uses term-document matrix vectors for full-text search, enabling efficient querying and ranking based on term frequency (see PostgreSQL: Full Text Search). It is great for large datasets where relevant results need to be returned quickly.

    iex> insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
    ...> insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
    ...> insert_post!(title: "Completely unrelated", body: "No magic here!")
    ...> Post
    ...> |> Torus.full_text([p], [p.title, p.body], "uncov hogwar")
    ...> |> select([p], p.title)
    ...> |> Repo.all()
    ["Diagon Bombshell"]
    

    See full_text/5 for more details.

The above macros accept a list of options to customize their behavior. See function docs for examples. Most functions have an optimization section that might help you boost the performance of these search queries.
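To give a feel for why the similarity search in item 2 tolerates typos, here is a minimal sketch of trigram similarity in plain Elixir. It assumes the search is backed by trigram matching in the style of PostgreSQL's pg_trgm extension, which the post above doesn't state explicitly. The `TrigramSketch` module and its functions are hypothetical names used only for illustration. They are not part of Torus, and the real scoring happens inside the database.

```elixir
defmodule TrigramSketch do
  # Illustrative only: approximates pg_trgm-style scoring in plain Elixir.

  # Splits text into lowercase words, pads each word with two leading
  # spaces and one trailing space, and collects every 3-character window.
  def trigrams(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^[:alnum:]]+/u, trim: true)
    |> Enum.flat_map(fn word ->
      ("  " <> word <> " ")
      |> String.graphemes()
      |> Enum.chunk_every(3, 1, :discard)
      |> Enum.map(&Enum.join/1)
    end)
    |> MapSet.new()
  end

  # Shared trigrams divided by all distinct trigrams (a value from 0.0 to 1.0).
  def similarity(a, b) do
    ta = trigrams(a)
    tb = trigrams(b)
    union = MapSet.size(MapSet.union(ta, tb))

    if union == 0 do
      0.0
    else
      MapSet.size(MapSet.intersection(ta, tb)) / union
    end
  end
end
```

A misspelled term still shares most of its trigrams with the intended word, so it scores well above unrelated titles even though it matches none of them exactly.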

Going forward, we plan to add support for highlighting search results and to extend the library with semantic and hybrid search. Please let me know what you think of it. I’ll gladly hear any suggestions on how to make it better!


Looks interesting! I saw that all the examples were using “function-style queries.” Does Torus support “macro-style” queries? If not, are you planning to add support for them?


Yes, you can also use keyword-based queries! It’s not as native as the other keywords, since afaik you can’t expand custom macros inside a from clause, but you can do something like this:

    from(p in Post, select: p.title)
    |> Torus.like([p], p.title, "pinned%")
    |> Repo.all()

What is the difference between the ecto like/ilike and the one provided by your library?


Great question! Torus.like/5 and Torus.ilike/5 use Ecto’s like/2 and ilike/2 under the hood, but they let us search across multiple columns (just like the rest of the search functions). The main reason I added them was to cover more search types, so we don’t need to switch contexts and reach for a different tool.

So, to sum up, there are two differences:

  1. The Torus versions allow us to search across multiple columns
  2. The Torus docs and API are more complete with these two pattern-matching options
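For comparison, here is roughly what a two-column search looks like when written by hand with Ecto's ilike/2. This is a sketch that assumes the columns are combined with OR, which the thread doesn't spell out. It uses a schemaless "posts" source instead of the Post schema so that it only needs Ecto to build.

```elixir
import Ecto.Query

# The search pattern; "%" is the LIKE wildcard for "any characters".
term = "wan%"

# Plain Ecto: ilike/2 takes one column at a time, so every extra column
# adds another `or` branch to the where clause.
query =
  from p in "posts",
    where: ilike(p.title, ^term) or ilike(p.body, ^term),
    select: p.title
```

With Torus the column list is passed once, following the call shape from the examples above: `Post |> Torus.ilike([p], [p.title, p.body], "wan%")`.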

One (somewhat) lesser-known detail about these LIKE operators is that you have to be careful to escape the match characters in user input, because it is possible to craft pathological patterns/regexes and use them as a DoS vector.

It would be nice if there were a way to do that automatically, perhaps at compile-time with a sigil/macro.


Just to make sure we’re on the same page, do you mean LIKE injections? If yes, then I was considering adding an option to sanitize the input term, but I thought that adding a warning to the docs and delegating this to the caller might be the best/most customizable option. Is this what you meant?

Does/can it support PGroonga?

Yes, this is what I meant. It’s funny - I was looking for this exact blog post to use as a reference and I couldn’t find it!

I’m not sure the best path, but you could potentially do something like a sigil where the interpolated strings are escaped. Either way a warning in the docs is a good idea.
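Until something like that exists, the escaping can be done on the caller's side. Below is a minimal sketch of such a helper. It assumes PostgreSQL's default LIKE escape character (backslash), and the `LikeEscape` module and its function names are made up for illustration. They are not part of Torus or Ecto.

```elixir
defmodule LikeEscape do
  # Hypothetical helper, not part of Torus or Ecto.

  # Escapes the LIKE metacharacters (%, _) and the escape character itself,
  # so that user input is matched literally. Assumes the default escape
  # character, which is a backslash in PostgreSQL.
  def escape(term) when is_binary(term) do
    String.replace(term, ["\\", "%", "_"], fn char -> "\\" <> char end)
  end

  # Builds a "starts with" pattern: only the trailing % acts as a wildcard.
  def prefix_pattern(term), do: escape(term) <> "%"
end
```

The escaped string can then be passed as the search term, so input such as "100%" only matches titles that contain that literal text.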


It doesn’t currently, but I think it could. Can you elaborate on how you’d see this working?

I haven’t given this much thought yet. It could probably be an extension that utilises PGroonga’s search operators.
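As a rough idea of what that could build on, PGroonga's `&@~` query operator can already be reached from Ecto through fragment/1. This is only a sketch, not an existing Torus feature. It assumes the pgroonga extension is installed and that the column has a PGroonga index, and it uses a schemaless "posts" source for brevity.

```elixir
import Ecto.Query

# PGroonga's query syntax supports operators such as OR inside the term.
term = "Hogwarts OR Quidditch"

# `&@~` is PGroonga's full-text search operator for query-language input.
# The search term is passed as a bound parameter, never interpolated.
query =
  from p in "posts",
    where: fragment("? &@~ ?", p.title, ^term),
    select: p.title
```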