How to properly sanitize multiple ts_vector fields with reusable macro?

I’ve crawled couple related topics but still can’t implement reusable module for full-text search. I’m stuck with this module and can’t understand why it can’t find any row even when it has to. Looks like there is interpolation problem with creating multiple ts_vector but I’m not sure.

My goals:

  1. to abstract later search for other entities and have this api API
  2. to speed up this search with creating indexes while it has some joins (is it possible?) And is it normal to search within related tables?
  3. Consider to add tsv field with triggering updates and creating indexes and having to_tsvector value in db.
defmodule MyApp.Performer.Search do
  import Ecto.Query

  @fields ~w(bio first_name last_name phone email city)
  @nullable_fields ~w(name)

  defmacro tsquery(fields, terms, language \\ "english") do
    quote do
      fragment(
        "to_tsvector(?, ?) @@ to_tsquery(?, ?)",
        unquote(language),
        unquote(fields),
        unquote(language),
        unquote(terms)
      )
    end
  end

  defmacro tsrankcd(fields, terms, language \\ "english") do
    quote do
      fragment(
        "ts_rank_cd(to_tsvector(?, ?), to_tsquery(?, ?))",
        unquote(language),
        unquote(fields),
        unquote(language),
        unquote(terms)
      )
    end
  end

  def run(query, search_term) do
    search_term = normalize(search_term)
    fields = fields_to_tsvector_fields(@fields, @nullable_fields)

    queryable =
      from q in query,
        join: p in assoc(q, :profile),
        left_join: t in assoc(q, :topics),
        where: tsquery(^fields, ^search_term),
        order_by: [
          desc: tsrankcd(^fields, ^search_term)
        ],
        distinct: [q.id]
  end

  defp normalize(terms, operator \\ "&") do
    terms
    |> String.downcase()
    |> String.trim()
    |> String.replace(~r/\s+/u, " #{operator} ")
    |> (&(&1 <> ":*")).()
  end

  def fields_to_tsvector_fields(non_nullable_fields, nullable_fields) do
    nullable_fields
    |> Enum.map(fn field -> "coalesce(#{field}, ' ')" end)
    |> (&(non_nullable_fields ++ &1)).()
    |> Enum.join(" || ' ' || ")
  end
end

Would highly appreciate any advice or suggestion, thanks.

Could you write few examples of SQL that you want to generate?

Ecto does not interpolate any values at all, ever. Rather it uses SQL parameters to avoid SQL injection. If you turn on debug logging you should be able to see the query that Ecto runs. Try running that in PSQL and tweaking the values until you get what you want, then adjust your ecto query accordingly.

1 Like

Hi @vadimshvetsov! At work, we have implemented a small lib that helps us abstract what we need for full-text search. Take a look at our repo and see if it helps: https://github.com/raidcorp/searchy.

1 Like

Hey, thank you all.

SELECT DISTINCT ON (p0."id") p0."id", p0."bio", p0."cv", p0."is_onboarding_finished", p0."user_id", p0."native_language_id", p0."status_id", p0."inserted_at", p0."updated_at" 
FROM "performers" AS p0 
INNER JOIN "users" AS u1 ON u1."id" = p0."user_id" 
LEFT OUTER JOIN "performer_topics" AS p3 ON p3."performer_id" = p0."id" 
LEFT OUTER JOIN "topics" AS t2 ON p3."topic_id" = t2."id" 
WHERE (to_tsvector('english', $1) @@ to_tsquery('english', $2)) ORDER BY p0."id", ts_rank_cd(to_tsvector('english', $3), to_tsquery('english', $4)) DESC 
["bio || ' ' || first_name || ' ' || last_name || ' ' || phone || ' ' || email || ' ' || city || ' ' || coalesce(name, ' ')", "darth:*", "bio || ' ' || first_name || ' ' || last_name || ' ' || phone || ' ' || email || ' ' || city || ' ' || coalesce(name, ' ')", "darth:*"]

So this is the request. As I see it’s kinda hard to implement full text search in db query, maybe better to split them on query per table? I want to add tsvector field later for speed up things and it’s table specific.

Ecto does not interpolate any values at all, ever

Yep, I mean that mistake somewhere in my code, maybe because I don’t unquote variables when building to_tsvector value? Raw query works perfectly when I move SQL parameters to SQL query.

Hi @vadimshvetsov! At work, we have implemented a small lib that helps us abstract what we need for full-text search. Take a look at our repo and see if it helps: https://github.com/raidcorp/searchy .

Wow, that’s pretty solid, I will dig in. Is it possible to search things on many to many joins by related entity fields for example with this APIs?

I don’t know if I understood exactly what you need but as long as you have a ts_vector field generated for the table you want to search, I guess you are good to go. Our use-case is very simplistic and the lib itself is just a thin wrapper that translates the queries according to the postgres docs. If this is not exactly what you need or it’s incomplete, feel free to open a discussion on the repo so we can talk about it :blush:.

1 Like

I’m wondering how to join search results of parent entity with joins and search inside join tables too.

So let’s say I have:

User | id | name | bio | ts_vector <- many-to-many -> Topic | id | title | ts_vector

I want to search by user.name, user.bio and joined topics.title of this user.

Is the best way look like this?

from u in User,
  left_join: t in assoc(u, :topics),
  where: tsquery(u.ts_vector, ^search_term) or tsquery(u.ts_vector, ^search_term),
  distinct: [t.id]

I think it depends on how much complexity you want to put on the search optimization.
The query you posted might work, or you could create a materialized view with your tsvectors.
As most things, I would benchmark it before commiting to a more complex solution.

1 Like

I think you can avoid all that complexity with fields_to_tsvector_fields function, @fields and @nullable_fields just by using concat_ws.

defmacro concat_ws(first_value, second_value, separator) do
  quote do
    fragment("concat_ws(?, ?, ?)", unquote(separator), unquote(first_value), unquote(second_value))
  end
end

separator was moved to the last argument to have an ability to compose concat_ws calls via pipe operator

ts_query(
  p.bio
  |> concat_ws(p.first_name, " ")
  |> concat_ws(p.last_name, " ")
  |> concat_ws(p.phone, " ")
  |> concat_ws(p.email, " ")
  |> concat_ws(p.city, " ")
  |> concat_ws(t.name, " "),
  ^search_term
)
1 Like

Hello again @vadimshvetsov! Today at work we had to work with a similar problem.
Our use-case was searching in a many-to-many relationship that also has other filters specific to the table, and we solve it like this:


def search_by(term, filters) do
  queryable = 
    from a in A, 
      join b in B, as: :search_table, 
      on: a.id == b.id,
      where: ^filters
    
    term = to_tsquery(:search_tsvector, #"{term}:*")

    Repo.search(queryable, term)
end

def to_tsquery(search_field, search_term) do
  dynamic([search_table: t], 
    fragment("? @@ to_tsquery(?)", 
      field(t, ^search_field), ^search_term))
end

We had to use a named binding because the to_tsquery function had no clue of what binding was supposed to be applied since it’s positional. This allowed us to change which table we wanted to search in more complex queries.

So, even though our lib does not support this specific use-case (as we noticed today), I hope to add this feature soon after we test the use-cases properly. In the meantime, I think you could build the query manually using something similar.

PS.: Bear in mind that we just use full-text search to simple scenario like searching one table at a time. If you care about performance, you might wanna look into this thread: Full text search over multiple related tables: indices and performance.

1 Like

Thanks a lot for helping me.

I’ve starred https://github.com/raidcorp/searchy and will keep an eye on it. The reason why I’ve done this on my own is because I need to rank results and search in related tables.

So I’ve ended up with this one:

At first I’ve added pg_term extension for incomplete word search:

defmodule MyApp.Repo.Migrations.AddPgTrgmExtension do
  @moduledoc """
  Create postgres pg_trgm extension and indices
  """

  use Ecto.Migration

  def up do
    execute("CREATE EXTENSION pg_trgm")
  end

  def down do
    execute("DROP EXTENSION pg_trgm")
  end
end

Then migrated searchable tables and add tsvector field, trigger and index:

defmodule MyApp.Repo.Migrations.AddPerformersSearch do
  use Ecto.Migration

  def up do
    alter table("performers") do
      add :tsvector, :tsvector
    end

    create index(:performers, [:tsvector], using: "GIN")

    execute("""
    CREATE OR REPLACE FUNCTION performers_tsvector_trigger()
    RETURNS trigger AS $$
    begin
      new.tsvector := setweight(to_tsvector('russian', coalesce(new.bio, '')), 'A');

      return new;
    end
    $$ LANGUAGE plpgsql;
    """)

    execute("""
    CREATE TRIGGER performers_tsvector_update
    BEFORE INSERT OR UPDATE ON performers
    FOR EACH ROW EXECUTE PROCEDURE performers_tsvector_trigger();
    """)

    alter table("users") do
      add :tsvector, :tsvector
    end

    create index(:users, [:tsvector], using: "GIN")

    execute("""
    CREATE OR REPLACE FUNCTION users_tsvector_trigger()
    RETURNS trigger AS $$
    begin
      new.tsvector := setweight(to_tsvector('russian', coalesce(new.last_name, '')), 'A') ||
      setweight(to_tsvector('russian', coalesce(new.first_name, '')), 'B') ||
      setweight(to_tsvector('russian', coalesce(new.email, '')), 'C') ||
      setweight(to_tsvector('russian', coalesce(new.phone, '')), 'C') ||
      setweight(to_tsvector('russian', coalesce(new.city, '')), 'D');

      return new;
    end
    $$ LANGUAGE plpgsql;
    """)

    execute("""
    CREATE TRIGGER users_tsvector_update
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE users_tsvector_trigger();
    """)

    alter table("topics") do
      add :tsvector, :tsvector
    end

    create index(:topics, [:tsvector], using: "GIN")

    execute("""
    CREATE OR REPLACE FUNCTION topics_tsvector_trigger()
    RETURNS trigger AS $$
    begin
      new.tsvector := setweight(to_tsvector('russian', coalesce(new.name, '')), 'A');

      return new;
    end
    $$ LANGUAGE plpgsql;
    """)

    execute("""
    CREATE TRIGGER topics_tsvector_update
    BEFORE INSERT OR UPDATE ON topics
    FOR EACH ROW EXECUTE PROCEDURE topics_tsvector_trigger();
    """)
  end

  def down do
    execute("DROP TRIGGER performers_tsvector_update on performers;")
    execute("DROP FUNCTION performers_tsvector_trigger();")

    alter table("performers") do
      remove :tsvector
    end

    drop index("performers", [:tsvector])

    execute("DROP TRIGGER users_tsvector_update on performers;")
    execute("DROP FUNCTION users_tsvector_trigger();")

    alter table("users") do
      remove :tsvector
    end

    drop index("users", [:tsvector])

    execute("DROP TRIGGER topics_tsvector_update on performers;")
    execute("DROP FUNCTION topics_tsvector_trigger();")

    alter table("topics") do
      remove :tsvector
    end

    drop index("topics", [:tsvector])
  end
end

Added tsvector type taken from https://github.com/raidcorp/searchy source:

defmodule MyApp.Ecto.Types.TSVector do
  use Ecto.Type

  def type, do: :tsvector

  def cast(tsvector), do: {:ok, tsvector}

  def load(tsvector), do: {:ok, tsvector}

  def dump(tsvector), do: {:ok, tsvector}

  def embed_as(_), do: :self

  def equal?(term1, term2), do: term1 == term2
end

Added tsvector field to all searchable ecto schemas:

    field :tsvector, Proling.Ecto.Types.TSVector

Added Search.Helpers module for convience:

defmodule MyApp.Search.Helpers do
  defmacro tsquery(tsvector, terms, language \\ "russian") do
    quote do
      fragment(
        "? @@ to_tsquery(?, ?)",
        unquote(tsvector),
        unquote(language),
        unquote(terms)
      )
    end
  end

  defmacro tsrankcd(tsvector, terms, language \\ "russian") do
    quote do
      fragment(
        "ts_rank_cd(?, to_tsquery(?, ?))",
        unquote(tsvector),
        unquote(language),
        unquote(terms)
      )
    end
  end
end

And finally added API for using in context:

defmodule MyApp.Production.Performer.Search do
  import Ecto.Query
  import MyApp.Search.Helpers

  @spec search(Ecto.Query.t(), any()) :: Ecto.Query.t()
  def search(query, search_term) do
    search_term = normalize(search_term)

    from q in query,
      join: p in assoc(q, :profile),
      left_join: t in assoc(q, :topics),
      where:
        tsquery(p.tsvector, ^search_term) or tsquery(q.tsvector, ^search_term) or
          tsquery(t.tsvector, ^search_term),
      order_by: [
        desc: tsrankcd(p.tsvector, ^search_term),
        desc: tsrankcd(q.tsvector, ^search_term),
        desc: tsrankcd(t.tsvector, ^search_term)
      ],
      distinct: [q.id]
  end

  defp normalize(terms, operator \\ "&") do
    terms
    |> String.downcase()
    |> String.trim()
    |> String.replace(~r/\(|\)\[|\]\{|\}/u, "")
    |> String.replace(~r/\s+/u, " #{operator} ")
    |> (&(&1 <> ":*")).()
  end
end

I’m gonna star @thiagomajesk answer with searchy because it`s source code greatly led me to the final destination. Also I would like to mention this awesome answer - Search Bar Feature (Phoenix Searching)

Hope this step-by-step example could help someone to get things done.

3 Likes