Please can you help me in choosing a good database model to facilitate search queries using PostgreSQL full text search?

Kurisu · February 17, 2020, 5:47am

Let’s say I have 3 tables: users, profiles and skills.
Resulting in following associations:

User has_one Profile (:profile) and Profile belongs_to User (:user).
User has_many Skill (:skills) and Skill belongs_to User (:user).

My search feature will show users whith user.name, or profile.introduction, or having one or more skills matching the search term.

My problem here is the one-to-many realtionship between users and skills.

I’m thinking to create a denormaized table or a view just for this search feature.
For example skill_documents with user_id and tsv column (to store the concatenation of all skills names of a given user) . This way everytime an user adds, removes or updates a skill the tsv column will be updated accordingly.

Please do you think this is an acceptable solution or what would you suggest me instead?

Thanks

Edit: I’m not finding the general programming category so I put my question in the members only one.

Edit 2: So as suggested by @dimitarvp I moved my question to Elixir category and added Ecto tags. There is no Ecto dedicated category apparently.

dimitarvp · February 17, 2020, 11:34am

This is a perfectly valid question for the Ecto category btw.

Kurisu · February 18, 2020, 11:39am

Finally I ended up with a tsv column in each table involved in the search query. I just keep the initial tables: users, profiles and skills. My SQL query does an inner join from users to profiles, and a left join to skills. Now as a user can have multiple skills, the results at this point will contain duplicate users. To fix that, I added distinct clause to the query on user id, while ordering by relevance of matching.

My query ultimately looks like this:

  def search_users(term, params) do
    User
    |> join(:inner, [u], p in assoc(u, :profile))
    |> join(:left, [u], s in assoc(u, :skills))
    |> search_where(term)
    |> order(term)
    |> distinct([u], u.id)
    |> preload([u, p, _], profile: p)
    |> Repo.paginate(params)
  end

  defp search_where(query, term) do
    where(
      query,
      [u, p, s],
      fragment("? @@ ?", u.tsv, plainto_tsquery(^term)) or
        fragment("? @@ ?", p.tsv, plainto_tsquery(^term)) or
        fragment("? @@ ?", s.tsv, plainto_tsquery(^term))
    )
  end

  defp order(query, term) do
    order_by(query, [u, p, s],
      desc: ts_rank_cd(u.tsv, plainto_tsquery(^term)),
      desc: ts_rank(p.tsv, plainto_tsquery(^term)),
      desc: ts_rank(s.tsv, plainto_tsquery(^term))
    )
  end

Feel free to make any comments on this.

Thanks

Brainiac · February 18, 2020, 4:14pm

Instead of duplicating the columns through concatenation, why not use a trigram index for the columns that you need?

Also another issue I had when using postgresql’s full text search, was the minimum characters required in the search query.

I ended up using trigram indexes instead for my text searching. Though ymmv depending on length of text that you are searching. For short words like in your skills table, a trigram index might work better

Kurisu · February 18, 2020, 6:52pm

In my final version of the query I don’t concatenate the skills but just compare each skill with the search term by doing a left join from users to skills. I detailed my method in my previous reply.

That’s something I didn’t know. I will explore this solution. Thanks

Well I forgot to mention that my search function has a first clause for empty search term.

def search_users("" = _term, _params), do: []

And before I pass the search term to the search_users/2 function I pass it through another function to normalize it so that undesirable search strings result in "".

  @doc """
  normalize search text.
  String whith lesser than `min` characters will be replaced by ""
  `min` defaults to 3
  """
  def normalize(search_string, min \\ 3) when is_integer(min) do
    too_short = if min > 1, do: "{1,#{min - 1}}", else: "{0,0}"

    search_string
    |> String.downcase()
    |> String.replace(~r/\n/, " ")
    |> String.replace(~r/\t/, " ")
    |> String.replace(~r/\s{2,}/, " ")
    |> String.trim()
    |> String.replace(~r/^.#{too_short}$/, "")
  end

entone · February 18, 2020, 7:34pm

I started playing with Sonic recently, and it’s a really nice lightweight full-text search system. It doesn’t have any distributed functionality, but you could potentially run it using a side-car pattern for super-fast full-text search.

You can see my usage here, https://github.com/entone/web_archive_viewer

Kurisu · February 18, 2020, 10:11pm

Looks really awesome. Thanks for sharing. ^^

entone · February 18, 2020, 10:29pm

I’m a big proponent of a polyglot database approach, and choosing the right tool for the job. I think pushing your text data to a purpose built DB allows you to make the right choices with regards to your SQL architecture without sacrificing anything for your search functionality.

Same thing for telemetry data and logs.