I need some help understanding the best way to extend an Ecto query to fuzzy search one of the postgresql tables in our system.
Today I have the following module which uses a CASE
statement to produce a “score” for each result, I order by the score and limit to the number of results I need.
defmodule FuzzyProductSearch do
import Ecto.Query
def query(term, opts \\ []) when is_binary(term) do
limit = opts[:limit] || 10
query =
from p in Product,
select: %{
p
| match_score:
fragment(
"""
CASE
-- Exact matches
WHEN sku = ? THEN 1.0
WHEN title = ? THEN 0.9
-- Fuzzy matches
WHEN word_similarity(?, sku) > 0.3
THEN word_similarity(?, sku) * 0.85
WHEN word_similarity(?, title) > 0.3
THEN word_similarity(?, title) * 0.75
-- Contains matches
WHEN sku LIKE ? THEN 0.6
WHEN title LIKE ? THEN 0.5
ELSE 0
END
""",
^term,
^term,
^term,
^term,
^term,
^term,
^"%#{term}%",
^"%#{term}%"
)
}
from q in subquery(query),
where: q.match_score > 0,
order_by: [desc: q.match_score, asc: q.sku],
limit: ^limit
end
This works for a single term
but I want to extend it to or
a series of terms. One way to do that is to use GREATEST
and have one CASE
per term. This is where I am stuck.
I have attempted to:
Generate the SQL and insert into a fragment but ran across the issue that fragment
cannot take interpolated strings. My implementation looked similar to:
def query(terms, opts \\ []) do
terms_count = length(terms)
sql = @case_sql |> List.duplicate(terms_count) |> Enum.join(", ")
sql = "GREATEST(#{sql})"
bindings =
Enum.flat_map(terms, fn term ->
List.duplicate(term, 6) ++ List.duplicate("%#{term}%", 2)
end)
query = from p in Project, select: %{p | match_score: fragment(sql, ^splice(bindings))
end
Generate a bunch of dynamics and pass them into the "GREATEST"
fragment but you cannot mix dynamics and fragments.
Short of just slamming this as straight SQL to the database is there another way to think about solving the problem?
Conceptually I think I want:
- a function that produces the
CASE
- a greatest function that can take multiple of the
CASE
function
Non-working code:
defmacro greatest(values) do
fragment("GREATEST(?)", splice(values))
end
defmacro match_score(term) do
like_term = "%#{term}%"
quote do
fragment(sql, term, term, term, ..., like_term)
end
end
def query(terms) do
from p in Product, select: %{p | match_score: greatest(for t <- terms, do: match_score(t))}
end
Is this possible?