Nested Ecto Fragments to Generate Fuzzy Match Query

I need some help understanding the best way to extend an Ecto query to fuzzy search one of the postgresql tables in our system.

Today I have the following module which uses a CASE statement to produce a “score” for each result, I order by the score and limit to the number of results I need.

defmodule FuzzyProductSearch do
  import Ecto.Query

  def query(term, opts \\ []) when is_binary(term) do
    limit = opts[:limit] || 10

    query =
      from p in Product,
        select: %{
          p
          | match_score:
              fragment(
                """
                CASE
                  -- Exact matches
                  WHEN sku = ? THEN 1.0
                  WHEN title = ? THEN 0.9

                  -- Fuzzy matches
                  WHEN word_similarity(?, sku) > 0.3
                      THEN word_similarity(?, sku) * 0.85
                  WHEN word_similarity(?, title) > 0.3
                      THEN word_similarity(?, title) * 0.75

                  -- Contains matches
                  WHEN sku LIKE ? THEN 0.6
                  WHEN title LIKE ? THEN 0.5
                  ELSE 0
                END
                """,
                ^term,
                ^term,
                ^term,
                ^term,
                ^term,
                ^term,
                ^"%#{term}%",
                ^"%#{term}%"
              )
        }

    from q in subquery(query),
      where: q.match_score > 0,
      order_by: [desc: q.match_score, asc: q.sku],
      limit: ^limit
end

This works for a single term but I want to extend it to or a series of terms. One way to do that is to use GREATEST and have one CASE per term. This is where I am stuck.

I have attempted to:

Generate the SQL and insert into a fragment but ran across the issue that fragment cannot take interpolated strings. My implementation looked similar to:

def query(terms, opts \\ []) do
  terms_count = length(terms)
  sql = @case_sql |> List.duplicate(terms_count) |> Enum.join(", ")
  sql = "GREATEST(#{sql})"

  bindings =
    Enum.flat_map(terms, fn term ->
      List.duplicate(term, 6) ++ List.duplicate("%#{term}%", 2)
    end)

  query = from p in Project, select: %{p | match_score: fragment(sql, ^splice(bindings))
end

Generate a bunch of dynamics and pass them into the "GREATEST" fragment but you cannot mix dynamics and fragments.

Short of just slamming this as straight SQL to the database is there another way to think about solving the problem?

Conceptually I think I want:

  • a function that produces the CASE
  • a greatest function that can take multiple of the CASE function

Non-working code:

defmacro greatest(values) do
  fragment("GREATEST(?)", splice(values))
end

defmacro match_score(term) do
  like_term = "%#{term}%"

  quote do
    fragment(sql, term, term, term, ..., like_term)
  end
end

def query(terms) do
  from p in Product, select: %{p | match_score: greatest(for t <- terms, do: match_score(t))}
end

Is this possible?

I came up with the following. It’s commented so, I hope, you get what it’s doing. It’s untested so feel free to make it work and play with it.

def query(terms, opts \\ []) when is_binary(term) do                          
  limit = opts[:limit] || 10                                                  
                                                                              
  # Create a pseudo table from your terms so you can use it in the            
  # following queries.                                                        
  terms_query =                                                               
    from(t in fragment("select * from unnest(?::text[]) as t(value)", ^terms))
                                                                              
  # Create a Cartesian product of your terms and products and                 
  # calculate the match score of each combination.                            
  products_query =                                                            
    from(p in Product, as: :product)                                          
    |> join(:inner, [], t in ^terms_query, on: true, as: :term)               
    |> select_merge([term: t], %{                                             
      match_score:                                                            
        fragment(                                                             
          # Same as in your example, just taken out for brevity.              
          @case_frament,                                                      
          t.value,                                                            
          t.value,                                                            
          t.value,                                                            
          t.value,                                                            
          t.value,                                                            
          t.value,                                                            
          t.value,                                                            
          t.value                                                             
        )                                                                     
    })                                                                        
                                                                              
  # Calculate the row number of the combinations of each product              
  # by the match score in descending order. See window functions.             
  ordered_query =                                                             
    from(subquery(products_query), as: :product)                              
    |> where([product: p], p.match_score > 0)                                 
    |> select_merge([product: p], %{                                          
      row_number:                                                             
        over(row_number(),                                                    
          partition_by: p.id,                                                 
          order_by: {:desc, p.match_score}                                    
        )                                                                     
    })                                                                        
                                                                              
  # Use the ordered products and terms combinations as a filter.              
  from(p in Product, as: :product)                                            
  |> join(:inner, [product: p], rp in subquery(ordered_query),                
    on: p.id == rp.id and rp.row_number == 1,                                 
    as: :rproduct                                                             
  )                                                                           
  # You can otherwise use the product binding.                                
  |> order_by([rproduct: rp], desc: rp.match_score, asc: rp.sku)              
  |> limit(^limit)                                                            
end

Arbitrary length lists don’t go well with compile time validation. splice is a bit of a push towards allowing for those more easily, but they cover only certain usecases.

What I’ve done in the past was mutliple function heads of various arities for fragment macros, where there was a reasonable max number of items and at runtime mapping from length to called macro.

You can also work around the limitation by pushing the arbitrary length list to a place where it doesn’t need to be iterated by the query generator as done by @krasenyp.

1 Like