I’m using levenshtein_less_equal to do a fuzzy search for some tables. It’s working well, but I’d like to also return the levenshtein distance (using a select_merge?) so that I can compare the results against a different table’s results.
I think you should be able to use just the levenshtein function (not the _less_equal version) to return the distance as a number. Then filter in a where clause with the distance < @threshold, then join/sort whatever you need. Does that make sense?
Thanks @kip. I’m using levenshtein_less_equal for efficiency reasons, but I’ll simplify things for a second because I realize it’s not important to my question. Do you mean something like this? Do I have to compute the levenshtein three separate times (where, order, and select)?
(Ecto.QueryError) field `distance` in `where` does not exist in schema MyApp.Book in query:
from b0 in MyApp.Book,
where: b0.distance < 5,
order_by: [asc: b0.distance],
select: merge(b0, %{distance: fragment("levenshtein(?, ?)", b0.slug, ^...)})
Ah, yes, not possible to reuse an expression from a SELECT clause in a WHERE clause. This because in SQL the order of evaluation is join -> projection -> selection -> order and therefore calculated values in a SELECT are not available to the prior projection phase.
That said, Postgres does a good job of optimising multiple function calls with the same arguments to one call only. Per the documentation:
An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever. This category allows the optimizer to pre-evaluate the function when a query calls it with constant arguments. For example, a query like SELECT ... WHERE x = 2 + 2 can be simplified on sight to SELECT ... WHERE x = 4, because the function underlying the integer addition operator is marked IMMUTABLE.
And for the levenshtein function we can see it is defined as:
ecto_playground=# select pg_get_functiondef('levenshtein(text,text)'::regprocedure);
pg_get_functiondef
-------------------------------------------------------------
CREATE OR REPLACE FUNCTION public.levenshtein(text, text) +
RETURNS integer +
LANGUAGE c +
IMMUTABLE PARALLEL SAFE STRICT +
AS '$libdir/fuzzystrmatch', $function$levenshtein$function$+
(1 row)
Therefore I would go with your original query (in the first post of the thread). Sorry for the distraction.
Okay, that makes sense. I didn’t know about the order of evaluation or the optimization of multiple function calls with the same arguments. That’s very helpful—thanks!