Today, I learned that where: [x] and [y] in Ecto gets different parentheses than where: [x], where: [y], and that this can affect the query plan in PostgreSQL.
Each where: gets a parentheses, and each condition gets a parentheses if it’s not the only one in its where:.
where: a.foo == ^foo and a.bar == ^bar,
where: a.baz >= ^min_baz or a.buz == ^buz,
where: a.bip == 5.0
produces
WHERE ((l0."foo" = $1) AND (l0."bar" = $2))
AND ((l0."baz" >= $3) OR (l0."buz" = $4))
AND (l0."bip" = 5.0)
Parentheses can matter for correctness when there is a mix of AND and OR conditions, as in:
WHERE (department = 'sales' AND role = 'manager') OR salary > 100000
-- vs
WHERE department = 'sales' (AND role = 'manager' OR salary > 100000
They can also matter for query planning, such as in
WHERE column1 = 'value1' AND (column2 = 'value2' AND column3 = 'value3');
If there’s a compound index on column1 and column2, grouping like this might thwart its usage.
A good rule is: use a separate where: for each condition unless you have a specific reason to group them with an and.
I don’t see this in the docs. Shouldn’t it be? I can make a PR if desired, but I’m not sure whether Ecto.Query or somewhere in ecto_sql is the right place to put it.
Note: this post actually came out of a real-world problem we were having. From what our db folks have said, the parens are evaluated innermost-first. In our our case, we have time series data where the first thing you want to do is discard everything that isn’t in the relevant time window, but our parens were forcing other conditions to be evaluated on all time ranges first, and only then could it apply the time-based condition.
Semi-retraction: I was told by our infrastructure team that parentheses can affect a query in this way and that issue had been observed in prod, but I still haven’t seen the issue reproduced. So I’m skeptical that it actually works this way.
Huh… I haven’t though of the performance implications before, but always preferred the opposite as it looks more like a regular SQL… while with multiple where I need to remember that those imply and and not or and there is or_where/3 for those situations.
My another argument in favor of explicit AND or OR is the behavior of multiple when guards in regular elixir functions
def foo(x, y)
when ...
when ...
when ... do
...
end
Not only they are composed with or but also they all evaluated even if one raises.
defmodule When do
def ko when 0/0 or true, do: "💥"
def ok when 0/0 when true, do: :ok
end
When.ok
:ok
When.ko
** (FunctionClauseError) no function clause matching in When.ko/0
That’s actually the most important difference between several when and or.
Sure, though the FunctionClauseError is not intuitive to me, I’d expect to see the math error itself.
Nevermind that though, and thank you.
Still, why does When.ok() succeed? It too divides zero by zero. Is using several when clauses mean that the first one that returns true (and does not blow up) gets “picked” and thus the function evaluates successfully?
Multiple when clauses require just a single one to succeed. It doesn’t matter what happens in any of the non succeeding ones. Contrary to using x or y where failures propagate once they happen.
It succeeds because when A or B is somewhat similar to (in pseudocode) when try(A or B rescue false) while when A when B is when (try A rescue false) or when (try B rescue false).
That makes sense and explains it better, thank you.
OK, so shouldn’t When.ko() still result in success? If we go by your pseudo-code then it looks like it should because 0/0 gets rescue-d and evaluates to false, which should result in the second guard being evaluated, which is plain true?