How Ecto query parentheses can affect a query

nathanl · July 22, 2024, 7:30pm

Today, I learned that where: [x] and [y] in Ecto gets different parentheses than where: [x], where: [y], and that this can affect the query plan in PostgreSQL.

Each where: gets a parentheses, and each condition gets a parentheses if it’s not the only one in its where:.

where: a.foo == ^foo and a.bar == ^bar,
where: a.baz >= ^min_baz or a.buz == ^buz,
where: a.bip == 5.0

produces

WHERE ((l0."foo" = $1) AND (l0."bar" = $2))
AND ((l0."baz" >= $3) OR (l0."buz" = $4))
AND (l0."bip" = 5.0)

Parentheses can matter for correctness when there is a mix of AND and OR conditions, as in:

WHERE (department = 'sales' AND role = 'manager') OR salary > 100000
-- vs
WHERE department = 'sales' (AND role = 'manager' OR salary > 100000

They can also matter for query planning, such as in

WHERE column1 = 'value1' AND (column2 = 'value2' AND column3 = 'value3');

If there’s a compound index on column1 and column2, grouping like this might thwart its usage.

A good rule is: use a separate where: for each condition unless you have a specific reason to group them with an and.

I don’t see this in the docs. Shouldn’t it be? I can make a PR if desired, but I’m not sure whether Ecto.Query or somewhere in ecto_sql is the right place to put it.

nathanl · July 22, 2024, 8:10pm

BTW, to see your raw query SQL, you can do something like:

{query, params} = MyApp.Repo.to_sql(:all, query)
IO.puts query

nathanl · July 22, 2024, 8:57pm

Note: this post actually came out of a real-world problem we were having. From what our db folks have said, the parens are evaluated innermost-first. In our our case, we have time series data where the first thing you want to do is discard everything that isn’t in the relevant time window, but our parens were forcing other conditions to be evaluated on all time ranges first, and only then could it apply the time-based condition.

nathanl · September 3, 2024, 7:29pm

Semi-retraction: I was told by our infrastructure team that parentheses can affect a query in this way and that issue had been observed in prod, but I still haven’t seen the issue reproduced. So I’m skeptical that it actually works this way.

RudManusachi · September 3, 2024, 8:05pm

Huh… I haven’t though of the performance implications before, but always preferred the opposite as it looks more like a regular SQL… while with multiple where I need to remember that those imply and and not or and there is or_where/3 for those situations.

My another argument in favor of explicit AND or OR is the behavior of multiple when guards in regular elixir functions

def foo(x, y)
    when ...
    when ...
    when ... do
  ...
end

there guards are composed with OR

mudasobwa · September 4, 2024, 6:30am

Not only they are composed with or but also they all evaluated even if one raises.

defmodule When do
  def ko when 0/0 or true, do: "💥"
  def ok when 0/0 when true, do: :ok
end

When.ok
:ok

When.ko
** (FunctionClauseError) no function clause matching in When.ko/0

That’s actually the most important difference between several when and or.

dimitarvp · September 4, 2024, 12:25pm

Huh, I can’t seem to understand your example. Why is When.ko() blowing up with this error in particular?

LostKobrakai · September 4, 2024, 12:40pm

Because division by zero is an invalid mathematical operation, therefore the guard fails and no function head could be found to execute.

dimitarvp · September 4, 2024, 1:04pm

Sure, though the FunctionClauseError is not intuitive to me, I’d expect to see the math error itself.

Nevermind that though, and thank you.

Still, why does When.ok() succeed? It too divides zero by zero. Is using several when clauses mean that the first one that returns true (and does not blow up) gets “picked” and thus the function evaluates successfully?

LostKobrakai · September 4, 2024, 1:11pm

Multiple when clauses require just a single one to succeed. It doesn’t matter what happens in any of the non succeeding ones. Contrary to using x or y where failures propagate once they happen.

mudasobwa · September 4, 2024, 1:23pm

Well, not necessarily. when true or raise would pass due to short-curcuit evaluation.

Technically the difference is or/2 raises unlike guard evaluation:

In guards, when functions would normally raise exceptions, they cause the guard to fail instead.

Answering the question stated by @dimitarvp

Still, why does When.ok() succeed?

It succeeds because when A or B is somewhat similar to (in pseudocode) when try(A or B rescue false) while when A when B is when (try A rescue false) or when (try B rescue false).

dimitarvp · September 4, 2024, 3:55pm

That makes sense and explains it better, thank you.

OK, so shouldn’t When.ko() still result in success? If we go by your pseudo-code then it looks like it should because 0/0 gets rescue-d and evaluates to false, which should result in the second guard being evaluated, which is plain true?

mudasobwa · September 4, 2024, 4:26pm

No. ko/0 fails to find a clause because there is the only guard which returns false (due to that its content raises.)