Ecto.Query join associativity (order of join operations)

Hello,

I just found a bug in a larger query of mine, which boils down to this simplified example:

(For the sake of the example, assume a Post can have only one, optional, Comment)

Post
|> join(:left, [p], c in assoc(p, :comment)
|> join(:inner, [p, c], u in assoc(c, :user))

The correct version uses left in both join statements.

I believe this query is wrong, because the inner join was written with the idea that a Comment will always have a User, so a join between Comment and User will always produce the same rows, whether an :inner or a :left join. However, at the SQL level it’s a huge difference whether we join

Post <<-> (Comment <-> User)
# or
(Post <<-> Comment) <-> User
# where <<-> shall denote a left join and <-> an inner join

in other words, joins of different types are not associative.

Apparently, in my case, the join pipeline boiled down to first joining Post-Comment, then User, which lead to missing result rows.

The question then is:
At the Ecto level, is this undefined behaviour, or is it just undocumented? (Or, is it documented but I could not find it?)

Ecto.Query uses the joins you specify exactly as specified; if you say join(:inner, ...) you get INNER JOIN ... and so on.

The question was about several, mixed joins.

This is not an Ecto’s behavior but that of your SQL database (Postgres, MySQL, SQLite, etc).

Ecto is responsible for generating SQL, sending it to the database, then casting the response into data structures you’ve specified. It’s not responsible for executing the SQL statements, that’s the responsibility of the database you are connecting to.

If you run the different queries directly on your database (skipping Ecto) you will see you get the same results.

I think I can answer my question at least in part:

from(a in "a")
|> join(:left, [a], b in "b")
|> join(:inner, [a, b], c in "c")
|> select([], 1) # makes Ecto happy
|> Repo.all

produces the following SQL query:

SELECT TRUE FROM "a" AS a0 LEFT OUTER JOIN "b" AS b1 ON TRUE INNER JOIN "c" AS c2 ON TRUE

Note that there are no brackets (and if there were any, that would require an additional FROM)

This means that intuitively, the Ecto query syntax directly matches the SQL syntax, and one just needs to know that in SQL, the joins happen in this order:

(a joins b) joins c

as explained in this SO answer: https://stackoverflow.com/a/12656047

Be mindful as “brackets” at this level are not plain enforcement of precendence, but are means of defining subqueries. Those require a whole different level of understanding sql query tradeoffs.

Perhaps it helps to keep in mind that SQL is a declarative language, you are telling it you want data that conforms to the query, not how it should retrieve the data.