My SQL (and consequently Ecto Query abilities) are incredibly rusty - 2 or more years since needing to write anything really useful.
I have 3 tables - Owners, Services and Cars. Services and Cars both belong to Owners. Sadly, Cars and Services aren’t otherwise related EXCEPT by a string field, license_plate.
I am trying to fetch the Cars that don’t have Services. I can easily get the Cars that DO have Services, but for some reason the opposite doesn’t work.
This returns as expected, all Cars that have Services
Repo.all from c in Cars,
join: o in assoc(c, :owner),
join: s in assoc(o, :services),
where: c.license_plate == s.license_plate,
distinct: c
But using a != returns all Cars I’d expect that to return only the Cars that don’t have Services.
I’ve played around with union
but, again, I’m rusty.
Would anyone be able to point me in the right direction?
You want a ‘not-exists’ kind of query, which can be written using left_join
:
Repo.all from c in Cars,
join: o in assoc(c, :owner),
left_join: s in assoc(o, :services),
on: c.license_plate == s.license_plate,
where: is_nil(s.id)
The left join produces a “dummy” row services
row with all columns null when nothing matches the on
conditions, so the license_plate
check is moved into the on
clause.
The where
filters out any results that did have a matching services
row, since s.id
can only be null in those dummy rows.
2 Likes
If you want to read something in depth Rise of the Anti-Join
And something not really discussed in the article, you want to make sure you’re indexing the column you’re looking at in the service table. Otherwise finding missing entries requires a full table scan.
2 Likes
So close and yet so far I was I had a left join there earlier, just didn’t think of the on
.
Thanks, this did it! You’re a star.
I didn’t have indexes… and added them after your comments. So now I don’t know if this helped to do the trick but thank you nonetheless! Maybe someone stumbles upon this and this was the missing link.
Oh it will help with performance but not change the underlying result. If n
is the size of the table you’re looking at O(log(n)
instead of O(n)
1 Like