Ecto Query to compare 2 tables on a string column and return the unmatched result

My SQL (and consequently Ecto Query abilities) are incredibly rusty - 2 or more years since needing to write anything really useful.

I have 3 tables - Owners, Services and Cars. Services and Cars both belong to Owners. Sadly, Cars and Services aren’t otherwise related EXCEPT by a string field, license_plate.

I am trying to fetch the Cars that don’t have Services. I can easily get the Cars that DO have Services, but for some reason the opposite doesn’t work.

This returns as expected, all Cars that have Services

Repo.all from c in Cars,
      join: o in assoc(c, :owner),
      join: s in assoc(o, :services),
      where: c.license_plate == s.license_plate,
      distinct: c

But using a != returns all Cars :frowning: I’d expect that to return only the Cars that don’t have Services.

I’ve played around with union but, again, I’m rusty.

Would anyone be able to point me in the right direction?

You want a ‘not-exists’ kind of query, which can be written using left_join:

Repo.all from c in Cars,
      join: o in assoc(c, :owner),
      left_join: s in assoc(o, :services),
      on: c.license_plate == s.license_plate,
      where: is_nil(s.id)

The left join produces a “dummy” row services row with all columns null when nothing matches the on conditions, so the license_plate check is moved into the on clause.

The where filters out any results that did have a matching services row, since s.id can only be null in those dummy rows.

2 Likes

If you want to read something in depth Rise of the Anti-Join

And something not really discussed in the article, you want to make sure you’re indexing the column you’re looking at in the service table. Otherwise finding missing entries requires a full table scan.

2 Likes

So close and yet so far I was :smiley: I had a left join there earlier, just didn’t think of the on.

Thanks, this did it! You’re a star.

I didn’t have indexes… and added them after your comments. So now I don’t know if this helped to do the trick :slight_smile: but thank you nonetheless! Maybe someone stumbles upon this and this was the missing link.

Oh it will help with performance but not change the underlying result. If n is the size of the table you’re looking at O(log(n) instead of O(n)

1 Like