Ecto, DISTINCT and why does it add ORDER BY <distinct field>

braindeaf · November 12, 2022, 12:10am

Hi there,

I’m currently working through the Craft GraphQL … book, which has been a nice journey through playing with both GraphQL and Ecto. I’ve followed the examples in the book but I’m actually applying them to a new project. So suppose I have a collection of Artists and each can have multiple Genres.

I’m building a query to filter a list of artists by genre. But as with any table that has a join you can end up with duplicate records. e,g,

SELECT a0."id", a0."name", a0."inserted_at", a0."updated_at" FROM "artists" AS a0 INNER JOIN "artists_genres" AS a2 ON a2."artist_id" = a0."id" INNER JOIN "genres" AS g1 ON a2."genre_id" = g1."id" WHERE (g1."name" ILIKE $1) ORDER BY a0."name" DESC ["%core%"]

Easy I’ll just throw in a DISTINCT and we’re good?

SELECT DISTINCT ON (a0."id") a0."id", a0."name", a0."inserted_at", a0."updated_at" FROM "artists" AS a0 INNER JOIN "artists_genres" AS a2 ON a2."artist_id" = a0."id" INNER JOIN "genres" AS g1 ON a2."genre_id" = g1."id" WHERE (g1."name" ILIKE $1) ORDER BY a0."id", a0."name" DESC ["%r%"]

Well seems not. The DISTINCT also has an ORDER BY a0.“id” added as well. I can’t work out why and if this is the default behaviours how can I then order my results by the name only.

  def list_artists(args) do
    # query = Shirts.Core.Artist
    # query = from q in Shirts.Core.Artist, distinct: :id
    query = Shirts.Core.Artist |> distinct([a], a.id)
    args
    |> Enum.reduce(query, fn
      {_, nil}, query ->
        query
      {:order, order}, query ->
        from q in query, order_by: {^order, :name}
      {:matching, name}, query ->
        from q in query, where: ilike(q.name, ^"%#{name}%")
      {:filter, filter}, query ->
        query |> filter_with(filter)
    end)
    |> Shirts.Repo.all
  end
  
  def filter_with(query, filter) do
    Enum.reduce(filter, query, fn
      {:name, name}, query ->
        from q in query, where: ilike(q.name, ^"%#{name}")
      {:genre, genre_name}, query ->
        from q in query,
          join: g in assoc(q, :genres),
          where: ilike(g.name, ^"%#{genre_name}%")
    end)
  end

Any help would be greatly appreciated. Many thanks.

RobL

zachallaun · November 12, 2022, 12:14am

This is a requirement of Postgres’ DISTINCT ON: docs

I believe there are a handful of solutions to this, for instance selecting the distinct records in a subquery and then ordering that resulting set however you want.

One gotcha: using the method above, you still need to secondarily sort the subquery by the field you ultimately want to sort by, otherwise the subquery will return the id of whichever (arbitrary) record is first in the group. So if you want a result sorted by b and distinct on a, you’d want your subquery to order by a, b and then the outer query to order by b.

To use a somewhat common example, imagine I have a table of employees with a department and salary, and I want the highest-paid employee for each department, ordered by salary.

# returns set of highest paid employees per department, but sorted firstly by department
# I don’t think the explicit order by :dept below is necessary, but added for clarity
highest_paid =
  from(e in Employee,
    distinct: :dept,
    order_by: [:dept, desc: :salary])

# resorts the set by salary
from(e in subquery(highest_paid),
  order_by: [desc: :salary])

(writing on phone, above may have errors but hopefully somewhat helpful)

braindeaf · November 12, 2022, 9:08am

Thank you for this. I have limited experience with PostgresSQL, I’ve nearly always used MySQL which doesn’t have the same behaviour. For simplicity then I could use a GROUP BY artists.id because I want write the least code necessary, however I appreciate a GROUP BY would normally be used for aggregate calculations.

Still this has introduced me to how I would use a subquery which is great thank you.

RobL