Ecto Distinct With Order By

I have multiple users who take tests multiple times. I’d like to get a sorted leaderboard for a specific test, where a user who has taken the test multiple times only shows up once, with their best score.

query = 
  from r in TestResult,
  distinct: r.user_id,
  where: r.test_id == ^test_id,
  order_by: [desc: r.score]

However, this ends up sorting by first user_id and then r.score, due to the distinct instruction. So it doesn’t always show their best score. What’s the appropriate way to do this in Ecto?

Am on Postgres 9.6.

Instead of distinct, group by the user ID and order by score, then you should see the correct results.

If I do

query = 
  from r in TestResult,
  where: r.test_id == ^test_id,
  group_by: r.user_id,
  order_by: [desc: r.score]

Then I get an error column "r0.id" must appear in the GROUP BY clause or be used in an aggregate function.

If I include r.id in my group_by, then I’m back to returning multiple results per user.

If you only want the user ID and their score, include select: {r.user_id, r.score} and you’ll get tuples with user ID and score back. (I think that’s the syntax, but you may need to tweak it if I didn’t remember it correctly.)

column "r0.score" must appear in the GROUP BY clause or be used in an aggregate function

If I add r.score to the group_by, I get multiple results again.

Plus I’d like the entire record, not specific fields.

Ah forgive me, you must of course select MAX(r.score) (by using a fragment probably) then and the order_by won’t matter. But that will only return the user ID and the score, true.

Here’s more information if you want the whole row: https://stackoverflow.com/a/7630564/3533441

I think one problem may be that your order_by needs the user_id too when using with distinct (at least that’s why I understand from reading that SO post). Try something like order_by: [r.user_id, desc: r.score] (syntax offhand, may need adjusting).

This is a perfect use case for a window function. You would partition by the user_id and order by the score. Unfortunately there isn’t native support for window functions in Ecto, but there are ways to do it. This article explains a few techniques: http://sorentwo.com/2016/02/08/folding-window-functions-into-ecto.html

Something like this should work :slight_smile:

score_query = 
  from r in TestResult,
  group_by: r.user_id,
  select: %{score: max(r.score), user_id: r.user_id}

query =
  from r in TestResult,
  join: sq in subquery(score_query), on: sq.score == r.score and sq.user_id == r.user_id,
  order_by: [desc: r.score]

Kwando that works great except in the case when a user has the exact same score more than once!

What should happen then?

I’d say if a user has multiple results with the same score, pick the latest chronologically.

do you have seqential ids on the tests?

Nope, you can log a test result that you did in the past. There’s a done_at :utc_datetime field.