Ecto order_by query with distinct is not giving correct results

I have the below query with multiple left joins and preloads and it’s giving duplicate results if I don’t use the distinct, but by using the distinct, the order_by clause is not working. Below is the query. The query always returns the results in the same order no matter what I put inside order_by.

 Conversation
|> join(:left, [c], p in Participation,
  on: p.conversation_id == c.id and p.user_id == ^visitor_id
)
|> join(:left, [c], m in Message, on: c.id == m.conversation_id)
|> where([_c, p], not is_nil(p.id))
|> preload([_, _, _], creator: :avatar, participants: :avatar)
|> distinct([c, _p], c.id)
|> order_by([c, _p], c.updated_at)
|> select([c, _, m], %{c | last_message: m})

Here is the query it produces.

 #Ecto.Query<from c0 in Postgres.PostgresConversationRepository.Conversation,
 left_join: p1 in Postgres.PostgresConversationRepository.Participation,
 on: p1.conversation_id == c0.id and p1.user_id == ^"c23b2640-413a-4823-ab97- 
 49eccfe3dcfd",
 left_join: m2 in Postgres.PostgresMessageRepository.Message,
 on: c0.id == m2.conversation_id, where: not(is_nil(p1.id)),
 order_by: [asc: c0.updated_at], limit: ^20, distinct: [asc: c0.id],
 select: %{c0 | last_message: m2},
 preload: [participants: [:avatar], creator: [:avatar]]>

How to make order_by work with the above query.
Any help will be much appreciated.
Thank you.

I believe what you are seeing is that the order by clause is being used to select which distinct record to use. Ie the order is to sort the duplicate rows of c0.id by c0.updated_at and select the first one as the distinct one.

You might use your current query as a CTE and then do a select … order by, look at group by instead of distinct or look a using a join on a sub query to get the behaviour you are after.

1 Like

Hi:
Thanks for your suggestion, I am not well versed with the CTE queries, but I gave it a try. and end up with this query.

conversations_query =  visitor_id
|> join_participations_messages()
|> where([_c, p], not is_nil(p.id))
|> preload([_, _, _], creator: :avatar, participants: :avatar)
|> distinct([c, _p], c.id)

Conversation
|> recursive_ctes(true)
|> with_cte("conversation_query", as: ^conversations_query)

After this I got no idea how to combine the CTE with the select and order_by.
Any direction where to go after this?
Thank you.

I think something like this is what you are after (not tested):

Conversation
|> with_cte("conversation_query", as: ^conversations_query)
|> order_by([c], c.updated_at)
|> select([c], %{c | last_message: m})

Basically, treat the cte as you would any other table.

Thanks kip, I have already tried this and it gives me error of unknown binding for m in select.

Conversation
|> with_cte("conversation_query", as: ^conversations_query)
|> order_by([c], c.updated_at)
|> select([c, _p, m], %{c | last_message: m})

The CTE query it produces:

CTE QUERY: #Ecto.Query<from c0 in 
Postgres.PostgresConversationRepository.Conversation,
order_by: [asc: c0.updated_at],
select: %{c0 | last_message: unknown_binding_2!}>

|> with_cte("conversation_query", as: #Ecto.Query<from c0 in 
Postgres.PostgresConversationRepository.Conversation,
left_join: p1 in Postgres.PostgresConversationRepository.Participation,
on: p1.conversation_id == c0.id and p1.user_id == ^"c23b2640-413a-4823-ab97- 
49eccfe3dcfd",
left_join: m2 in Postgres.PostgresMessageRepository.Message,
on: c0.id == m2.conversation_id, where: not(is_nil(p1.id)),
distinct: [asc: c0.id],
preload: [participants: [:avatar], creator: [:avatar]]>)

I also don’t understand, why order_by and select are above the CTE query.

A join is required to be able to refer to the CTE. For example:

Conversation
|> with_cte("conversation_query", as: ^conversations_query)
|> join(:inner, [c], "conversation_query", on: [id: c.id], as: :msg)
|> order_by([c], c.updated_at)
|> select([c, msg: m], %{c | last_message: m})
# The preload should be outside.
|> preload(creator: :avatar, participants: :avatar)

But this may not be the complete or right solution. The first thing to keep in mind is that distinct will pick an unpredictable row from the duplicates, unless you tell it to order by a particular column. See DISTINCT Clause according to PSQL.

Using the CTE may not even be necessary. A subquery in a join might solve it equally well. For example:

latest_message_query =
  Conversation
  |> join(:inner, [c], Message, on: [conversation_id: c.id], as: :m)
  |> distinct(:id)
  # Indicate that the row with the latest message among the duplicate rows should be picked.
  |> order_by([c, m: m], [c.id, desc: m.inserted_at])
  |> select([m: m], m)

Conversation
|> join(:inner, [c], Participation, on: [conversation_id: c.id, user_id: ^visitor_id])
|> join(:left, [c], subquery(latest_message_query), on: [conversation_id: c.id], as: :m)
|> order_by(:updated_at)
|> select([c, m: m], %{c | last_message: m})
|> preload(creator: :avatar, participants: :avatar)

I made some assumptions and tried to best understand what you’re trying to achieve. I hope this works or at least shows you a path to continue on.

2 Likes

Thank you for your effort to write a detailed answer.
This is close to what I needed. :slight_smile:

1 Like

You’re welcome! Btw, I just noticed that the latest_message_query can be simplified a lot:

latest_message_query =
  Message
  |> distinct(:conversation_id)
  |> order_by([:conversation_id, desc: :inserted_at])
1 Like