How to correctly using Distinct function in ecto?

This is simplified data structures

schema "campaigns" do
  belongs_to :created_by, User, type: :id
  has_one :analytics
end

schema "analytics" do
  belongs_to :campaign, Campaign
end

schema "message_records" do
  belongs_to :campaign, Campaign
end

Campaign has_one analytics. But more than one message_records are belongs_to one campaign.
Because campaign will have many(like 10-1000) meesages which has one message_records each.

What I want is that return this value grouped_by campaign.created_by_id

user_id: campaign.created_by_id
total_campaigns_count: count(campaign.id),
total_messages_count: sum(a.sent_messages),
total_credit_used: sum(mdr.segments),

I tried

 campaign_with_mdr =
      base_query()
      |> where([c], ^filters_with(filters))
      |> join(:inner, [c], mdr in MessageDetailsRecord, on: c.id == mdr.campaign_id)
      |> distinct(:id)

query = 
      from cm in subquery(campaign_with_mdr),
        left_join: a in assoc(cm, :analytics),
        group_by: [c.created_by_id],
        select: %{
          user_id: cm.created_by_id,
          total_campaigns_count: count(c.id),
          total_messages_count: sum(a.sent_messages),
          total_credit_used: sum(mdr.segments)
        }
Repo.all(query)

This query won’t return duplicate campaigns but can’t get a total_credit_used: sum(mdr.segments)

Then I tried other method

 query =
      from c in Campaign,
        where: ^filters_with(filters),
        left_join: a in assoc(c, :analytics),
        join: mdr in MessageDetailsRecord,
        on: c.id == mdr.campaign_id,
        distinct: c.id,
        group_by: [c.created_by_id, c.id],
        select: %{
          user_id: c.created_by_id,
          total_campaigns_count: count(c.id),
          total_messages_count: sum(a.sent_messages),
          total_credit_used: sum(mdr.segments),

        }

    Repo.all(query)

This method’s problem is that I need to include c.id into group_by clause or not get an error says
ERROR 42803 (grouping_error) column "c0.id" must appear in the GROUP BY clause or be used in an aggregate function

Can you help?

Cant you just split it up in two or three queries?

My interpretation of what you’re selecting suggests that there are two different kinds of aggregate calculations:

  • per campaign, to add up all the mdr.segments
  • per user, to count campaigns and add up all the sent_messages

The first part is a small query:

ids_with_mdr_counts_query =
  from(c in Campaign,
    where: ^filters_with(filters),
    join: mdr in MessageDetailsRecord,
    on: c.id == mdr.campaign_id,
    group_by: c.id,
    select: %{
      campaign_id: c.id,
      user_id: c.created_by_id,
      total_credit_used: sum(mdr.segments)
    }
  )

Then the rest:

from(data in subquery(ids_with_mdr_counts_query),
  left_join: a in Analytics,
  on: a.campaign_id == data.campaign_id,
  group_by: data.user_id,
  select: %{
    user_id: data.user_id,
    total_campaigns_count: count(data.campaign_id),
    total_messages_count: sum(a.sent_messages),
    total_credit_used: sum(data.total_credit_used)
  }
)

This takes the now-aggregated {campaign_id, user_id, total_credit_used} tuples for each campaign matching filters and aggregates them further by user_id.

(apologies if there are errors in the above, I write most Ecto queries through incremental refinement in iex so this is trickier)

Thanks for the detailed answer. But I still don’t understand. which point does it remove duplicate message detail record?

 campaign_with_mdr =
        base_query()
        |> where([c], ^filters_with(filters))
        |> join(:inner, [c], mdr in MessageDetailsRecord, as: :records, 
            on: c.id == mdr.campaign_id)
        |> distinct(true)
  
      final_query =
        from [cm, records: mdr] in campaign_with_mdr,
          left_join: a in assoc(cm, :analytics),
          group_by: [cm.created_by_id],
          select: %{
            user_id: cm.created_by_id,
            total_campaigns_count: count(cm.id),
            total_messages_count: sum(a.sent_messages),
            total_credit_used: sum(mdr.segments),

          }

This is what I tried and what is difference between your code and mine?

The problem is there’s no source in the query called mdr. You have cm and a. It will probably work if you put that in. But not sure what you want it to be. There should be an issue with c as well but maybe it’s masked by the mdr error.

edit: Now that I read it more carefully, you actually probably want to reference them both with cm because they are coming from the subquery.

The results of your campaign_with_mdr query aren’t changed by the distinct(true) because there aren’t any duplicates - each row has a distinct pair of (campaigns.id, message_details_records.id) values. More concretely, imagine we have this data:


campaigns:

id created_by_id
1 4
2 4
3 5

message detail records:

id campaign_id segments
6 1 1
7 1 2
8 1 7
9 2 1
10 2 1
11 3 1

analytics:

id campaign_id sent_messages
12 1 17
13 2 42
14 3 100

Then the rows from your campaign_with_mdr subquery look like:

campaigns.id campaigns.created_by_id mdrs.id mdrs.segments
1 4 6 1
1 4 7 2
1 4 8 7
2 4 9 1
2 4 10 1
3 5 11 1

Joining the analytics table adds those columns to each candidate row:

campaigns.id campaigns.created_by_id mdrs.id mdrs.segments a.id a.sent_messages
1 4 6 1 12 17
1 4 7 2 12 17
1 4 8 7 12 17
2 4 9 1 13 42
2 4 10 1 13 42
3 5 11 1 14 100

I’m assuming the “duplicates” you refer to are the multiple rows with the same campaigns.id value here, which will cause things like total_campaigns_count to be incorrect.

The version in my previous post avoids that by adding up mdr.segments first and then joining and counting. The rows from the ids_with_mdr_counts_query look like:

campaign_id user_id total_credit_used
1 4 10
2 4 2
3 5 1

This is where the “duplicates” went: they’ve already been summed, producing exactly one row per campaigns.id.

Joining analytics works just like before:

campaign_id user_id total_credit_used a.id a.sent_messages
1 4 10 12 17
2 4 2 13 42
3 5 1 14 100

Now the sums and counts are correct - user 4 has 2 total campaigns, 12 total credits used, and 59 messages sent.

1 Like

Thanks @al2o3cr !!! I really appreciated!