Filter by an ad-hoc aggregated field

Hi, is there a way to filter by an ad-hoc aggregated field? The following, where Session has many Summaries doesn’t filter:

    Session
    |> Ash.Query.select([:id])
    |> Ash.Query.aggregate(:count_of_summaries, :count, :summary)
    |> Ash.Query.filter(:count_of_summaries > 1)
    |> Ash.read!(authorize?: false)

it’s just produces an array of Sessions with aggregates set like this aggregates: %{count_of_summaries: 1}.

Essentially what I want is a HAVING query:

select se.id, count(su.id) from sessions se 
left join summaries su on se.id == su.session_id
group by se.id
having count(su.id) > 1

I can of course do one of the following:

  • Drop to Ecto
  • Create an aggregate field, and then a read action filter by this field

But I’m curious if I can do it in REPL without touching the source code and without writing an explicit SQL, for ad-hoc data exploration?

I suspect join_filters might be the answer, but I couldn’t make it work.

Thanks!

Ok (thanks to ChatGPT o3 w/ search for the idea), is seems that

Session
|> Ash.Query.filter(count(summary) > 1)
|> Ash.read!()

should work as expected! However, I’m still a bit puzzled why the above (admittedly, naïve) solution doesn’t work, as e.g. the following sorting by an aggregate field is explicitly mentioned in the docs for Ash.Query:

MyApp.Author
|> Ash.Query.aggregate(:published_post_count, :posts, query: [filter: [published: true]])
|> Ash.Query.sort(published_post_count: :desc)
|> Ash.Query.limit(10)
|> Ash.read!()

and one would expect that if a sort by an ad-hoc aggregate field works, a filter should work as well.

There is also a parameter :filterable in Ash.Query.Aggregate.new, which is defined like this:

  • :filterable? - Whether or not this aggregate may be used in filters. The default value is true .

and it says that Aggregate.new is used with Query.aggregate or Ash.aggregate.

That is actually an out of date example, sorry. You would pass the result of Ash.Query.Aggregate.new into the sort, i.e Ash.Query.sort({aggregate, :desc}). However, your filter example should work AFAIK. In what way is it not working?

My first example (I assume you are talking about the very first post in the thread) simply doesn’t filter by count_of_summaries - it returns all entries, even those where count of summaries is 0 or 1. (There are no errors or warnings, though). Is it considered a bug? If yes, I can file a ticket in ash_postgres.

Sorry for being the source of all these edge cases :).

The following does work:

Session
|> Ash.Query.filter(count(summary) > 1)
|> Ash.read!()

Here is an example trying to aggregate on a different field. This query returns only 9 records with more than 4 interactions, which agrees with the sql:

iex(50)> FiveWhys.Understand.Session |> Ash.Query.filter(count(interactions) > 4) |>  Ash.read!(authorize?: false) |> length
[debug] QUERY OK source="sessions" db=37.6ms idle=511.7ms
SELECT s0."id", s0."title", s0."shared", s0."updated_at", s0."inserted_at", s0."user_id", s0."problem_statement", s0."prior_session_id" FROM "sessions" AS s0 LEFT OUTER JOIN LATERAL (SELECT si0."session_id" AS "session_id", coalesce(count(*), $1::bigint)::bigint AS "aggregate_0" FROM "public"."interactions" AS si0 WHERE (s0."id" = si0."session_id") GROUP BY si0."session_id") AS s1 ON TRUE LEFT OUTER JOIN "public"."summaries" AS s2 ON s0."id" = s2."session_id" LEFT OUTER JOIN LATERAL (SELECT si0."session_id" AS "session_id", any_value(si0."inserted_at"::timestamp ORDER BY  si0."inserted_at" DESC ) FILTER (WHERE si0."inserted_at"::timestamp IS NOT NULL)::timestamp AS "latest_interaction_date" FROM "public"."interactions" AS si0 WHERE (s0."id" = si0."session_id") GROUP BY si0."session_id") AS s3 ON TRUE WHERE (coalesce(s1."aggregate_0", $2::bigint)::bigint::bigint > $3::bigint::bigint) ORDER BY (CASE WHEN NOT (s2."id"::uuid IS NULL) AND (s2."inserted_at"::timestamp::timestamp > s3."latest_interaction_date"::timestamp::timestamp) THEN s2."inserted_at"::timestamp ELSE s3."latest_interaction_date"::timestamp END)::timestamp DESC [0, 0, 4]
↳ anonymous fn/3 in AshPostgres.DataLayer.run_query/2, at: lib/data_layer.ex:785
9
select count(*) from (select s.id, count(*) from sessions s inner join interactions i on s.id = i.session_id group by s.id having count(*) > 4
);
 count
-------
     9
(1 row)

This query returns 15 (all of the sessions):

iex(51)> FiveWhys.Understand.Session |> Ash.Query.aggregate(:count_of_interactions, :count, :interactions) |> Ash.Query.filter(:count_of_interactions > 4) |>  Ash.read!(authorize?: false) |> length
[debug] QUERY OK source="sessions" db=19.7ms queue=1.0ms idle=868.8ms
SELECT s0."id", s0."title", s0."shared", s0."updated_at", s0."inserted_at", s0."user_id", s0."problem_statement", s0."prior_session_id", coalesce(s1."count_of_interactions"::bigint, $1::bigint)::bigint FROM "sessions" AS s0 LEFT OUTER JOIN LATERAL (SELECT si0."session_id" AS "session_id", coalesce(count(*), $2::bigint)::bigint AS "count_of_interactions" FROM "public"."interactions" AS si0 WHERE (s0."id" = si0."session_id") GROUP BY si0."session_id") AS s1 ON TRUE LEFT OUTER JOIN "public"."summaries" AS s2 ON s0."id" = s2."session_id" LEFT OUTER JOIN LATERAL (SELECT si0."session_id" AS "session_id", any_value(si0."inserted_at"::timestamp ORDER BY  si0."inserted_at" DESC ) FILTER (WHERE si0."inserted_at"::timestamp IS NOT NULL)::timestamp AS "latest_interaction_date" FROM "public"."interactions" AS si0 WHERE (s0."id" = si0."session_id") GROUP BY si0."session_id") AS s3 ON TRUE ORDER BY (CASE WHEN NOT (s2."id"::uuid IS NULL) AND (s2."inserted_at"::timestamp::timestamp > s3."latest_interaction_date"::timestamp::timestamp) THEN s2."inserted_at"::timestamp ELSE s3."latest_interaction_date"::timestamp END)::timestamp DESC [0, 0]
↳ anonymous fn/3 in AshPostgres.DataLayer.run_query/2, at: lib/data_layer.ex:785
15
select count(*) from sessions;
 count
-------
    15
(1 row)

iex(51)> FiveWhys.Understand.Session |> Ash.Query.aggregate(:count_of_interactions, :count, :interactions) |> Ash.Query.filter(:count_of_interactions > 4) |> Ash.read!(authorize?: false) |> length

:count_of_interactions > 4 is comparing an atom to an integer, and the literal comparison is actually expanded before querying to just filter: true.

You can’t reference those dynamic aggregates by name (back in the day you could, but no longer, it had ambiguity/compositional errors).

1 Like