Repo.aggregate(queryable, :count, :id) vs select(count: element.id)

Samuel-88 · March 8, 2022, 12:20pm

Hello all,
I hope everyone is doing all right!

A very common task for me is to count the number of entries that match certain criterias.
And I usualy do so by doing the following:

from(o in Organization,
  join: w in Workspace,
  where: o.id == w.org_id)
|> Repo.aggregate(:count)

But recently I took a look at the source code of Ecto to try to understand which option would be better, if it’s to continue to count it that way, or instead:

from(o in Organization,
  join: w in Workspace,
  where: o.id == w.org_id),
  select: count(o.id)

Could someone enlighten me in the differences for the two options and which one should I choose?

Thanks a lot in advance!

03juan · March 8, 2022, 12:42pm

The only difference seems to be that you are able to choose whether to apply it to an existing query, as your 1st example, or build it in to your query, as your 2nd example.

Ecto.Repo.Queryable.aggregate/4 builds a %SelectQuery{} and updates the query map %{query | select: select}. See in line 486

kartheek · March 8, 2022, 12:57pm

Both will generate similar sql:

SELECT count(*) FROM "organization" AS o
  INNER JOIN "workspace" AS w 
  ON TRUE WHERE (o."id" = w."org_id")

With select: count() query, scope for composing it with queries will be limited. You can compose second query from first.

query = from(o in Organization, join: w in Workspace, where: o.id == w.org_id)
Repo.all(query)
Repo.aggregate(query, :count)
# second query can be composed from first without duplicating
count_query = from(query, select: count())
Repo.one(count_query)

When writing queries in Ecto, i will give preference to composability (else there will be duplication of queries).

al2o3cr · March 8, 2022, 3:10pm

They are both ultimately implemented the same way: Repo.aggregate ultimately calls Ecto.Repo.Queryable.query_for_aggregate:

github.com

elixir-ecto/ecto/blob/d792962246abb7876e540052ea9da7d25f105f12/lib/ecto/repo/queryable.ex#L475-L489


      
          defp query_for_aggregate(queryable, aggregate) do
            query =
              case prepare_for_aggregate(queryable) do
                %{distinct: nil, limit: nil, offset: nil, combinations: []} = query ->
                  %{query | order_bys: []}
          
                query ->
                  query
                  |> Query.subquery()
                  |> Queryable.Ecto.SubQuery.to_query()
              end
          
            select = %SelectExpr{expr: {aggregate, [], []}, file: __ENV__.file, line: __ENV__.line}
            %{query | select: select}
          end

This first takes care of a detail: using a LIMIT on a COUNT requires a subquery.

Then it builds an AST equivalent to writing select: count() and replaces any previous select value on the query.

There’s also a head for query_for_aggregate that produces the AST equivalent of select: count(some_field).

Both options exist because they can do different things:

Repo.aggregate knows how to deal with limit etc. Building with select: count requires handling that manually
select: count supports (where available) the filter syntax
select: count can be used as part of a larger structure, for instance:

from(o in Organization,
  left_join: w in Workspace,
  on: w.org_id = o.id
  select: {o.id, count(w.id)},
  group_by: o.id
) |> Repo.all()

which returns a list of {id, count} tuples. Repo.aggregate does not support group_by at all.

Samuel-88 · March 8, 2022, 4:55pm

You are always so helpful! Thanks a lot! I wish you the best