Repo.aggregate(queryable, :count, :id) vs select(count: element.id)

Hello all,
I hope everyone is doing all right!

A very common task for me is to count the number of entries that match certain criterias.
And I usualy do so by doing the following:

from(o in Organization,
  join: w in Workspace,
  where: o.id == w.org_id)
|> Repo.aggregate(:count)

But recently I took a look at the source code of Ecto to try to understand which option would be better, if it’s to continue to count it that way, or instead:

from(o in Organization,
  join: w in Workspace,
  where: o.id == w.org_id),
  select: count(o.id)

Could someone enlighten me in the differences for the two options and which one should I choose?

Thanks a lot in advance! :smiley:

The only difference seems to be that you are able to choose whether to apply it to an existing query, as your 1st example, or build it in to your query, as your 2nd example.

Ecto.Repo.Queryable.aggregate/4 builds a %SelectQuery{} and updates the query map %{query | select: select}. See in line 486

1 Like

Both will generate similar sql:

SELECT count(*) FROM "organization" AS o
  INNER JOIN "workspace" AS w 
  ON TRUE WHERE (o."id" = w."org_id") 

With select: count() query, scope for composing it with queries will be limited. You can compose second query from first.

query = from(o in Organization, join: w in Workspace, where: o.id == w.org_id)
Repo.all(query)
Repo.aggregate(query, :count)
# second query can be composed from first without duplicating
count_query = from(query, select: count())
Repo.one(count_query)

When writing queries in Ecto, i will give preference to composability (else there will be duplication of queries).

2 Likes

They are both ultimately implemented the same way: Repo.aggregate ultimately calls Ecto.Repo.Queryable.query_for_aggregate:

This first takes care of a detail: using a LIMIT on a COUNT requires a subquery.

Then it builds an AST equivalent to writing select: count() and replaces any previous select value on the query.

There’s also a head for query_for_aggregate that produces the AST equivalent of select: count(some_field).

Both options exist because they can do different things:

  • Repo.aggregate knows how to deal with limit etc. Building with select: count requires handling that manually
  • select: count supports (where available) the filter syntax
  • select: count can be used as part of a larger structure, for instance:
from(o in Organization,
  left_join: w in Workspace,
  on: w.org_id = o.id
  select: {o.id, count(w.id)},
  group_by: o.id
) |> Repo.all()

which returns a list of {id, count} tuples. Repo.aggregate does not support group_by at all.

3 Likes

You are always so helpful! Thanks a lot! I wish you the best