Aggregation in application vs using DB aggregate operations

TL;DR: would you prefer to aggregate a result set in the DB or at the application level?

Recently I learned about array aggregate operators such as array_agg in Postgres and json_arrayagg in MySQL.
So as a dummy example, suppose we have a posts schema where each post has author_id as a foreign key. I want to get a map with author_ids as keys and the lists of related post ids as values:

from(p in Post, select: {p.author_id, p.id})
|> Repo.all()
|> Enum.reduce(%{}, fn {author_id, post_id}, acc ->
  Map.update(acc, author_id, [post_id], fn post_ids -> [post_id | post_ids] end)
end)
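As an aside, the same application-side grouping can be written more compactly with Enum.group_by/3. A pure-Elixir sketch on hypothetical sample rows (no DB needed); note that group_by preserves the row order within each list, whereas the reduce above builds each list in reverse:

```elixir
# Hypothetical rows as returned by Repo.all/1: {author_id, post_id} tuples.
rows = [{1, 10}, {1, 11}, {2, 20}]

# Group by the first tuple element, keeping only the second as the value.
grouped =
  Enum.group_by(
    rows,
    fn {author_id, _post_id} -> author_id end,
    fn {_author_id, post_id} -> post_id end
  )

# grouped == %{1 => [10, 11], 2 => [20]}
```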

or we could

from(p in Post, group_by: p.author_id, select: {p.author_id, fragment("array_agg(?)", p.id)})
|> Repo.all()
|> Enum.into(%{})
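For reference, that Ecto query compiles to roughly the following SQL (a sketch assuming the table is named posts; the exact aliases and parameter placeholders Ecto emits will differ, and you can check them with Ecto.Adapters.SQL.to_sql/3):

```sql
SELECT p.author_id, array_agg(p.id)
FROM posts AS p
GROUP BY p.author_id;
```

Each output row is then already an {author_id, [post_ids]} pair, so Enum.into(%{}) is all that is left to do on the application side.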

In this case? 100% on the DB side. It will probably be much faster that way, since you will not need to send a repeated author_id over the wire for every row. It should also utilise indexes better (if you have one in your DB, of course).
