Loading associated schema together with the count of items (in many_to_many relations)?

dalerka · August 2, 2020, 4:14pm

I’m trying to preload an associated schema and count of records (likes) in a many_to_many relation.

These are 3 relevant schemas:

schema "users" do
    has_many :created_posts, Post, foreign_key: :poster_id
    many_to_many(:liked_posts, Post, join_through: LikePost, on_replace: :delete)
...

schema "posts" do
    field :likes_count, :integer, default: 0, virtual: true
    belongs_to(:poster, User)
    many_to_many(:liked_by, User, join_through: LikePost, on_replace: :delete)
...

schema "like_posts" do
    belongs_to :user, User
    belongs_to :post, Post
...

When I load the User, I’d like to preload both created_posts and the count of likes for each post (and it works):

def get_user!(id) do
  Repo.get!(User, id)
  |> Repo.preload(
    created_posts: {posts_with_counts(), tags: []},   # this works!
    liked_posts: {posts_with_counts(), tags: []}      # this results in grouping error
  )
end

def posts_with_counts do
  from(p in post,
    left_join: u in assoc(p, :liked_by),
    group_by: p.id,
    select_merge: %{likes_count: count(u.id)}
  )
end

I also want to preload the liked_posts and the count of likes (by all users) for each post.

However the same query posts_with_counts for the liked_posts throws error:

(Postgrex.Error) ERROR 42803 (grouping_error) column “u2.id” must appear in the GROUP BY clause or be used in an aggregate function

So, there’s some subtle difference between querying associations in has_many and many_to_many relations which I struggle to understand.

What am I missing? Is there a better way to implement this?
Appreciate your advice!

LostKobrakai · August 3, 2020, 9:22am

Did you let ecto log the queries so you can compare them?

dalerka · August 3, 2020, 9:36am

How should I do that?
I only have the standard Phoenix logs, and I’m not very fluent in SQL.

Can you point anything wrong with the code above?

LostKobrakai · August 3, 2020, 9:50am

Set the log level for your ecto repo to one, where queries are logged. Then you can see how those queries differ.

You can post them here as well if you want, but I’d highly encourage anyone to invest in getting to know sql better. Ecto is not like ORMs, where you can basically ignore that there’s sql in the back.

dalerka · August 3, 2020, 9:58am

Can you point to docs/article about this?
Ecto docs mentions:

:log - the log level used when logging the query with Elixir’s Logger. If false, disables logging for that repository. Defaults to :debug

If this is what you mean, then by default it should be :debug.

Anyway, is there a better way to reach the goal of loading the records and the count of related records?

dalerka · August 3, 2020, 10:22am

Here’s the full error log:

[error] Task #PID<0.820.0> started from #PID<0.809.0> terminating
** (Postgrex.Error) ERROR 42803 (grouping_error) column "u2.id" must appear in the GROUP BY clause or be used in an aggregate function

    query: SELECT p0."id", p0."title", p0."views_count", p0."poster_id", 
           p0."inserted_at", p0."updated_at", count(u1."id"), u2."id" 
           FROM "posts" AS p0 
           LEFT OUTER JOIN "like_posts" AS f4 ON f4."post_id" = p0."id" 
           LEFT OUTER JOIN "users" AS u1 ON f4."user_id" = u1."id" 
           INNER JOIN "users" AS u2 ON u2."id" = ANY($1) 
           INNER JOIN "like_posts" AS f3 ON f3."user_id" = u2."id" 
           WHERE (f3."post_id" = p0."id") 
           GROUP BY p0."id" ORDER BY u2."id", p0."inserted_at" DESC
    
    (ecto_sql 3.4.5) lib/ecto/adapters/sql.ex:593: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.4.5) lib/ecto/adapters/sql.ex:526: Ecto.Adapters.SQL.execute/5
    (ecto 3.4.5) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
    (ecto 3.4.5) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (ecto 3.4.5) lib/ecto/repo/preloader.ex:226: Ecto.Repo.Preloader.fetch_query/8
    (elixir 1.10.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (elixir 1.10.4) lib/task/supervised.ex:35: Task.Supervised.reply/5
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

Function: &:erlang.apply/2
    Args: [#Function<8.45456768/1 in Ecto.Repo.Preloader.maybe_pmap/3>, [#Function<18.45456768/1 in Ecto.Repo.Preloader.prepare_queries/6>]]
[error] Ranch protocol #PID<0.809.0> of listener AppWeb.Endpoint.HTTP (connection #PID<0.769.0>, stream id 5) terminated

** (exit) %Postgrex.Error{connection_id: 17933, message: nil, postgres: 
    %{code: :grouping_error, file: "parse_agg.c", line: "1369", 
    message: "column \"u2.id\" must appear in the GROUP BY clause 
    or be used in an aggregate function",  
    pg_code: "42803", position: "303", routine: "check_ungrouped_columns_walker",   
    severity: "ERROR", unknown: "ERROR"}, 
    query: "SELECT p0.\"id\", p0.\"title\", p0.\"views_count\", p0.\"poster_id\", 
    p0.\"inserted_at\", p0.\"updated_at\", count(u1.\"id\"), u2.\"id\" 
    FROM \"posts\" AS p0   
    LEFT OUTER JOIN \"like_posts\" AS f4 ON f4.\"post_id\" = p0.\"id\"   
    LEFT OUTER JOIN \"users\" AS u1 ON f4.\"user_id\" = u1.\"id\" 
    INNER JOIN \"users\" AS u2 ON u2.\"id\" = ANY($1) 
    INNER JOIN \"like_posts\" AS f3 ON f3.\"user_id\" = u2.\"id\" 
    WHERE (f3.\"post_id\" = p0.\"id\") GROUP BY p0.\"id\" 
    ORDER BY u2.\"id\", p0.\"inserted_at\" DESC"}
    (ecto_sql 3.4.5) lib/ecto/adapters/sql.ex:593: Ecto.Adapters.SQL.raise_sql_call_error/1
    (ecto_sql 3.4.5) lib/ecto/adapters/sql.ex:526: Ecto.Adapters.SQL.execute/5
    (ecto 3.4.5) lib/ecto/repo/queryable.ex:192: Ecto.Repo.Queryable.execute/4
    (ecto 3.4.5) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (ecto 3.4.5) lib/ecto/repo/preloader.ex:226: Ecto.Repo.Preloader.fetch_query/8
    (elixir 1.10.4) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (elixir 1.10.4) lib/task/supervised.ex:35: Task.Supervised.reply/5
    (stdlib 3.13) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

LostKobrakai · August 3, 2020, 10:34am

The issue is that you can’t use GROUP BY and at the same time select columns not listed in the group by. I’d be interested in the working query though, because I’d expect the other query to not work as well.

You can work around this by calculating the likes in a subquery you join to your preload query.

dalerka · August 3, 2020, 11:26am

Okay, here’s the working query (from the first post)

def get_user!(id) do
  Repo.get!(User, id)
  |> Repo.preload(
    created_posts: {posts_with_counts(), tags: []},   # this works!
  )
end

def posts_with_counts do
  from(p in post,
    left_join: u in assoc(p, :liked_by),
    group_by: p.id,
    select_merge: %{likes_count: count(u.id)}
  )
end

and its log:

    [debug] QUERY OK source="posts" db=1.1ms idle=137.6ms
    SELECT p0."id", p0."title", p0."views_count", p0."poster_id", 
    p0."inserted_at", p0."updated_at", count(u1."id"), p0."poster_id" 
    FROM "posts" AS p0 
    LEFT OUTER JOIN "like_posts" AS f2 ON f2."post_id" = p0."id" 
    LEFT OUTER JOIN "users" AS u1 ON f2."user_id" = u1."id" 
    WHERE (p0."poster_id" = $1) 
    GROUP BY p0."id" ORDER BY p0."poster_id", p0."inserted_at" 
    DESC [<<187, 12, 179, 79, 103, 104, 72, 103, 182, 119, 137, 100, 144, 214, 126, 221>>]

So I see the difference, but don’t understand why it occurs:

# problem query error log (from above): 
       p0."inserted_at", p0."updated_at", count(u1."id"), u2."id"     # why is it "u2"? 
       FROM "posts" AS p0 
       LEFT OUTER JOIN "like_posts" AS f4 ON f4."post_id" = p0."id" 
       LEFT OUTER JOIN "users" AS u1 ON f4."user_id" = u1."id" 
       INNER JOIN "users" AS u2 ON u2."id" = ANY($1) 
       INNER JOIN "like_posts" AS f3 ON f3."user_id" = u2."id" 
       WHERE (f3."post_id" = p0."id")

dalerka · August 3, 2020, 11:55am

Couldn’t figure out how to form the query, from the docs:

query = from c in Comment, order_by: c.published_at
posts = Repo.preload posts, [comments: {query, [:replies, :likes]}]

If you mean something different, would you be so kind to show with code?

LostKobrakai · August 3, 2020, 12:27pm

The preload docs are not the ones you want to look at. They just tell you how to attach a query. They won’t tell you how do build a correct query – they maybe should tell you how the query is further processed for preloading various assocs though (adding the “SELECT … u.id” part) though.

You’re adding a GROUP BY to your preload query and this means additional selects can’t be arbitrarily added without them either being aggregations or them also being grouped by. This is how SQL works.

SELECT p0."id", p0."title", p0."views_count", p0."poster_id", 
    p0."inserted_at", p0."updated_at", count(u1."id"), p0."poster_id" 
    GROUP BY p0."id"

This only works because if you group by the primary key you can select all columns of that table. Otherwise the previous statements apply.

Ecto adding the additional SELECT u.id basically makes you query invalid.

One way to work around that is not doing a GROUP BY on your preload query, but rather joining a subquery, which does the aggregation of likes in a separate query, which makes the SELECT u.id of ecto not break anything, as the root query doesn’t do aggregation. For docs on how to add a subquery you can look here: Ecto.Query — Ecto v3.4.5

Most of the above is about SQL much more than about ecto, which hopefully shows what I initially meant by:

dalerka · August 4, 2020, 9:14am

Thanks, I’m still trying to wrap my head around using subqueries and maybe this guide will help others:
https://hexdocs.pm/ecto/aggregates-and-subqueries.html#content

Yet, if anyone wants to share an example of how to do this in a scaleable way, I think many Ecto noobs will highly appreciate.

Of course, there are different approaches. I’d prefer to find an idiomatic Ecto way, which should be easy to understand (without messing with SQL).

But maybe it’s more efficient to do separate queries and somehow match and merge the results in Elixir?

dalerka · August 4, 2020, 10:04am

Ok, not sure if this is the right/good/best way to implement, but seems to work as expected (for anyone interested):

def liked_posts_with_counts do
  likes_count =
    from lp in LikePost,
      group_by: lp.post_id,
      select: %{post_id: lp.post_id, count: count(lp.user_id)}

  from(j in Post,
    preload: [:tags],
    join: lp in subquery(likes_count),
    on: lp.post_id == p.id,
    select_merge: %{likes_count: lp.count}
  )
end

# Update: refactored this also:
def get_user!(id) do
  user = from u in User, preload: [
    created_posts: posts_with_counts(),  
    liked_posts: liked_posts_with_counts()
  ]
  Repo.get!(user, id)
end

Any improvement suggestions are welcome!