How to preload a count of associations but not their contents

AlchemistCamp · January 3, 2020, 10:20am

I have several types of content schemas which follow a similar pattern. They all belong to an Entity schema.

Entity schemas have many Comments (as well as many-to-many relationships with Topics (via an EntityTopic) join table.

I can preload a list of content like so:

def list(schema, order \\ [desc: :inserted_at]) do
  from(s in schema,
    order_by: ^order,
    left_join: entity in assoc(s, :entity),
    left_join: topics in assoc(entity, :topics),
    left_join: comments in assoc(entity, :comments),
    preload: [entity: {entity, topics: topics, comments: comments}]
  )
  |> Repo.all()
end

# Then to use it, Content.list(Article) or Content.list(Podcast), etc...

Similarly, I can preload just comment counts into entities by doing the following:

with_comment_counts =
  from(e in Entity,
    join: c in assoc(e, :comments),
    group_by: e.id,
    select_merge: %{id: e.id, comment_count: count(c.id)}
  )

The question is, how can I change the first query so that it preloads entities and topics with the schema as before, but only loads the comment count and not the comments themselves?

I’ve tried several ways of arranging preload, select and select_merge statements but keep hitting errors revolving around the preload step or formatting the final select.

Ninigi · January 3, 2020, 10:44am

Hm, assoc(s, :entity) returns a queryable, so maybe the easiest would be to reuse your with_comment_counts query?

def with_comment_counts(queryable) do
  from(e in queryable,
    join: c in assoc(e, :comments),
    group_by: e.id,
    select_merge: %{id: e.id, comment_count: count(c.id)}
  )
end

def list(schema, order) do
  from(s in schema,
    order_by: ^order,
    left_join: entity in assoc(s, :entity),
    left_join: topics in assoc(entity, :topics),
    left_join: comments in assoc(entity, :comments),
    preload: [entity: {with_comments_count(entity), topics: topics, comments: comments}]
  )
end

EDIT: Sorry forget to mention, this is untested and might not work at all!

EDIT 2: And it might be ridiculously inefficient… It might even be better to create a view for comment_details and use it as an association in your schemas…

AlchemistCamp · January 3, 2020, 11:02am

This hits that preload error I mentioned:

** (Ecto.Query.CompileError) {with_comment_count(^entity), [topics: topics]} is not a valid preload expression. preload expects an atom, a list of atoms or a keyword list with more preloads as values.

I also tried joining the subquery and using it in the preload:

from(r in schema,
  order_by: ^order,
  left_join: entity in assoc(r, :entity),
  left_join: topics in assoc(entity, :topics),
  left_join: se in subquery(with_comment_counts),
  on: se.id == entity.id,
  preload: [[entity: {se, topics: topics}]]
)
|> Repo.all()

That results in:

** (Ecto.QueryError) can only preload sources with a schema (fragments, binary and subqueries are not supported) in query:

Ninigi · January 3, 2020, 11:04am

Have you tried passing the entity as the queryable?

Ninigi · January 3, 2020, 11:05am

I am really not the best person to take care of this issue… I hope someone will join soon x.x

AlchemistCamp · January 3, 2020, 11:15am

Yes, I can pass the entity as the queriable like this:

preload: [[entity: {entity, topics: topics}]]

That loads what I expect it to, but the whole reason of doing the subquery and trying to use it was to load comment_count into the entity.

My hope was that with the comment counts loaded in se,

preload: [[entity: {se, topics: topics}]]

would have taken the entities (with comment counts already merged in) and then preloaded their topics as before. Subqueries can’t be used this way in preloads though, apparently.

LostKobrakai · January 3, 2020, 11:16am

Can you try using {^query, … instead of calling a function within the query?

AlchemistCamp · January 3, 2020, 11:20am

How do you mean?

amnu3387 · January 3, 2020, 11:32am

You could use something like this, but because the association in Entity is has_many (I guess) comments the final result will be cast into a list and you would need to either map through the results either way or handle the case where it’s a list with a single integer inside, down the line.

def list(schema, order \\ [desc: :inserted_at]) do
  # you need to group by the parent primary_key so the preload works as expected
  comments_query = (
    CommentSchema
    |> Ecto.Query.select([c], count(1))
    |> Ecto.Query.group_by([c], c.entity_id)
  )

  from(s in schema,
    order_by: ^order,
    left_join: entity **in** assoc(s, :entity),
    left_join: topics **in** assoc(entity, :topics),
    preload: [entity: {entity, topics: topics, comments: comments_query}]
  )
  |> Repo.all()
end

Otherwise mapping the joins manually

schema
|> order_by([s], ^order)
|> join(:left, [s], entity in Entity, on: entity.schema_id == s.id)
|> join(:left, [s, entity], topic in Topic, on: topic.schema_id == s.id)
|> join(:left, [s, entity, topic], message in Message, on: message.schema_id == s.id)
|> group_by([s, entity, topic, messages], s.id)
|> select([s, entity, topic, message], %{s | entity: entity, topic: topic, messages: count(message.id)})
|> Db.Repo.all

Should work as well

Ninigi · January 3, 2020, 11:34am

He means bind the query to a variable, and use the pinned variable in your query. However, it sounds like your association might be a little complicated (polymorphic?)

Throwing out the idea of a view again…

AlchemistCamp · January 3, 2020, 11:40am

Yes. It has a few boolean fields and:

    field(:comment_count, :integer, default: 0, virtual: true)
    has_many(:comments, Comment, foreign_key: :parent_id)
    has_one(:article, Article)
    has_one(:episode, Episode)
    has_one(:resource, Resource)
    has_one(:podcast, Podcast)
    many_to_many(:topics, Topic, join_through: "entity_topics")
    many_to_many(:users, User, join_through: "entity_likes")

Article, Episode, Resource and Podcast each have an entity_id and use the query I’m trying to preload comments into.

hauleth · January 3, 2020, 11:41am

You can use:

with_comment_counts =
  from(e in Entity,
    join: c in subquery(from c in Comments,
      group_by: c.entity_id,
      select: %{entity_id: c.entity_id, count: count()}
    ),
    on: c.entity_id == e.id,
    select_merge: %{id: e.id, comment_count: c.count}
  )

And then it will naturally fit into second one:

def list(schema, order \\ [desc: :inserted_at]) do
  from(s in schema,
    order_by: ^order,
    left_join: entity in assoc(s, :entity),
    left_join: topics in assoc(entity, :topics),
    left_join: comments in subquery(from c in Comments,
      group_by: c.entity_id,
      select: %{entity_id: c.entity_id, count: count()}
    ),
    on: comments.entity_id == entity.id,
    preload: [entity: {entity, topics: topics}],
    select_merge: %{comments_count: comments.count}
  )
  |> Repo.all()
end

Alternatively create (materialised?) view in DB with such query or create counter cache that will store that information without doing join each time you need that information.

AlchemistCamp · January 3, 2020, 11:42am

I replaced CommentSchema with Comment (since that’s what mine’s called) and changed comments query to comments: ^comments_query in the preload and got a Postgrex error:

ERROR 42803 (grouping_error) column “c0.parent_id” must appear in the GROUP BY clause or be used in an aggregate function

Ninigi · January 3, 2020, 11:49am

Personally I would probably reach for a view as @hauleth also suggested

This tutorial holds up in most parts, the gist of it is: you create a query that returns the fields you need and pretend it is a regular old database table. You can set your view as an association in the schemas (with a where clause maybe) and your ecto queries will be a lot easier to handle (especially if you need that count a lot.)

AlchemistCamp · January 3, 2020, 11:54am

I think this one is really close. The whole problem has been preloads breaking anytime I have a subquery or prior merge involved, so doing the select_merge after the preload seems like the way.

This hits an error since the schema s doesn’t have a comment_count field on it or associated comments. It’s Entity that has comments.

AlchemistCamp · January 3, 2020, 11:58am

Thanks! I haven’t actually worked with views before, but it sounds like this is a good time to learn.

The counts are basically just so I can show a count of comments in the index page, without loading every single comment for every single article (which would be a bit excessive).

hauleth · January 3, 2020, 12:04pm

Only one comment about that tutorial - if you are using PostgreSQL earlier than 12 then try to avoid CTEs in favour of nested queries. It will impact the readability but CTEs have hidden performance penalty as these queries are always materialised and do not support predicate pushdown, aka:

WITH tab AS (
  SELECT * FROM foos
)
SELECT id FROM tab LIMIT 1

Will still fetch all entries from foos table.

amnu3387 · January 3, 2020, 12:04pm

Views do work fine for this as others pointed out, though that first query should be ok, with the caveat that the result is still a list, as long as you group the “subquery”. I tried it in a schema with a has_many and it results in this:

%Chat{
   __meta__: #Ecto.Schema.Metadata<:loaded, "chats">,
   inserted_at: ~U[2020-01-01 09:15:25Z],
   messages: [4],
   #....
}

Still, I think either a view or a hand writing the joins is the way to go, since preloads basically are intended to preload the actual records and not aggregates.

AlchemistCamp · January 3, 2020, 12:22pm

preload: [entity: {entity, topics: topics}],
select_merge: %{comments_count: comments.count}

Is there a way to use select_merge to put the comment counts in the entities instead of at the base level? If so, this is the whole solution!

AlchemistCamp · January 3, 2020, 2:30pm

A working but kludgey fix would be to just query twice and manually merge the results like this:

def list(schema, order \\ [desc: :inserted_at]) do
  eid_comments =
    from(s in schema,
      left_join: e in assoc(s, :entity),
      join: c in assoc(e, :comments),
      group_by: e.id,
      select: {e.id, count(c.id)}
    )
    |> Repo.all() # returns tuple pairs of ids and counts
    |> Enum.into(%{}) # transforms it into a map of entity IDs to counts

  from(s in schema,
    order_by: ^order,
    left_join: entity in assoc(s, :entity),
    left_join: topics in assoc(entity, :topics),
    left_join: comments in assoc(entity, :comments),
    preload: [entity: {entity, topics: topics, comments: comments}]
  )
  |> Repo.all()
  |> Enum.map(fn item -> # Merge by looking up comment count on each entity and then updating
    %{item |
      entity: %{item.entity | 
        comment_count: eid_comments[item.entity.id] || 0
    }}
  end)
end

On the plus side, it only gets comment counts, it only even does that on the appropriate schema type and it includes everything the controllers need to work with normally.

On the minus side, it’s bailing out of Ecto and doing things manually and it’s a bit messy.

I’m going to refactor this into a separate function (maybe not load comments at all in list() and then make a separate list_with_comment_counts that uses the main list function, gets comment counts and does the merging.

Definitely going to look into views as @Ninigi and @hauleth recommended and keep my eyes open for a better way to structure this.