How to preload in a subquery?

I have the following schemata:

schema "posts" do
  field :body, :string
  has_many :comments, Comment
end

schema "comments" do
  field :message, :string
  belongs_to :post, Post
  has_many :tag, Tag
end

schema "tags" do
  field :tag, :string
  belongs_to :comment, Comment
end

I want to select all posts, an arbitrary comment for each post, and the tags for each comment. I initially tried using preload in a subquery.

Repo.all(
  from post in Posts,
  as: :post,
  left_lateral_join: arbitrary_comment in subquery(
    from comment in Comment,
    where: comment.post_id == parent_as(:post).id,
    left_join: tag in assoc(comment, :tags),
    preload: [tags: tag],
    limit: 1),
  on: true,
  select: %{post: post, arbitrary_comment: arbitrary_comment}
)

but this returned an error since preloading isn’t allowed in subqueries. I also tried

Repo.all(
  from post in Posts,
  as: :post,
  left_lateral_join: arbitrary_comment in subquery(
    from comment in Comment,
    where: comment.post_id == parent_as(:post).id,
    left_join: tag in assoc(comment, :tags),
    select: %Comment{comment | tags: tag}),
    limit: 1),
  on: true,
  select: %{post: post, arbitrary_comment: arbitrary_comment}
)

but this only returned one tag for each comment.

I don’t want to use the hack of putting arbitrary_comment in the comments association, because what if e.g. I want to return two comments, like the most recent comment and the most popular comment.

Might be because of a typo?
has_many :tag, Tag => has_many :tags, Tag

3 Likes

Thanks for spotting the typo; with the typo the code errors out, after fixing it it behaves incorrectly, yielding a comment struct for each tag.

My terribly hacky solution is to write a query using jsonb_agg that returns a list of json objects, and then to cast those objects to structs.

I have a feeling there’s space to improve the Ecto query preload api. Allowing preloads in subqueries would lead to ambiguous semantics (what if the same association is preloaded from two (sub)queries?) but maybe there’s another way for Ecto to be able to return nested structs.

Have you tried using Repo.preload? Something like:

posts = Repo.all(Post)
posts = Repo.preload(posts, [comments: {from(c in Comment, limit: 1), [:tags]}])

That’s the hack of using the ‘comments’ association I mentioned. It works, but only once — what if you want to return a ‘latest_comment’ and a ‘popular_comment’? There’s only one ‘comments’ assocation.

Just spitballing here, but perhaps you could have 2 has_one associations using the :where option. This might allow you to create a latest_comment and a popular_comment association, while keeping your main has_many :comments assocation.

Not sure if this helps your problem, but just putting it out there.

2 Likes

Adding another association to the schema seems to be the easiest way. You could even use a virtual field and a custom preload query for full flexibility.

It’s a bit sad to have to pollute the main schema with artifacts of context logic — ideally I guess there would be a way to define schema fields just in the scope of a query.

What stops you from using select_merge and just, ahem, merge these latest_* fields when calling just one particular context function?

I assumed you could only preload schema associations, but I’ll try that, thanks.

It’s pretty easy to use and people often use it for the so-called “calculated” or “virtual” fields as well: Ecto.Query — Ecto v3.12.5

1 Like

Is it possible to somehow merge a schema with a key that does not exist on it? Otherwise, the schema will still need to have the key, even when is it only used in one place. I have been dealing with this recently and have not been able to find a satisfying solution.

1 Like

I tried preloading a field that wasn’t in the schema and it failed with field `latest_comment` in preload is not an association in query, so it does seem like you can only preload associations defined in the schema.

Re-reading your comments though it seems like you weren’t suggesting anything about preload, but rather to include the left-joined columns in a select_merge? That has the same issue as the second example in the original post where it doesn’t preserve the nesting structure of the join tree.

Preloading non-schema fields would be kinda cool, like a limited version of preload that only handles assembling nested structs from the join tree, without creating additional queries. Probably a bunch of holes in the semantics but a tree_agg expression like

Repo.all(
  from post in Posts,
  as: :post,
  left_lateral_join: arbitrary_comment in subquery(
    from comment in Comment,
    where: comment.post_id == parent_as(:post).id,
    limit: 1),
  on: true,
  left_join: tag in assoc(arbitrary_comment, :tags),
  on: tag.comment_id == arbitrary_comment.id,
  tree_agg: [arbitrary_comment: {arbitrary_comment, tags: tag}]
)

would be very cool.

I have been experimenting with this a bit more and I think I have a solution that’s pretty good.

Let’s say we simply want to get all posts and for each of them, we also want the number of comments. It should be possible to generalize this for more complex situations.

from post in Post,
  left_join: comment in assoc(post, :comments),
  group_by: post.id,
  select_merge: map(post, ^Post.__schema__(:fields)),
  select_merge: %{comments_count: count(comment)}

We can make a helper macro for selecting all fields of the schema.

defmacro as_map(binding, schema) do
  quote do
    map(unquote(binding), ^unquote(schema).__schema__(:fields))
  end
end

It would be even better if the schema could be inferred from the binding, but I’m not sure how to do that.

Why not select(query, %{post: post, comments_count: count(comment)}. In my experience that stuff goes to some view layer where it’s then transformed again to html/json/whatever data format presented to the outside anyways.

1 Like

That does seem like the obvious solution, but I don’t like that it makes the rest of the code aware of the persistence implementation. Why is this value separate when everything else is in the post struct? Of course, because that’s how the data is laid out in the database. But why should the view layer know that?

It also creates a naming issue. What should this combined value be called when it’s used as a variable? It doesn’t really represent anything new.

I know these are fairly minor complaints. I guess my thinking here goes back to a previous thread, where I talked about wishing for more powerful mapping features in Ecto.

You can turn that argument around just as easily. Why is metadata about a post part of the post itself? You cannot write to comments_count. That discussion becomes clearer once you start using dedicated read and write “models”, where the comments_count could then become a field on the read side schema.

1 Like

I think we’re basically talking about the same thing. The data structure here would be the read model for let’s say a list of posts, where the number of comments is equally as important as the title of the post, for example. That is why I would like both to exist together in the same map, and also why I wouldn’t expect to be able to write to the comment count.

Although, at that point, just writing the entire select clause manually may make the most sense.

Not sure what you mean, why must “the schema will still need to have the key” be true? select_merge explicitly deals with just adding extra data on top of already loaded data. Seems to cover OP’s case and I am not sure I get what’s the reservation against it.

OK, but you can explicitly take care of it. Just select_merge in the right place – or move the field to someplace else in some final function, called before the data is returned to the caller.

Because merging a key that does not exist on the schema raises an error. This is intended behaviour, since the documentation mentions it as well:

If the map on the left side is a struct, Ecto will check all of the field on the right previously exist on the left before merging.