Select fields from nested preload query

seanyutah · April 29, 2022, 6:01am

I have a classic nested comment schema where only one level of children is needed.

  schema "comments" do
    field(:message, :string)
    field(:removed_at, :utc_datetime)

    belongs_to(:post, Post)
    belongs_to(:user, User)
    belongs_to(:parent, Comment)
    has_many(:children, Comment, foreign_key: :parent_id)

    timestamps()
  end

I’m working on the query to fetch the comments for a given post in a nested fashion. The query looks great, with the user data joined and only making two queries for the two levels of comments. (only preloading children at the top level)

    children_query =
      from c in Comment,
        join: u in assoc(c, :user),
        order_by: [desc: c.inserted_at],
        preload: [user: u]

    query =
      from c in Comment,
        join: u in assoc(c, :user),
        preload: [user: u, children: ^children_query],
        where: c.post_id == ^post_id,
        where: is_nil(c.parent_id),
        order_by: [desc: c.inserted_at]

    Repo.all(query)

The structure being returned looks great and all the data I need is there. But now I would like to pare down that data by selecting only what I need.

The only way I could figure out how to do that is by mapping over the lists of comments and picking out the attributes. I can’t leave the comment structs to the jason encoder because I want the children for the top level comments but not for the nested comments. (so if I include :children in the @derive {Jason.Encoder... for comments it breaks on the nested comments because children are not preloaded)

So is there a way to select specific fields when dealing with nested, preloaded data?

The mapping and picking doesn’t seem like a performance issue but nested map seems barbaric after such an elegant job with the preload. Here is what I am currently doing:

      Comments.get_by_post(post_id)
      |> Enum.map(fn c ->
        %{
          children:
            Enum.map(c.children, fn c ->
              %{
                id: c.id,
                message: c.message,
                inserted_at: c.inserted_at,
                username: c.user.username
              }
            end),
          id: c.id,
          message: c.message,
          inserted_at: c.inserted_at,
          username: c.user.username
        }
      end)

benwilson512 · April 29, 2022, 12:56pm

As a comment about your current queries, I don’t think preloading with joins is the right call here. If a user makes 3 comments, then that user’s information is sent from postgres 3 times, and Ecto has to deduplicate. Using a normal preload([:user]) without the join will make 2 SQL queries, but they will each be smaller and more efficient.

Yes, and this will be even easier if you do simple preloads:

user_query = from u in User, select: struct(u, [:id, :username])
comment_query = from c in Comment,
  select: struct(c, [:id, :parent_id, :inserted_at, :user_id]),
  order_by: [desc: c.inserted_at]

query = from c in comment_query,
        where: c.post_id == ^post_id,
        where: is_nil(c.parent_id),
        preload: [user: ^user_query, children: {^comment_query, [user: ^user_query}]

Repo.all(query)

You may need to still map over things in order to transform it into the shape you want, but either you’re doing that or the DB is doing that, and it’s generally best if your app does it since the database is the more limited resource.

seanyutah · April 29, 2022, 2:22pm

Thanks for the reply, Ben. That’s interesting that the separate user queries would be faster than the joins. If the users are different for every comment (which will be typical) and I can select for specific user columns in the two joined queries, I’d imagine the joins would be faster, and leave less logic for elixir. Either way…
Sorry, I should’ve been more clear. The focus of my question is really the :children. With the issue being that I can’t choose to encode or select the :children from the Comment struct at the top level but also omit it when nested. It’s frustrating that I’m logging the ecto output and it’s precisely what I need, but there’s no way to get what I want and omit what I don’t without looping over the lists.
I’m wanting something like “encode all values unless they are NotLoaded” … or “ability to select virtual fields” or ??

benwilson512 · April 29, 2022, 5:01pm

Oh, I think this is just a misunderstanding about best practices then for JSON encoding. It is NOT considered a best practice to use @derive for doing encodings for an API. Instead you should be using JSON views. This post all the way back from 2015 is still relevant: https://rokkincat.com/blog/2015-03-17-json-views-in-phoenix/

seanyutah · April 29, 2022, 9:37pm

Yea it makes sense that using @derive isn’t a best practice. I’m seeing the limitations first-hand. I looked into Phoenix views and didn’t see anything that would allow me to get what I want without nested loops. Do you know of anything? Can you agree that it would be nice to specify/select the children for the top level and omit it from the nested comments, therefore avoiding the need to loop?

benwilson512 · April 30, 2022, 12:06am

Isn’t that what my example did? I built a query that selected values on comments, and then reused that at both levels without having to specify it twice.

seanyutah · April 30, 2022, 2:39am

Your code outputs the same ecto structure as mine in that the top-level comments have children preloaded and the nested comments do not . Like I said, the ecto output isn’t the problem. The issue I’m wondering about is converting the ecto structs to maps (and then json) without having to map over the top level and nested lists.

benwilson512 · April 30, 2022, 2:56am

The normal thing is to map over them.

03juan · April 30, 2022, 6:57am

Have you considered creating a separate schema for child comments on the same table which doesn’t define the has_many :children? Then you can move the belongs_to :parent over to it and derive each one separately without mapping to ignore irrelevant keys for each context.

seanyutah · April 30, 2022, 1:55pm

Boom! That is a great idea. Thanks Juan!