Defining depth and sorting while preloading a (single-schema hierarchical) join

I have a couple of questions around preloading, please.

I’m using the ‘hierarchical relationships within a single schema’ example in the Schema.has_many/3 examples section for threaded discussions/nested comments, and preloading a few levels of nesting:

  schema "comments" do
    field :votes_aggregate, :integer, [default: 0]
    belongs_to :parent, Comment, [foreign_key: :parent_id, references: :id, define_field: false]
    has_many :children, Comment, [foreign_key: :parent_id, references: :id]
    # ...
  @doc """
  Gets children comments of one or multiple comments.
  def get_child_comments_by_comment_ids(ids) when is_list(ids) do
    |> select([c], map(c, [:id, :parent_id, :content, :votes_aggregate, :inserted_at]))
    |> where([c], c.parent_id in ^ids)
    |> order_by([c], [desc: c.votes_aggregate, asc: c.inserted_at])
    |> preload_comment_children()
    |> Repo.all()

  defp preload_comment_children(query) do
    from c0 in query,
      left_join: u0 in assoc(c0, :user),
      left_join: c1 in assoc(c0, :children),
      left_join: u1 in assoc(c1, :user),
      left_join: c2 in assoc(c1, :children),
      left_join: u2 in assoc(c2, :user),
      left_join: c3 in assoc(c2, :children),
      left_join: u3 in assoc(c3, :user),
      left_join: c4 in assoc(c3, :children),
      left_join: u4 in assoc(c4, :user),
      preload: [
        user: u0,
        children: {
          c1, user: u1, children: {
            c2, user: u2, children: {
              c3, user: u3, children: {
                c4, user: u4

This gives nested lists and maps (some recursive post-processing is done to simplify and sort the maps and to replace the value of not-yet-preloaded children keys with :has_children where there are more-deeply nested children to load via ‘show more’ buttons):

      children: [                                                                                                     
          children: [                                  
              children: [            
                  children: [],
                  content: "Sed quaerat architecto...",                                                                                           
                  id: "ce4a3544-3101-4d60-99eb-f4cb8ada847d",                                                                                                                                                                                
                  inserted_at: ~U[2024-07-04 21:24:50.628226Z],                                                                                                                                                                              
                  parent_id: "4d5bb762-cd8f-44c7-9d2a-3e06fb02a508",                    
                  user: %{name: "brock2048"},    
                  votes_aggregate: 10              
                  children: [
                      children: [
                          children: [
                              children: :has_children,
                              content: "Sunt eum accusamus...",
                              id: "f9b97ce6-a8b5-4876-8790-05f102c1bcdf",
                              inserted_at: ~U[2024-07-04 21:25:15.207419Z],
                              parent_id: "62cff42f-b449-435b-a11f-8e06a0ae7f87",
                              user: %{name: "lesley_murazik"},
                              votes_aggregate: 8

1. Preloading depth

How could the depth of the self-referencing preloading be set to a number instead of being dependent on how many joins and preloads are written out in code, or even to keep going until all nested comments have been loaded?

2. Ordering preloaded results

Less importantly…

I understand I can’t use the schema’s has_many preload_order option because I’m preloading a join in order to avoid multiple database queries. (Github discussion here.) At the moment, I’m re-ordering the preloaded comments in my recursive processing with a couple of calls of Enum.sort_by/3:

    |> Enum.sort_by(&(&1.inserted_at), :asc)
    |> Enum.sort_by(&(&1.votes_aggregate), :desc)

This gives the same ordering of preloaded comments as |> order_by([c], [desc: c.votes_aggregate, asc: c.inserted_at]) does for the main comments query.

I was wondering how complicated it would be to order preloaded results as part of the querying, but I don’t understand how this would be achieved while preloading a join?


Sorry for the non-answer but you could look at how Arbor does it. It uses recursive CTEs without explicit preloads.

1 Like

To expand on your (non-)answer:

If you want to preload to a particular depth, you could probably write some code to dynamically construct the query. I’m not sure exactly how this would look (read: I’m too lazy to go read the docs) but I’m pretty sure it would be possible to keep adding the joins dynamically up to a certain depth. Perhaps you would hit a limit somewhere, I’m not sure.

If you want to do a query to arbitrary depth, you’re going to have to write a recursive query “by hand” (with Ecto) instead of using preloads. These are tricky to write, but it will work (I have done it before). I recommend consulting the docs from both Postgres and Ecto and then reading some StackOverflow answers for examples.

Once you have the rows you are going to have to manually stitch them back into a tree. I am going from memory here but I think the simplest algorithm I came up with for this was to use Enum.group_by on the rows to group them by parent_id and then use a recursive function to place the children into a children field on each parent, recursively, until the tree is built.

That sounds hard but it wasn’t really so bad. Writing recursive queries, on the other hand, sucks.

Oh, and with respect to order: the “stitching” I mentioned, if done correctly, should preserve the original order of the input rows (a property of Enum.group_by). So if you order the rows in the recursive query, everything comes out right. You may need to reason through it to convince yourself of this (I did), but it does work.