Using streams with recursive and/or deeply nested schemas

Background: In response to an unrelated question about TreeView in LiveView, a lively yet off-topic debate about the pros and cons of using streams was in progress. This is my attempt to give the highly relevant streams discussion a more appropriate home and to share my thoughts on it as a point of departure.

Summary: Amidst general consensus about the virtues of LiveView, @garrison warned against nested Streams on the premise that Streams negate the declarative (later relabelled React-like) heart of LiveView, which forced him to turn to imperative (later relabelled jQuery-like) code to handle the intractable number of boundary conditions required to accommodate cascading changes.

Context: For the purpose of this discussion let’s abstract LiveView as a declarative mapping between structured server data and HTML, where event handling is (by default) rigged to relay to the server. When a change in the data is detected (in response to a relayed event or a change to an active subscription), the changed part of the data is sent to each affected client (session), which calculates the impact on the DOM (based on definitions extracted from the declarations); the resulting patches are then handed to pre-written JS code in each client to patch the DOM.

In that context, Streams in their native use-case are (often partial / paginated) lists of Ecto Schema structs for which LiveView (server-side) is able to determine how additions, deletions and changes to individual structs should impact the DOM.

Workload: For simple lists of structs sourced straight from an Ecto query with limits and offsets, it is straightforward to determine the changes and send only those to the client. The workload on the server and clients is of complexity O(n), where n is the query window size rather than the total number of records on file.

Enter the Dragon: When the underlying data changes from “a (section of a) long list of small, independent structs” to “a short list of very large (deeply nested and/or recursive) structs”, the change detection and handling algorithms are bound to see a change to any descendant structure as a change to the parent. As such, most of the root structures in the stream appear to have changed, requiring them all to be sent down for processing. That’s (clearly) not a desirable outcome.

The real issue: For complex structures (in the sense of having many associations preloaded for the presentation layer, generally referred to as nested structures) the current change detection rules are justified. The issue arises when dealing with recursive structures, i.e. where the same constellation of associated schemas re-occurs in the data an arbitrary number of levels deep. This way it can easily happen that the entire contents of a database rolls up into a single root structure. Put that root structure in a stream and every client gets sent the whole database every time someone sneezes.

A derivative problem: In the thread leading up to this, a somewhat heated debate arose from blaming Streams for forcing users to write an impossible amount of special-case code to counteract its intrinsic change detection and handling logic. It’s my personal impression that the member holding Streams responsible for that appears to have been attempting to write those interventions and special cases either on the client itself or in some other way at an inappropriate level of abstraction. I might be wrong about that, but even if I’m not, I confess to having great empathy with the struggles related to streaming recursive content. I just don’t want this discussion to get distracted by that particular (potentially misguided) set of challenges.

What to discuss: I believe there is a valid and relevant discussion to be had about different approaches for managing the LiveView presentation of indefinitely recursive data. The need to mitigate against runaway recursion is obvious, but once we’ve gained control over that, the objective is to enable LiveView and Streams to detect and address changes at the level of recursion where they happen, and nowhere else.

Why? My application’s data is modelled as indefinitely recursive data; in fact several aspects of it follow their own independent indefinitely recursive structure, so there is nothing hypothetical or theoretical about this for me. It’s a real and pressing issue.

My current approach: As such, I’ve had to look into ways to mitigate against false positives in LiveView change detection, with or without streams. I can summarise my current approach as streaming MapSets rather than native Ecto schema structs. It works well where I’ve implemented it for a subset of data, but it’s not yet suitable as a general pattern to apply to my primary data, where the consequences of getting it wrong are far more grim.

Some Ideas: As a general principle (MapSet does something similar, but it might require something custom) I see a useful but under-utilised correlation between a tree (or even a graph) of related records and a stream of records. An array of related nodes is an established way to represent a tree or graph on file or in memory. We already know that Elixir’s closest approximation of an array, List, is really a singly linked list in memory. By implication we could derive a robust mapping between a recursively associated schema and a linked list traversing its nodes in depth-first order.

The case where each node equates to a single struct with an identifiable parent_id pointing at the parent node is trivial to specify and implement, but that’s not my reality.
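For that trivial parent_id case, the tree-to-list mapping can be sketched in a few lines. This is only an illustration of the idea, not code from my application; the map shape (plain maps with :id and :parent_id) is an assumption:

```elixir
defmodule TreeList do
  # Depth-first flattening of a parent_id adjacency list into the
  # ordered, linear list a stream expects. Assumes each node is a
  # map with :id and :parent_id keys; roots have parent_id == nil.
  def flatten(nodes) do
    by_parent = Enum.group_by(nodes, & &1.parent_id)
    walk(Map.get(by_parent, nil, []), by_parent)
  end

  defp walk(siblings, by_parent) do
    Enum.flat_map(siblings, fn node ->
      [node | walk(Map.get(by_parent, node.id, []), by_parent)]
    end)
  end
end

# TreeList.flatten([
#   %{id: 1, parent_id: nil},
#   %{id: 2, parent_id: 1},
#   %{id: 3, parent_id: nil}
# ])
# yields the nodes in depth-first order: ids 1, 2, 3
```

The inverse mapping (re-nesting the list into a tree) follows the same grouping, which is why I consider the correspondence robust.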

The recursion in my data is technically indirect recursion, i.e. the relationship between two structs of the same schema runs through a struct of another schema. It might be slightly more challenging to specify to some implementation code what should constitute one level of recursion, but based on my own experience it’s fairly easily achieved using preload semantics. Basically you can express the definition of an arbitrarily complex recursive node structure as a preload specification which references the same schema at its base and at the deepest level of one of its preload chains.
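As a hedged sketch of what I mean (the schema names here are invented for illustration, not from my application): one level of indirect recursion, say Folder → Membership → Folder, can be written as a preload keyword list whose deepest leaf is the root schema again, and deeper levels are just repeated substitution:

```elixir
defmodule RecursivePreload do
  # Hypothetical: a Folder schema relates to itself indirectly
  # through a Membership join schema (Folder -> Membership -> Folder).
  # One "level" of recursion is the preload spec
  #   [memberships: [child: <next level>]]
  # so expanding to an arbitrary (but controlled) depth is trivial:
  def preload_spec(0), do: []

  def preload_spec(depth) when depth > 0 do
    [memberships: [child: preload_spec(depth - 1)]]
  end
end

# RecursivePreload.preload_spec(2)
# => [memberships: [child: [memberships: [child: []]]]]
```

The resulting keyword list can be handed straight to `Repo.preload/2` or a `preload:` query option, which is why preload semantics seem like the natural vocabulary for declaring a recursive node structure.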

With reference to Gall’s Law (which featured in the original debate), I have dabbled enough with procedural implementations along these lines to be confident about the feasibility of achieving a declarative implementation.

But then: Without a reliably paginatable data source behind it, the Stream value proposition is severely limited. While it is possible to preload the data to an arbitrary (yet controlled) depth first and then apply this tree-list mapping algorithm to extract the data for the stream, there is another option too. I personally had zero success trying to get recursive_ctes and with_cte working in Ecto. But the PostgreSQL query construct they are based on (similar constructs exist in other databases I’ve worked with, under slightly different terminology and semantics) returns a one-dimensional recordset which corresponds directly with the list representation of the recursive data we’re looking to not only preload but stream as a list of independent nodes. I see an opportunity for a possible declarative implementation of recursive streams to generate the requisite recursive CTE at (or closer to) the Ecto level, load the result into the list representation first, and then run the mapping algorithm to patch the associations in the tree/graph view to point to the same nodes.
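For reference, the pattern the Ecto documentation gives for recursive_ctes/with_cte looks roughly like this (I had no success getting it working myself; Category, category_tree and Repo are the documentation’s example names, not mine). Its significance here is that the database hands back exactly the flat recordset we want:

```elixir
import Ecto.Query

# Anchor member: the roots of the tree.
initial_query = Category |> where([c], is_nil(c.parent_id))

# Recursive member: children of rows already in the CTE.
recursion_query =
  Category
  |> join(:inner, [c], t in "category_tree", on: t.id == c.parent_id)

category_tree_query = initial_query |> union_all(^recursion_query)

# The result comes back as one flat, one-dimensional recordset --
# the list representation of the tree discussed above.
{"category_tree", Category}
|> recursive_ctes(true)
|> with_cte("category_tree", as: ^category_tree_query)
|> Repo.all()
```

Note the row order is whatever the database produces, so depth-first ordering would still have to be imposed (e.g. via a path or depth column) before the list is streamed.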

Objective: There’s still a lot to consider, but I’m excited by the prospect of doing this eloquently enough to make it useful without needing any changes to the Ecto schemas and context app code involved. All existing code should be able to function as it does right now, with a “new way” of doing things available in future. The only impact should be that it becomes possible to specify that a stream contains recursive data with a node structure given as a preload expression, and to be assured that change detection and propagation will be as efficient as they are for non-recursive data.

Before I compose a reply aimed at this discussion in particular, I need to clarify once more that the criticisms of streams I laid out in the previous thread are not restricted to the recursive use-case. I provided a non-recursive example (see my comments about the hypothetical chat app, a canonical streams use-case) to illustrate this.

The reason that these issues manifest particularly strongly with recursive data is that recursive UIs are particularly complex, and all complex UIs built with streams (recursive or not) will suffer because imperative APIs make it difficult-to-impossible to reason about how changes in your state will affect the resulting render.

The reason I brought this up in the context of recursive data is that your previous thread is about recursive data, and so that was the context in which I was writing. Since it seems you have also made this thread about a similar topic, I’m sure it’s going to come up again, but just keep in mind that these issues are not limited to a recursive context.

For a general argument (which is not aimed at LiveView in particular), I will again cite this article:

This, right here, is where the streams approach will go off the rails. The problem is that streams are only capable of performing this mapping one-to-one, but for any nontrivial use case (like a recursive tree), the mapping from state to rendered UI is arbitrarily many-to-many. It is not possible to know in advance, for a nontrivial UI, which parts of the rendered UI need to change when a single struct or field is updated. You have to render the whole thing from scratch to find out.

A simple example is when a node is moved from one position in the tree to another. Hierarchical trees are stored in a relational DB using a parent_id field on each child node, which points back to another node in the same nodes table.

When a node is moved, a single parent_id field is updated. However, in the rendered UI, the situation is very different! We have to remove the node from its old parent and add the node to its new parent. Already we have broken the one-to-one mapping which, as you said, streams provide.

But the pain won’t stop there. First of all, you may end up having to handle removing and re-inserting all of the child nodes of the moved node. I think the streams APIs may happen to save you from some of this work (only because DOM nodes themselves happen to form trees).

Once you start adding more features (your tree probably actually does something, right?), the pain will increase exponentially. A real example from my work: I wanted the nodes to be collapsible, but I didn’t want to render the expand/collapse icon if the node was an empty folder.

Think about how trivial this feature is, and then think about how annoying it would be to implement with streams. You now have to write special-case code that updates the parent of a child when the child is the last child and it is removed from the parent (plus when a first child is added). This is just one tiny feature!

You are very wrong about this. It seems to me that you misunderstood my previous advice and have essentially constructed some sort of strawman to argue against instead (you did this on your last thread too). I can’t allow you to misrepresent my posts like this: what you’ve written here is entirely your own invention.

If you have any questions about what I’ve written you are welcome to ask and I am happy to answer. I am probably one of relatively few people in the world who has actually done the exact thing you are posting about (complex recursive UI in pure LiveView, no client rendering) and I’d be glad to discuss it further.

You should not expect more from it, since that is exactly what it is for: just like for normal assigns (including record sets in assigns), you provide a mapping between the data and its HTML, and that mapping is in that sense one-to-one.

For normal assigns and streams alike, the code performs the magic it’s been taught to perform. In this case that means reflecting on the assigns, using the metadata to do the diff, and looking for opportunities to minimise the updates resulting from changes by re-rendering only the changed parts and sending those DOM fragments to the client. But when it cannot isolate the changes, it has to re-render the whole assign, list member or stream element. That’s usually not too heavy a price to pay, but with really large assigns, list elements within assigns, or stream elements (such as what happens with recursive structures), reverting to re-rendering the outer structure has an explosive impact. So the impact is big, and the chance of it happening is increased by the complexity of the data. Together that means complex data has a high risk of causing unwieldy updates. I seem to recall that both the documentation and the code (when it can detect issues at compile time) warn about several things in this regard.

It can for example only pick up on the structure of the streamed content if you reference the stream in the prescribed way inside your components.

As another example, it warns (with explanations) not to manipulate assigns inside the HEEx template (do it beforehand if you need to).

So yes, absolutely, true and justified: without additional metadata neither LiveView nor Streams can do a whole lot more than they are already doing. They simply haven’t been taught the additional magic required. When we come with our recursive data, it is up to us to find a way to reduce what we give the tools to what they can handle. That means it’s up to us to limit what we put in a stream strictly to things for which a suitable one-to-one mapping to HTML has been declared, and to stick to the rules that allow LiveView and Streams to isolate independent changes which they know how to propagate. I’ve done some of that in my app, and you’ve chosen to go about it a different way, the actual details of which still escape me.

Now my quest is to see if maybe Streams can be taught some of the additional magic I’ve been using so that it can handle recursion in stream data more efficiently.

Though it doesn’t qualify as a feature implementation (it’s simply something that has to be there), I happen to have implemented that behaviour recently and it was a complete non-event. In the HEEx template I render the node either as an (<a>-based) leaf item or as a (<details><summary><a/></summary><ul/></details>-based) branch node.

Of course that by itself does nothing to address the issue of recursion, but I genuinely do not see where this complex body of special case code you keep referring to is meant to be inserted or what it looks like.

If I’m misunderstanding and/or misrepresenting you, it’s without malice and purely because I cannot picture where, and in what context or even language, these special cases you’re talking about live. I keep hearing you say that they live in the frontend, and you make frequent references to React, DOM manipulation and even jQuery as the old way to do it. To me that implies the code you’re talking about is hand-written JavaScript (or something like React), but either way running on the client and directly manipulating the DOM. The way I use LiveView and Streams, I don’t have the foggiest idea how to manipulate the DOM, and neither would I want to have one.

The only DOM manipulation in my world is done through what I can let Phoenix and LiveView do by providing them with the appropriate mappings. When I cannot provide such mappings, the onus is on me to rearrange the server data so that I have data for which such a one-to-one mapping can be defined. Based on what I read in the documentation, articles and this forum, that’s more or less what everyone does, and keeps doing until the problem has been broken down into small enough sub-problems to pre-calculate the data in a form for which the mappings to HTML are simple enough for LiveView and Streams to effectively keep track of. It sounds like you’re doing something else, or while you were using Streams you ended up doing something different which broke the assumptions of Streams and forced them to re-render massive chunks every time a coconut drops.

Between your words and my frame of reference, I understood that the way you responded (or at least tried to for a while) to that problem (Streams resulting in runaway update sizes) was to do (a whole lot of) additional DOM manipulation somehow related to using the Stream API. If that’s not the case, by all means correct me with what you actually did.

OK, you keep quoting this. Apart from having an issue with throwing articles at people to make your point for you, this article is particularly far outside my range. Just from the title I know that I am in absolutely no position to even try reading the article, it’s just too far outside my frame of reference.

Remember, I’m not an experienced frontend programmer trying my hand at designing and building systems, but an experienced Systems Architect (and visionary) forced by circumstance (of my own doing) to produce the frontend my system needs by whatever means are most appropriate. I’ve never used React, never installed or written a line of jQuery code, and try my best to avoid bringing additional frameworks like React into the Phoenix LiveView equation. I dabbled a bit with the Vue concepts back in the day, when I thought I could postpone writing a proper server and build some Vue-based SPA with a backend in Laravel, but then I migrated my efforts to Phoenix and replicated a year’s worth of progress within a week, so I pivoted to write a proper backend and produce a frontend coded in the same language.

Which is a long way to say I haven’t read the article and won’t be reading it, because I can’t. You’re the one who understands the article, so just tell me what you wanted the article to tell me. Who are “we”, why would React need saving, what makes it worth saving, and is it not saved enough by others like LiveView (in your words) having adopted its most useful parts as their own?

Again we’re at risk of wandering off topic. I really don’t want to get caught (again) in a discussion about how bad Streams are, and/or DOM manipulation code. I want to make my data better aligned with Streams so I can utilise its (paging) benefits for the intrinsically recursive data my app needs to present to users in HTML.

Perhaps, given your issues with Streams, and since I apparently misrepresent you, you’d be better off starting your own discussion focussed on that. I don’t want this war of words to distract from the constructive discussion I’m trying to initiate.

The mapping between the state and the rendered template is not one-to-one in any nontrivial case. I think it’s best if I illustrate this in code.

This mapping is one-to-one: one message, one div. If you use streams for this, you can replace the messages with a stream and do a stream_insert/4 whenever a message is updated. This is the canonical use-case for streams, and it will work fine.

# With plain assigns:
assign(socket, :messages, [%Message{...}, %Message{...}, ...])

defp add(socket, message) do
  assign(socket, :messages, socket.assigns.messages ++ [message])
end

# OR, with streams:
stream(socket, :messages, [%Message{...}, ...])

defp add(socket, message) do
  stream_insert(socket, :messages, message)
end

~H"""
<div :for={msg <- @messages}>{msg.text}</div>
"""

But imagine we change the template to include a “message count” somewhere in the UI.
The mapping between %Message{}s and the template is no longer one-to-one. There are parts of the state (the number of messages) which are an arbitrary function (in this case a count) of the messages.

~H"""
<div>You have {length(@messages)} messages!</div>
<div :for={msg <- @messages}>{msg.text}</div>
"""

The issue with using streams for this is that when you try to use stream_insert, it only knows how to update a single message, but it doesn’t understand that it also has to update the count. So you would have to write what I’ve been calling “special-case code” to handle that specific case. Something like:

defp add(socket, message) do
  if message_is_new?(socket, message) do
    socket
    |> stream_insert(:messages, message)
    |> assign(:message_count, socket.assigns.message_count + 1)
  else
    stream_insert(socket, :messages, message)
  end
end

In the case of our little example the special-case code is still rather simple, but as we add more features the special cases will start to increase exponentially as they will all have weird dependencies on each other. That’s what the article I linked you was about.

So what does this have to do with recursive data?

Well, in the example I gave (the expand/collapse node), we want to render the button conditionally depending on whether a folder has children or not. The problem is that, when we update a child, the stream will not, on its own, understand that it has to update the parent. So we would have to write some sort of special case.

Note that this is pseudo-code, and the real code would be more complicated.

defp update_node(socket, node) do
  old_parent = get_old_parent_node(node)
  new_parent = get_new_parent_node(node)

  socket =
    if length(old_parent.children) == 1 do
      # old parent will now be empty, so it must be re-rendered
      stream_insert(socket, :nodes, %Node{old_parent | ...})
    else
      socket
    end

  # same for new parent, and so on...
  socket
end

And then you have to write more code like this for many other features.

However, if you avoid streams and just write this as a normal LiveView, you will not have this problem. This is why I advised you to be wary of streams for this use-case.

I would like to hear more about this. In particular, how are you handling updates when the state (the underlying tree structure) changes?

This is dangerously close to being too far off-topic but let’s see if we can put this to bed quickly.

I don’t know if you just chose a bad example, but from the one you gave at least one problem seems obvious. The count of messages should not be calculated based on stream operations. It’s normal to calculate and maintain the count closer to the context app which set up the stream and the other assigns the frontend should render from. “Unread” counts can be expensive to keep extracting from the database, but there are other techniques to cache them per user and simply invalidate each affected cache when a message is sent or read. I’m not saying that’s what you should do, only that you need to consider the data required to implement these “features” at a higher level, with the right tools. Doing it within the stream event callbacks will drive you bonkers, and probably did.
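A minimal sketch of what I mean by maintaining the count at a higher level (the `{:message_sent, ...}` broadcast shape and the context doing the counting are assumptions for illustration, not anyone’s actual code): the context publishes the new message together with the already-recalculated count, and the LiveView assigns both independently, so no stream callback ever has to reason about counting.

```elixir
# In the LiveView: the context broadcasts the message along with the
# count it maintains (cached, or recalculated on write). The stream
# handles the row; a plain assign handles the count.
def handle_info({:message_sent, message, count}, socket) do
  {:noreply,
   socket
   |> stream_insert(:messages, message)
   |> assign(:message_count, count)}
end
```

The point of the design is that the “is it new or an update?” question is answered once, where the write happens, instead of being re-derived in every event handler.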

You’re right, the stream by itself cannot figure that out, but it will have figured out that your HEEx code referenced parent.children and marked it as requiring an update. It errs on the side of caution and usually (as you’ve described experiencing) regenerates far too much of the HTML just to be safe. The objective with better support for recursion in streams is to enable the stream version of the diffing LiveView already does for assigns to detect changes only where they actually occur, and to deal with them in isolation. In your example that would mean that even in the complete implementation of a display that depends on the presence or absence of children, at most two records would change and result in new HTML being generated and sent: the newly inserted node, and the parent node which now happens to contain that one new node. Whatever else the parent contains doesn’t matter, because those additional children are not on the client yet (or else the children count would not have been 0) and have to be sent there anyway. It would be able to detect that the rest of the tree did not change.

I now have a clearer understanding of what you’ve tried to warn me about, and I thank you once again for your kindness. I think I’ll be OK. For the places where I’ve used streams so far I’ve done the right thing and for the bigger problem involving large recursive datasets I soon will be doing the right thing in terms of being stream-friendly.

Unfortunately, avoiding streams and using regular assigns is far more likely to expose the user to unacceptable and/or unmanaged performance when the data cascades. Regular assigns make even harsher assumptions about the programmer ensuring that the assigns don’t get too big, and if you stick a variable-sized assign element in there, which preloaded recursive associations would be, your poor users will face a variable if not unpredictable experience. Regular assigns were designed for small variables, simple lists of small variables, or large variables broken down into lists of smaller ones, but the whole variable is processed. But don’t worry, I’ll figure out how to trick streams into handling indefinitely recursive structures effectively.

That example is stripped down as much as possible because its purpose is to build intuition (for you). There are a great many ways one could implement this feature but what remains true is that there is a non-linear relationship between the state and the rendered template. If you want to learn more about this you can read the article I linked - I think a deeper explanation would veer, as you said, off-topic.

I’m a bit concerned by this reply. Streams can’t do any of this! The fact that they don’t do this is, in fact, the entire reason I’m warning you against using them.

It is not possible for streams to diff the parent because they do not even hold the parent in memory! It’s deleted after being used to render the template!

It sounds to me almost like you believe streams work the same way as normal renders - if this is the case then it is no wonder you have met my criticism with disbelief.

Funny, it sounds like your idea is converging on what I (briefly) proposed in a past comment as a possible solution. But I think it’s important to highlight two things: first, this has nothing to do with recursion (it’s a general problem with streams). And second, the changes required to fix this would make streams unrecognizable (and surely break backwards compatibility). In other words, it would be necessary to call that new API something else.

You are quite right, and this is the problem I had to solve. In my very first reply to you I described how I did it.

To elaborate, there are two performance problems:

The first is that, because LiveView does not properly diff containers (or nested containers), the diffs sent down the socket are excessively large. The correct solution is for LV to diff containers (I think they’re working on it), but as a temporary fix you can wrap your nodes in LiveComponents to shrink the diffs down to constant. This is the performance advice I gave you.

The second is that you have to hold the entire tree in-memory on the server. This is the price of using LiveView and I accept it, so I made no attempt to fix this. If this is a real problem (you have a truly large tree, hundreds of thousands of nodes), then you will have to resort to some very advanced tactics. You could probably virtualize (i.e. paginate, in a nested fashion) the tree on the server. Note that you still don’t need streams to do that.

A positive step forward, thank you.

For context, I’m not looking at hundreds of thousands of items but hundreds of millions per regional federation server cluster. Every user has theoretical access to a substantial portion of it, but not all at once.

It’s largely small records, but they add up quickly: about a quarter of the data tracks each user’s effective access rights, while the remainder is split down the middle between structural elements (theoretically graphs, but in practice usually trees with the odd loop turning them into a sort of directed graph) and the displayed informational content (not implying the rest doesn’t carry information or doesn’t impact what users see).

I don’t merely wish to avoid keeping all that data in server memory; it would be physically impossible. I have no alternative but to load the data for each user in segments controlled to a reasonable size, still using a starting point but limiting levels rather than rows.
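Level-limited loading of that kind can be sketched as a recursive CTE that tracks depth. Raw SQL here because depth tracking is clumsy in pure Ecto; the "nodes" table, its columns, and MyApp.Repo are all invented for illustration:

```elixir
# Hypothetical: load a window of the tree starting at start_id,
# limited by depth (levels) rather than by row count.
sql = """
WITH RECURSIVE window AS (
  SELECT n.*, 0 AS depth
  FROM nodes n
  WHERE n.id = $1
  UNION ALL
  SELECT c.*, w.depth + 1
  FROM nodes c
  JOIN window w ON c.parent_id = w.id
  WHERE w.depth < $2
)
SELECT * FROM window
"""

# Returns the flat recordset for the window, ready for the
# tree-list mapping and for streaming.
Ecto.Adapters.SQL.query!(MyApp.Repo, sql, [start_id, max_depth])
```

The `$2` depth guard is the “limiting levels rather than rows” knob: moving the starting point and depth limit is what turns this into a cursor over the larger structure.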

But reality is slightly less dramatic than the picture I paint. While it’s all accurate, there are natural and practical limitations in play on several levels which drastically reduce the portion of data loaded from the database and presented to each user. That’s all taken care of by aligning how and where I store the data with reality, being aware of and able to accommodate exceptions (intra-regional references) efficiently, and presenting a user interface which essentially navigates each user through the vast expanses of data using a (comparatively) tiny window or cursor containing only enough pre-loaded data to avoid excessive delays when drilling deeper or backing out along the path. It’s “classic” infinite scrolling, but for recursive/hierarchical content.

Let’s agree for the moment that to stream or not to stream is not the question. The issue at hand is this: although I’ve reduced the amount of data per user session the server and/or client handles at any time, the size of the DOM updates being sent to the client still makes a material difference. I could argue that the worst-case scenario, i.e. the entire active dataset being re-rendered and sent to the client for every tiny change, is acceptably small and “shouldn’t be a problem”. That seems a fair assumption until you a) consider the impact of undetected runaway data caused by malicious users or by accident, and b) consider the combined impact on the servers from a load and traffic perspective. Then it becomes imperative (as in essential and urgent) to implement the rendition of the data and its changes in whatever manner will ensure that, as far as can be arranged, only the parts that actually changed ever get sent to the client. LiveView already bears an impressive portion of this responsibility with its diffing and its tracking of which parts of which assigns determined which portion of the HTML, so that, using the assigned DOM ids, the affected parts of the DOM may be patched.

Now we know that the (as you put it) canonical use-case of LiveView involves linear sets of independent records, obtained from (either plain or streamed) lists provided to the component/rendering function as assigns. To flatten a recursively preloaded Ecto schema into a linear set of records, and back again into the nested structures upon use, is rapidly turning from a solvable into a solved problem for me. That takes care of the linear list part. At first glance it seems that, by happenstance or foresight, the way LiveView identifies the parts of the DOM requiring updates creates the opportunity to succinctly identify and propagate the patches to each node in the tree (given to LiveView as a list). Ostensibly (and this could be the crux of the matter) it matters not where in the DOM a node’s HTML is placed; it will patch that node’s HTML, based on the ID, with the re-generated HTML.

If the above were true in every sense it would have simplified matters a great deal. Still might, but it all rests on how LiveView updates the DOM.

Any assumption of independence between elements being given IDs as they are rendered in preparation for DOM updates could mean that when that portion of the DOM is updated, it is replaced wholesale with the new HTML that just arrived. If that element/node included other content for which there is no newly generated HTML, will the new DOM tree be patched with the other content from the original DOM, or will it be discarded? Between what I’ve seen happening myself, what is described in the documentation and presentations, and your feedback about streams and their issues, I’m getting lots of mixed signals. I’ve been warned of, and have witnessed, a container’s existing content hanging about indefinitely after an update. One particular example I remember involved table rows I incorrectly added as headings without the appropriate identifier (data-phx-id), which were impossible to get rid of. On the other hand, you’re mentioning that LiveView (with standard assigns and stream assigns alike) does not handle container updates correctly. That’s more than a little confusing. Perhaps (though unlikely) it’s all true and the diffing and DOM update code in Phoenix works or doesn’t work at random, but more likely it is merely opinionated about when it can and should do what. Whether it pertains to bugs, missing functionality or documented caveats, it should be possible to grasp what LiveView would and would not be able to handle under what conditions, and then see to it that those conditions are met when writing the code.

The best course of action I can think of right now is to consult the expert, which I’ll do in its own reply so he doesn’t have to suffer through all of the above.

@chrismccord, may we please have your expert (as author) opinion on the following:

When LiveView renders items from a list, it may fairly assume that the DOM tree for each item is independent (separate and non-overlapping). So when a change to such an item is picked up, it can replace the old DOM node (with that data-phx-id) with the newly generated HTML for it (carrying the same data-phx-id). That’s all fair, and what we experience all day.

If that component’s HTML, and the resulting DOM tree, in fact acts as a container (probably indirectly, i.e. some <ul>, <li>, <details>, <table>, or plain <div>s down) for what had been rendered from other list elements,

  • would that content be discarded (unless it is once again present in the replacement HTML), or
  • could we arrange for such content to be transferred to the replacement DOM tree?

If the above is meant to be happening by default, what conditions would break it and/or determine what gets replaced, discarded or transferred?

If it’s not default behaviour, is it supported through some options?

If it’s not supported at all, would it likely be bugs causing the instances I’ve witnessed where a container ends up with “dangling” content from previous updates?

Also, if it’s evident that I’m approaching this from an angle that’s orthogonal to your intentions with LiveView, you’re more than welcome to point that out. I (for one, and at least one other) am building something with LiveView that may well be outside its design parameters.

The objective I’m chasing, so you can TL;DR the rest of the conversation, is to implement infinite-scrolling-style navigation through a vast recursive dataset, and to minimise the size of updates sent to clients as much as possible by excluding bogus changes.

I don’t know if Chris will respond to this as well, but let me try to clarify some things.

First of all, LiveView provides a way to render HTML templates on the server and keep a client in sync with those. It does this by splitting the template into static and dynamic parts, so that subsequent updates only send the dynamic parts affected by a change. LiveView’s change tracking works best for small assigns, where you can easily say that they either changed or not. As soon as lists or nested data structures are involved, things get tricky. LiveView has some optimizations for change-tracking maps, but it does not change-track lists. This means that if you do:

<ul>
  <li :for={item <- @items}>
    {item.name}
  </li>
</ul>

using regular assigns, LiveView will re-send all the list items when the list is updated. Even if you actually only append or prepend a single item to the list. There are two ways to address this:

  1. use streams
  2. use a LiveComponent for each item

1. Streams

Streams are meant mainly as an optimization for collections of items - say your typical table that contains many rows. They provide a way to keep those items rendered on the client without the server needing to keep all of them in memory. Because the server does not know which items are actually rendered on the client, changes sometimes do require more coordination, for example to specify at which position an item should be inserted or updated.

Streams work by designating a DOM node as a stream container (marked with phx-update="stream"). On the server, the only special handling that streams receive is that their content is pruned (-> changed back into an empty list) after each render. Let’s look at an example:

<ul id="my-stream" phx-update="stream">
  <li :for={{id, item} <- @streams.items}>
    {item.name}
  </li>
</ul>

When you initially do

stream(socket, :items, [%{id: 1, name: "First"}, %{id: 2, name: "Second"}, %{id: 3, name: "Third"}])

the client will receive the following HTML (we won’t look at the diff structure in detail, but on the client LiveView always reconstructs the full HTML after applying a diff and feeds that HTML to morphdom):

<ul id="my-stream" phx-update="stream">
  <li id="items-1">First</li>
  <li id="items-2">Second</li>
  <li id="items-3">Third</li>
</ul>

The diff also contains metadata for each streamed item (which stream it belongs to, its id, its position in the stream and the stream limit). When you now do something on the server that does not touch the stream, subsequent patches to the DOM basically render an empty container:

<ul id="my-stream" phx-update="stream"></ul>

and the client handles those containers in a special way to preserve all stream items until they are deleted. When you now stream_insert something, the client will receive new HTML including the new item, and together with the metadata, it will know how to properly insert this item as a child in the stream container.

The important part is that stream children are always rendered as direct children of a stream container. So trying to transform a nested structure into a single stream will not work unless you can also render your structure as a flat list of DOM nodes.

2. LiveComponents

LiveComponents are meant to provide a way to encapsulate components that handle their own events. They are not meant to optimize server memory usage: the server will always hold all LiveComponent assigns in memory (ignoring temporary_assigns and streams inside LiveComponents, which can be used for optimizations). They are special in how they affect DOM patching, which is why they also play an important role when talking about rendering a list of items. The important part is that each LiveComponent performs its own change tracking. So when you have something like this:

<ul>
  <.live_component :for={item <- @items} id={item.id} module={...} item={item} />
</ul>

and you change the @items assign, each LiveComponent is updated: it calculates its changes and sends the necessary diffs to the client. If nothing changed, only the updated order of LiveComponents is sent to the client, as a list of integer IDs. Importantly, LiveComponents in diffs are basically treated as a flat map of %{id => diff}.

LiveComponents are updated in the following cases:

  1. a render is triggered that renders its <.live_component /> tag
  2. send_update is used to directly update a component with new assigns

Questions

Now, before looking at a possible solution for arbitrarily nested data structures, I want to try to directly answer your questions:

LiveView uses morphdom to make sure that the rendered DOM looks like the HTML it generates from the diffs it receives from the server. So LiveView does not “replace” the old DOM node with the new one, but (if possible) updates it in place. This patching applies to all children of a patched element as well. Special handling is in place for nodes with phx-update (streams, ignore, append/prepend).

If the reconstructed HTML from the diff does not contain the nested old content, it is discarded, unless it is rendered inside an element with phx-update="ignore" or inside a nested phx-update="stream".

I will go into this in detail in the next section.

You’d need to provide more details on when you’ve seen “dangling” content. I can think of one instance where this is expected: when you render a non-stream item in a phx-update="stream" container. In this case, such items are rendered and updated in the DOM, but they are never removed unless the whole container is removed. (This behavior is documented.)

A way to provide surgical updates for nested data structures

By leveraging both Streams and LiveComponents, you should be able to achieve what you want: patching only parts of the DOM that are affected by a change. The idea is that you transform your data structure in such a way that lists are rendered as streams of live components, that can themselves contain more streams of live components. If you provide those live components a deterministic ID, that you can also reference when you need to perform a nested update, you can surgically patch the DOM with minimal diff over the wire by using send_update, targeting the individual nested component. Let’s look at an example:

We’re trying to render a treeview representing files and folders. The entries are stored in a SQL database where each entry has a type (file or folder) and a parent_id that points to the parent folder. When we get this data from the database, we get a nested structure:

%FileEntry{
  id: 1,
  type: :folder,
  name: "root",
  # preloaded with the parent_id
  children: [
    %FileEntry{
      id: 2,
      type: :file,
      name: "foo.txt",
      children: nil
    },
    %FileEntry{
      id: 3,
      type: :folder,
      name: "subfolder",
      children: [
        %FileEntry{
          id: 4,
          type: :file,
          name: "bar.txt",
          children: nil
        }
      ]
    }
  ]
}

This could be nested arbitrarily deep. We’re not required to fully render the tree: our list_files function could return only a few levels, with further children not being preloaded. Now, let’s look at the code to render this:

defmodule MyAppWeb.TreeView do
  def mount(_params, _session, socket) do
    # this would be the result of a query to the database, so probably more like
    # tree = Files.list_files(parent: nil, depth: 2)
    tree = [
      %FileEntry{
        id: 1,
        type: :folder,
        name: "root",
        # preloaded with the parent_id
        children: [
          %FileEntry{
            id: 2,
            type: :file,
            name: "foo.txt",
            children: nil
          },
          %FileEntry{
            id: 3,
            type: :folder,
            name: "subfolder",
            children: [
              %FileEntry{
                id: 4,
                type: :file,
                name: "bar.txt",
                children: nil
              },
              %FileEntry{
                id: 5,
                type: :folder,
                name: "subfolder 2",
                children: :not_loaded
              }
            ]
          }
        ]
      }
    ]

    {:ok, stream(socket, :tree, tree)}
  end

  def render(assigns) do
    ~H"""
    <ul id="tree" phx-update="stream">
      <.live_component
        :for={{dom_id, entry} <- @streams.tree}
        module={Example.TreeComponent}
        id={entry.id}
        dom_id={dom_id}
        entry={entry}
      />
    </ul>

    <button phx-click="rename_bar">Rename bar.txt</button>
    """
  end

  # assuming updates to individual entries are received via PubSub
  def handle_info({:update, entry}, socket) do
    send_update(Example.TreeComponent, id: entry.id, entry: entry)

    {:noreply, socket}
  end

  def handle_event("rename_bar", _params, socket) do
    # simulate a file event that could also be sent via PubSub
    send(self(), {:update, %FileEntry{id: 4, type: :file, name: "bar - #{DateTime.utc_now()}.txt", children: nil}})
    {:noreply, socket}
  end
end

and then the TreeComponent:

defmodule Example.TreeComponent do
  use Phoenix.LiveComponent

  # subsequent updates don't affect the children by default;
  # those are updated on their own
  def update(assigns, %{assigns: %{entry: _}} = socket) do
    {:ok, assign(socket, assigns)}
  end

  def update(assigns, socket) do
    entry = assigns.entry

    # this is a memory optimization to not keep all children in memory
    socket = case entry do
      %{type: :folder, children: [_ | _ ] = children} ->
        socket
        |> stream(:children, children)
        |> assign(entry: Map.put(entry, :children, []))

      _ ->
        socket
        |> stream(:children, [])
        |> assign(entry: entry)
    end

    {:ok, socket}
  end

  def render(assigns) do
    ~H"""
    <li>
      {@entry.name}
      <ul :if={@entry.type == :folder} id={"folder-#{@entry.id}"} phx-update="stream">
        <.live_component
          :for={{dom_id, entry} <- @streams.children}
          module={Example.TreeComponent}
          id={entry.id}
          dom_id={dom_id}
          entry={entry}
        />
      </ul>
      <button :if={@entry.children == :not_loaded} phx-click="load_children" phx-target={@myself}>Load children</button>
    </li>
    """
  end

  def handle_event("load_children", _params, socket) do
    # this would probably look more like
    # children = File.list_files(parent: socket.assigns.entry.id, depth: 2)
    children = [
      %FileEntry{
        id: 100 + floor(:rand.uniform(1000)),
        type: :file,
        name: "more.txt",
        children: nil
      }
    ]

    socket
    |> assign(:entry, Map.put(socket.assigns.entry, :children, []))
    |> stream(:children, children)
    |> then(&{:noreply, &1})
  end
end

Here’s a single file sample with the full code: LiveView File Tree optimized with Streams + LiveComponents · GitHub

I hope this clarifies things a bit. Let me know if you have further questions!


This is a stunning and thorough reply. Thank you for saving me a lot of time! :slight_smile:

There are a couple of things I would like to add:

First, since your (last) example uses streams, you still need to know which component should receive a given update. In the spirit of my previous comments on this topic I would like to point out that this is not always easy. For example, say you wanted to compute the number of files in each folder (recursively) at runtime (there are legitimate reasons for this). It would quickly become difficult to determine which components should receive an update (the correct answer in this case would be to update all of the parents of an updated file, recursively). There are many more features you might build which require updates across arbitrary nodes, where the computation of that set of nodes is nontrivial.
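As a hedged sketch of that “update all of the parents, recursively” computation (the `%{id => parent_id}` index and the module name are hypothetical, not part of the example above): walk the index upward and collect every ancestor id, each of which would then receive a send_update:

```elixir
defmodule AncestorUpdate do
  # Given an id => parent_id index, return the chain of ancestor ids
  # (nearest first) that should receive a send_update when `id` changes.
  def ancestors(parents, id) do
    case Map.get(parents, id) do
      nil -> []
      parent_id -> [parent_id | ancestors(parents, parent_id)]
    end
  end
end
```

In the LiveComponent setup above, each id in the returned chain would get its own `send_update(Example.TreeComponent, id: id, ...)` — which is exactly the bookkeeping that becomes nontrivial as more such cross-cutting features accumulate.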

In this case you could simply use a pure LiveComponent approach and retain the wire benefits (at the cost of more computation for diffs, which is usually a worthy tradeoff IMO).

Second, I want to talk about a pattern I ran into implementing this feature. When updated records come in (over pubsub or from the database), it is necessary to “stitch” them back into the tree structure. As Elixir is functional, this operation is, frankly, annoying. You can avoid this by storing each set of records in a map %{primary_key => struct} and then essentially “joining” them into the tree structure in-memory (this is a form of indirection).

For example, you might have a nodes table which references both files and folders through foreign keys. You might load the nodes into a tree, and then load the full list of files and folders into maps. Then, when you receive an updated %File{}, you need only update the map %{file.id => file}.

Then you can render out a tree yourself containing only the relevant data. For example you might produce a tree like this:

%{
  type: :folder,
  file_count: 100,
  children: [
    %{type: :file, ...},
    ...
  ],
}
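A minimal sketch of that in-memory “hash join” (all names and shapes here are assumptions for illustration): a skeleton tree holding only ids is materialized against `%{id => struct}` maps, and the recursive file_count is summed on the way back up:

```elixir
defmodule RenderTree do
  # The skeleton tree holds only ids; the latest file/folder structs
  # live in maps keyed by primary key. Rebuilding the rendered tree
  # after an update is a cheap walk, and updating one file is a
  # single Map.put into the files map.
  def materialize(%{folder_id: fid, children: children}, folders, files) do
    kids = Enum.map(children, &materialize(&1, folders, files))

    %{
      type: :folder,
      name: Map.fetch!(folders, fid).name,
      children: kids,
      # recursively computed on the way up, as described above
      file_count: Enum.sum(Enum.map(kids, &count/1))
    }
  end

  def materialize(%{file_id: fid}, _folders, files) do
    %{type: :file, name: Map.fetch!(files, fid).name}
  end

  defp count(%{type: :file}), do: 1
  defp count(%{type: :folder, file_count: n}), do: n
end
```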

And then you can pass that tree to the LiveComponents (which themselves form a tree), and they will nicely diff those attributes for you, all the way down the tree. Note that if you give the components static ids (the primary keys of the nodes, here), you can even move entire subtrees around with tiny updates (a wonderful optimization that comes from LiveView).

Note that you can now recursively sum the file counts up the tree when you compute this “rendered tree”, as well as materialize any changes to the files and folders without having to reload the entire tree (with joins) from the database for every measly operation (like a file rename).

This is very important context, that is an enormous tree! Obviously at that size you cannot fit it in memory on the server, on the client, or probably even in the database (you likely have it on disk).

I think the (excellent) reply above clarifies the performance of LiveView (streams or not), but there’s something else important to highlight here:

With this amount of data, there is no way a user is going to be able to parse (in their head) even a fraction of a percent of the total tree. So what you have, more than anything else, is a UX problem; how do you choose which parts of the tree to present?

I can’t comment on this further without knowing your actual use case, and I’m sure you’ve thought about this already, but there are many possibilities here: pagination, search, and so on.

Only once you have reduced the dataset down to the thousands (and it is therefore comprehensible to mere mortals!) is any of the performance advice here going to apply.

You might have missed that I explicitly stated all of this as part of explaining the context. I even brought up much of it in response to your initial criticism of streams. So yes, all of those are valid observations I’ve long since considered fundamental to my application design. What is under discussion right now is this: after all of that is thoroughly recognised and implemented, what remains is a small “page” or window of data actively being presented to the user. That usually means a single node from somewhere in the tree is considered the current root of what is being rendered, and one would still like to avoid sending the whole tree’s worth of data every time a sparrow coughs.

I’m leaning towards using a MapSet rather than a plain map as the basis for achieving what you described wanting to do with a map, but it seems we’re heading in the same general direction. Still, it won’t be of much help if I (we) cannot arrange for the unchanging children to remain in the container when only the container has changed.

If I understand @steffend’s very elaborate answer (thank you so much), he’s essentially saying that if saving on network traffic and the consequential delays is important enough, one can address the problem by using more server memory to maintain each node/level’s re-rendering context as the state of a LiveComponent. That’s what I heard. It’s a way that would probably work, although I must study the example more in order to pinpoint exactly where it is achieved that the parent is updated with the same children rather than new ones. I’m sure it’s there; I’m just ever so slightly overwhelmed by the given example right now.

Having said that, the prospect of sacrificing server memory in order to save load on the network and client made me think of another approach with a different sacrifice. In my case, where the current page/window getting rendered as HTML is very well defined as a small set of root nodes (usually one) and a maximum recursion level, it might just be that the cheapest solution in terms of resources and time is to simply reload the data from the database each time and do the diff on that. It might still require a LiveComponent to arrange for the right diffing and patching, but it would dramatically reduce the memory footprint of the LiveComponent’s state per user session per node.

That last part (per node) is a daring assumption. I’m not sure (and haven’t been able to tell quite yet) whether the entire current page would be rendered in one (stateful) LiveComponent, or whether it would require a LiveComponent for each cluster of associated nodes representing one node in the recursive pattern. If either of you is particularly clear on that, please put me on the right path.

The other unknown to me, as stated, is exactly what/where in the example arrangement has the effect of allowing the container to be updated without changing the children. My rough understanding is that it’s a consequence of the diffing done by the LiveComponent finding that the new HTML for parts of it is the same, but I’m not sure if that’s right, or what controls it. Plus it seems a tall order for that diffing to pick this up several layers deep in the recursive structure.

To clarify that last point about levels: in my use case the recursion is indirect, meaning it’s not simply an Ecto struct with a parent_id resulting in each tree having a has_many association that reads like a list of children. In my case a logical recursive node consists of quite a few (on the order of 6-10) different records in various associations with each other, plus multiple smaller lists of children considered acceptably small to update as a whole every time. But at some point that constellation of Ecto schemas starts to repeat itself, with a single Ecto schema struct on the parent side and a list of the same Ecto schema structs on the other. I’ve made some progress chopping my tree (after pagination) up into these recursive nodes by walking depth-first through the tree, guided by a keyword list akin to how preloads are specified, in order to know where to draw the lines between the content that goes together and the content that should go into the next node. It’s easy to tell sibling nodes from each other because they are typically in a list of the kind at the head of the preload list, and traversing a preloaded tree is child’s play using reflection. The only thing to look out for is when the preloaded structure reaches the recursion point, as indicated by where the preload spec returns to one, or a list, of the same spec as its head. That’s what I refer to as a node, and the assumption I’m trying to check is whether the LiveComponent approach would involve instantiating a LiveComponent instance for each node, or only once per user to keep the entire active section of the tree in state memory.

Anyway, thank you for your time and considered responses. As observed, I have a non-trivial use case, and I concern myself primarily with ways and means to keep the user experience from overwhelming the user. Part of that involves innovative ways of letting the user control how deep he looks, and even assign his own summations to things once he’s understood their meaning; another part is to ensure, as far as I can, that the interface itself appears smooth as a baby’s bottom, which means not getting interrupted by spinners and dead screen time. I’m willing to dedicate the server resources required to achieve that, but must be realistic about it as well. Generally speaking my database servers are rigged for read performance, so it isn’t entirely beyond the realm of possibility that reloading could be a better use of server resources than keeping things in memory. I’ve already done some Ecto testing to compare an actual tail-recursive loading algorithm with a single preload using an arbitrarily nested preload statement. For the same dataset, the single preload with a recursively generated list of preloads used about half the queries of the version using actual recursion.
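As an aside, the “recursively generated list of preloads” I mentioned can be sketched in a few lines (the association name :children is a placeholder for whatever the recursion point is in a real schema):

```elixir
defmodule PreloadGen do
  # Build a nested preload spec like [children: [children: []]]
  # down to the given depth, so a single Repo.preload/2 call can
  # fetch `depth` levels instead of issuing one round of queries
  # per level of actual recursion.
  def nested(_assoc, 0), do: []
  def nested(assoc, depth), do: [{assoc, nested(assoc, depth - 1)}]
end
```

The result is an ordinary keyword list, so it can be passed straight to `Repo.preload/2` or a `preload:` query option.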

I wouldn’t expect you to. The application conveys its own use case best, and I’ve learnt not to attempt it in any other way. Suffice it to say that every user starts engaging with the big world out there by focussing only on their own data. Gradually they branch out to engage with other (shared) data, from which they progressively bring how they see it into their own dataset. The focus is on condensing large volumes of information, once sufficiently explored, into a single word or sentence about it for yourself at that time, replacing that entire branch of the big tree with your take on it, until some later time when something challenges or clashes with your summary view and draws you back into the source material and the multiple levels of summaries compiled by yourself and others, to look for where you’ve misunderstood. In the end, each user sits with, logically, the whole tree condensed into words and concepts by which they understand it. From the top level down, every level you engage with is ultimately intended to fit on a single page; more than that and you’re encouraged to reduce it further. As humans we can deal with (depending on a few genetic and environmental factors) 7 plus or minus 2 concurrent thoughts in our minds. That’s not a lot. IDEF0 took that a little further and made it formally part of the methodology and toolset, saying that every activity must be broken down into between 3 and 6 sub-activities; fewer than 3 or more than 6 means you’ve not comprehended the activity at that level of recursion and you need to do better. My rules aren’t that strict, but I’ve pretty much adopted a similar notion: there’s no point in a deep hierarchy essentially repeating itself without getting anywhere new, and any list of more than a few items is testimony that you’ve not yet spent enough time abstracting the list into sublists that impact each other and belong together. That sort of thing.
Again, I cannot convey a fraction of one percent of the whole concept here, or anywhere I’m forced to use language designed for telling stories.

There is (very obviously) a great deal more to it, but that is, in a nutshell, the general concept by which the user experience compensates for the overwhelming complexity of such a vast tree. For more detail you’d have to wait until you receive your invitation to join the program from someone you know personally. It may take a while, but it should get to you eventually.

I am unsure of where exactly you are employing these MapSets but the reason I mention using a map is because retrieval by primary key is O(1) and I am essentially performing a hash join in Elixir code. Using a set (or list) would perform much worse and gains nothing in terms of functionality.

That is correct, this is declarative in the “declarative vs imperative UI” paradigm we have discussed at length, and the memory is the price.

This is an implementation detail of your backend but it does not affect the “LiveView to Client” (wire) diffs we are talking about. In order to diff the message to the client, you have to store the old and new states. LiveComponents will do that for you, Streams will not (by design).

Where you put the LiveComponents would determine the granularity of the diffs. For the smallest diffs you want to use them as the elements of lists (:for comprehensions) because LV does not properly diff lists on its own. In practice you would probably use one LiveComponent per node in your graph.

Each LiveComponent must be given a unique ID (unique within the page), and this id allows the LiveView client to send diffs directly to the part of the DOM where they are needed (even if arbitrarily nested).

Thanks @steffend, I wasn’t sure, but I finally spotted the place in your code (a.k.a. caught your gist :slight_smile: ) where it is explicit that there’d be a LiveComponent per tree node. I’m now weighing up the cost (in overhead) of keeping each node’s state in memory separately, vs keeping the displayed window of the large tree in memory as a whole, vs loading it from the database as needed.

I need to confirm my understanding. You have the comment:

 # subsequent updates don't affect the children by default;
 # those are updated on their own

and I’m not sure I understand exactly what establishes and/or affects that default behaviour. Is it that LV diffing stops walking the tree when it encounters another (or the same) LiveComponent, or is it because of something your code explicitly does or doesn’t do?

I can see how it would work as advertised, and once I’m sure how to control the behaviour, it could definitely also work for me. It’s not the end of my journey, though, for the simple reason that once I’m in control of the size of the loaded portion of the gigantic tree, I might not need streams for pagination and beyond, leaving only the update vocabulary they provide as a reason to use streams. I’ll need to give this more thought.