Background: In response to an unrelated question about TreeView in LiveView, a lively yet off-topic debate about the pros and cons of using Streams was in progress. This is my attempt to give the highly relevant Streams discussion a more appropriate home and to share my thoughts on it as a point of departure.
Summary: Amidst general consensus about the virtues of LiveView, @garrison warned against nested Streams on the premise that Streams negate the declarative (later relabelled React-like) heart of LiveView, which forced him to turn to imperative (later relabelled jQuery-like) code to handle the intractable number of boundary conditions required to accommodate cascading changes.
Context: For the purpose of this discussion, let’s abstract LiveView as a declarative mapping between structured server data and HTML, where event handling is (by default) rigged to relay to the server. When a change in the data is detected (in response to a relayed event or a change to an active subscription), the changed part of the data is sent to each affected client (session), where its impact on the DOM is calculated (based on definitions extracted from the declarations) and handed to pre-written JS code in the client to patch the DOM.
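To make that abstraction concrete, here is a deliberately minimal illustration of the mapping: an assign declaratively rendered into HTML, with the click event relayed to the server to change the data:

```elixir
# A counter as the smallest example of the declarative mapping:
# @count is server data, the template declares its HTML form, and
# the phx-click event is relayed to the server to change the data.
def render(assigns) do
  ~H"""
  <button phx-click="inc">Count: <%= @count %></button>
  """
end

def handle_event("inc", _params, socket) do
  {:noreply, update(socket, :count, &(&1 + 1))}
end
```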
In that context, Streams in their native use case are (often partial / paginated) lists of Ecto schema structs for which LiveView (server-side) is able to determine how additions, deletions and changes to individual structs should impact the DOM.
Workload: For simple lists of structs sourced straight from an Ecto query with limits and offsets, it is straightforward to determine the changes and send only those to the client. The workload on the server and clients is of complexity O(n), where n is the query window size rather than the total number of records on file.
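For illustration, a minimal sketch of that native use case (Post, MyApp.Repo and the page size are placeholder assumptions):

```elixir
import Ecto.Query

# fetch one window of the (potentially huge) posts table ...
posts = MyApp.Repo.all(from p in Post, order_by: [desc: p.id], limit: 20, offset: 0)

# ... and let LiveView track additions/removals per struct
socket = stream(socket, :posts, posts)

# later: only the changed struct is sent down to the client
socket = stream_insert(socket, :posts, updated_post)
```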
Enter the Dragon: When the underlying data changes from “a (section of a) long list of small, independent structs” to “a short list of very large (deeply nested and/or recursive) structs”, the change detection and handling algorithms are bound to see a change to any descendant structure as a change to its parent. As a result, most of the root structures in the stream appear to have changed and must be sent down for processing. That is clearly not a desirable outcome.
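A hedged sketch of that failure mode, with hypothetical names (Thread and its nested associations):

```elixir
# a short stream of deeply nested roots: every reply hangs off a thread
threads =
  MyApp.Repo.all(Thread) |> MyApp.Repo.preload(comments: [replies: :replies])

socket = stream(socket, :threads, threads)

# editing one leaf reply changes its root, so the entire tree must be
# re-inserted and re-sent, no matter how small the actual change was
socket = stream_insert(socket, :threads, updated_thread)
```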
The real issue: For complex structures (in the sense of having many associations preloaded for the presentation layer, generally referred to as nested structures) the current change detection rules are justified. The issue arises when dealing with recursive structures, i.e. where the same constellation of associated schemas recurs in the data an arbitrary number of levels deep. That way the entire contents of a database can easily roll up into a single root structure. Put that root structure in a stream and every client gets sent the whole database every time someone sneezes.
A derivative problem: In the thread leading up to this, a somewhat heated debate arose from blaming Streams for forcing users to write an impossible amount of special-case code to counteract their intrinsic change detection and handling logic. It’s my personal impression that the member holding Streams responsible appears to have been attempting those interventions and special cases either on the client itself or in some other way at an inappropriate level of abstraction. I might be wrong about that, but even if I’m not, I confess to having great empathy with the struggles related to streaming recursive content. I just don’t want this discussion to get distracted by that particular (potentially misguided) set of challenges.
What to discuss: I believe there is a valid and relevant discussion to be had about different approaches for managing the LiveView presentation of indefinitely recursive data. The need to mitigate against runaway recursion is obvious, but once we’ve gained control over that, the objective is to enable LiveView and Streams to detect and address changes at the level of recursion where they happen and nowhere else.
Why? My application’s data is modelled as indefinitely recursive data; in fact, several aspects of it follow their own independent indefinitely recursive structure, so there is nothing hypothetical or theoretical about this for me. It’s a real and pressing issue.
My current approach: As such, I’ve had to look into ways to mitigate against false positives in LiveView change detection, with or without Streams. I can summarise my current approach as streaming MapSets rather than native Ecto schema structs. It works well where I’ve implemented it for a subset of data, but it’s not yet suitable as a general pattern to apply to my primary data, where the consequences of getting it wrong are far more grim.
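I won’t reproduce my actual implementation here, but the gist, sketched with plain maps and hypothetical names (MyApp.Node, flatten_node/1), is to stream a hand-picked projection of each node so change detection only ever compares the fields the presentation actually uses:

```elixir
# Hypothetical sketch: project each schema struct onto a small map so
# that nested associations never participate in change detection.
def flatten_node(%MyApp.Node{} = node) do
  %{
    id: node.id,
    title: node.title,
    depth: node.depth,
    parent_id: node.parent_id
  }
end

socket = stream(socket, :nodes, Enum.map(nodes, &flatten_node/1))
```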
Some Ideas: As a general principle (MapSet does something similar, but it might require something custom) I see a useful but under-utilised correlation between a tree (or even a graph) of related records and a stream of records. An array of related nodes is an established way to represent a tree or graph on file or in memory. We already know that Elixir’s closest approximation of an array, List, is really a singly linked list in memory. By implication we could derive a robust mapping between a recursively associated schema and a linked list traversing its nodes in depth-first order.
The case where each node equates to a single struct with an identifiable parent_id pointing at the parent node is trivial to specify and implement (see the sketch below), but that’s not my reality.
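A minimal sketch of that trivial case, assuming each record carries id and parent_id fields:

```elixir
# Depth-first flatten: from a flat list of %{id, parent_id, ...} records
# to the ordered list a stream expects, children directly after parents.
def flatten_tree(records, parent_id \\ nil) do
  records
  |> Enum.filter(&(&1.parent_id == parent_id))
  |> Enum.flat_map(fn node -> [node | flatten_tree(records, node.id)] end)
end
```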
The recursion in my data is technically indirect recursion, i.e. the relationship between two structs of the same schema runs through a struct of another schema. It might be slightly more challenging to specify to some implementation code what should constitute one level of recursion, but based on my own experience it’s fairly easily achieved using preload semantics. Basically, you can express the definition of an arbitrarily complex recursive node structure as a preload specification which references the same schema at its base and at the deepest level of one of the preload chains.
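As a hedged sketch of what I mean, with placeholder names (Item for the recursive schema, :links and :child for the intermediate association): one “level” of recursion is a preload spec that bottoms out in the same schema, and repeating it gives a controlled depth:

```elixir
# one level of indirect recursion: Item -> links -> child (another Item)
def node_preload(0), do: []
def node_preload(depth), do: [links: [child: node_preload(depth - 1)]]

# preload three levels of recursion under each root item
items =
  MyApp.Item
  |> MyApp.Repo.all()
  |> MyApp.Repo.preload(node_preload(3))
```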
With reference to Gall’s Law (which featured in the original debate), I have dabbled enough with procedural implementations along these lines to be confident about the feasibility of achieving a declarative implementation.
But then: Without a reliably paginatable data source behind it, the Stream value proposition is severely limited. While it is possible to preload the data to an arbitrary (yet controlled) depth first and then apply this tree-list mapping algorithm to extract the data for the stream, there is another option too. I personally had zero success getting recursive_ctes and with_cte working in Ecto. But the PostgreSQL query construct they are based on (similar constructs exist in other databases I’ve worked with, under slightly different terminology and semantics) returns a one-dimensional recordset which corresponds directly with the list representation of the recursive data we’re looking to not only preload but stream as a list of independent nodes. I see an opportunity for a possible declarative implementation of recursive streams to generate the requisite recursive CTE at (or closer to) the Ecto level, load the result into the list representation first, and then run the mapping algorithm to patch the associations in the tree/graph view to point at the same nodes.
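For reference, this is roughly the shape the Ecto docs give for recursive_ctes/with_cte, adapted here with a hypothetical Node schema and parent_id column (and untested by me, given my lack of success with it):

```elixir
import Ecto.Query

initial = from(n in Node, where: is_nil(n.parent_id))

recursion =
  from(n in Node,
    join: t in "node_tree", on: n.parent_id == t.id
  )

node_tree = union_all(initial, ^recursion)

# returns a flat, one-dimensional recordset of all reachable nodes
{"node_tree", Node}
|> recursive_ctes(true)
|> with_cte("node_tree", as: ^node_tree)
|> MyApp.Repo.all()
```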
Objective: There’s still a lot to consider, but I’m excited by the prospect of doing this eloquently enough to make it useful without needing any changes to the Ecto schemas and context app code involved. All existing code should be able to function as it does right now, while a “new way” of doing things becomes available. The only impact should be that it becomes possible to specify that a stream contains recursive data with a node structure given as a preload expression, and to be assured that change detection and propagation will be as efficient as they are for non-recursive data.
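To make the objective concrete, this is the kind of call I imagine (purely hypothetical; no such option exists today):

```elixir
# Purely hypothetical API sketch: declare the stream as recursive and
# describe one node level as a preload expression; change detection
# would then operate per node rather than per root tree.
socket =
  stream(socket, :nodes, root_nodes,
    recursive: [node: [links: [child: []]]]
  )
```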