Possible payload size improvement to HEEx list comprehensions

I agree with LostKobrakai here, and that’s the reason why I’ve started the discussion. Streams are superior in all aspects, but only if you can use them. And you can use them efficiently only if your context gives you diffs. Why?

Let’s consider (IMO) the most common case of using list comprehensions - looping through an Ecto collection. For example:

assign(socket, :users, Accounts.list_users())

Suppose one is not using PubSub to publish the updates / creates / removals (it can get time-consuming if you need to do it for all resources), and would like to periodically refresh the visible users. The easiest way is to simply run the assign again, e.g. in a periodic handle_info:

def handle_info(:refresh, socket) do
  Process.send_after(self(), :refresh, 10000)
  {:noreply, assign(socket, :users, Accounts.list_users())}
end

By default, you’ll send the whole collection back to the client, even if nothing has changed :pensive:

Ok, so let’s use streams for that case. Sadly, we need to write the diff code ourselves, otherwise streams won’t help with the payload at all. But since we need the diff, we don’t save any memory either, because we still need the old assigns to calculate it. Do you agree @josevalim?

def handle_info(:refresh, socket) do
  Process.send_after(self(), :refresh, 10000)
  users = Accounts.list_users()
  old_users = socket.assigns.users

  added = for u <- users, do: ...     # calculate added users
  removed = for u <- old_users, do: ... # calculate removed users
  updated = for u <- users, do: ...   # calculate updated users

  socket =
    socket
    |> stream(:users, added ++ updated)
    # stream_delete/3 removes one item at a time
    |> then(fn s -> Enum.reduce(removed, s, &stream_delete(&2, :users, &1)) end)

  {:noreply, socket}
end

So, for a case like this, nobody will consider using streams, because it’s simply too much work just to optimise the payload.
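To be fair, the diff bookkeeping itself can be factored into a small reusable helper. Here is a sketch of what I mean (CollectionDiff and diff/3 are made-up names, not anything that exists in LiveView):

```elixir
defmodule CollectionDiff do
  @doc """
  Compares two lists of maps/structs by a key function and returns
  {added, updated, removed}. Pure data transformation, no LiveView involved.
  """
  def diff(old_items, new_items, key_fun) do
    old_by_key = Map.new(old_items, &{key_fun.(&1), &1})
    new_keys = MapSet.new(new_items, key_fun)

    # items whose key did not exist before
    added = Enum.reject(new_items, &Map.has_key?(old_by_key, key_fun.(&1)))

    # items whose key existed but whose value changed
    updated =
      Enum.filter(new_items, fn item ->
        case Map.fetch(old_by_key, key_fun.(item)) do
          {:ok, old} -> old != item
          :error -> false
        end
      end)

    # items whose key disappeared
    removed = Enum.reject(old_items, &MapSet.member?(new_keys, key_fun.(&1)))

    {added, updated, removed}
  end
end
```

But even with a helper like this, every LiveView that wants the optimisation still has to keep the old list in assigns and wire the three result lists into stream calls by hand.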

Yes, we could do that. So we could write something like

stream_update(socket, :users, old_users, new_users)

and hopefully it would work. We wouldn’t get any benefits regarding memory usage, but at least the payload would be optimised.
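Such a helper could be implemented today in userland on top of the existing stream and stream_delete functions. A sketch, assuming it lives inside a LiveView module (the name stream_update and the diff-by-:id convention are my assumptions, not an existing API):

```elixir
# Hypothetical helper: diff two lists by :id and translate the result into
# stream inserts (which upsert items with an existing DOM id) and deletes.
defp stream_update(socket, name, old_items, new_items) do
  old_by_id = Map.new(old_items, &{&1.id, &1})
  new_ids = MapSet.new(new_items, & &1.id)

  # added items (no previous entry) plus updated items (entry changed)
  changed = Enum.filter(new_items, &(Map.get(old_by_id, &1.id) != &1))

  # items whose id disappeared from the new list
  removed = Enum.reject(old_items, &MapSet.member?(new_ids, &1.id))

  socket
  |> stream(name, changed)
  |> then(fn s -> Enum.reduce(removed, s, &stream_delete(&2, name, &1)) end)
end
```

Note that the caller still has to hold on to old_items somewhere, which is exactly the memory cost mentioned above.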

Now imagine that users is not a top-level assign, e.g. an Ecto preload:

assign(socket, :organization, Accounts.get_organization(preload: [:users]))

If we would like to optimise that, we’d need to introduce a top-level stream just for the users.

org = Accounts.get_organization(preload: [:users])
socket
|> assign(:organization, org)
|> stream(:users, org.users)
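For the periodic-refresh scenario, keeping the companion stream in sync might look like this sketch (names are carried over from above; `reset: true` is a real stream option that replaces the stream’s contents):

```elixir
# Hypothetical refresh handler for the preload case: re-fetch the
# organization and reset the companion :users stream to mirror org.users.
def handle_info(:refresh, socket) do
  Process.send_after(self(), :refresh, 10_000)
  org = Accounts.get_organization(preload: [:users])

  socket =
    socket
    |> assign(:organization, org)
    |> stream(:users, org.users, reset: true)

  {:noreply, socket}
end
```

And note that `reset: true` re-sends the whole list to the client, so to actually save payload here you’d again be back to hand-written diffing.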

Keeping it in sync with org.users is additional overhead for the developer. And we still haven’t arrived at the “worst” case:

assign(socket, :users, Accounts.list_users(preload: [:comments]))

So basically we’d like to render a list comprehension within a list comprehension. Optimising it with streams is a lot of work, so the vast majority of developers won’t even try. On the other hand, adding two key attributes should be trivial for the developer.

<div :for={user <- @users} key={user.id}>
  <div :for={comment <- user.comments} key={comment.id}>
    <%= comment.content %>
  </div>
</div>

And this also leads me to another point: streams are not able to optimise the payloads of individual items, because they don’t have the previous value. So if we consider a stream rendering a user like this:

<div :for={{dom_id, user} <- @streams.users} id={dom_id}>
  <img src={user.avatar}/>
  <span><%= user.name %></span>
  <span><%= user.description %></span>
  <span :for={comment <- user.comments} ><%= comment.content %></span>
</div>

If only the name was updated, the stream will still send all of the item’s dynamic values to the client, since it has no way of determining which values changed. In my case it’s negligible, but I’ve worked with much more complex collections, consisting of many levels of components. Streams improve the payload by sending fewer items, but each individual item is not optimised.

I think this is the most important problem you see right now, correct? If we could somehow reliably get the old keys, both when they’re taken from assigns and from some function calls, would you be more keen on supporting that improvement?

Still, doesn’t that issue already apply to normal change tracking, where it’s mostly solved? E.g. if someone renders

<div :if={check_condition(@current_user)}/>

then the function is executed on each render, and I believe the result is optimised as well (e.g. if it evaluated to a truthy value both on this render and the previous one, according to __changed__, only the diff will be sent?). What happens if the function is non-deterministic and might return a different value for the same argument? :thinking:


To summarize, I think streams are not the one-size-fits-all solution, because:

  • you need to have access to diffs, otherwise they optimise only memory usage, not payload size
  • they’re not able to optimise the payloads of individual items
  • you need a separate assign for each stream (a problem if your collection is a field of another assign)
  • they require more verbose and careful code (where exactly do we want to insert that item? what should the limit be?)
  • optimising lists within lists is very hard with streams

Implementing this correctly in the Phoenix HEEx engine will be very challenging. Still, don’t you think it might be worth at least trying?