LiveStash - persisting LiveView state across WebSocket reconnections

Hi everyone,

We’ve just released LiveStash, a library designed to handle the issue of losing Phoenix LiveView state during WebSocket reconnects.

LiveStash was created to fill the gap between URL parameters and full database persistence. It allows you to “stash” specific assigns and recover them automatically when the user reconnects.

The library currently supports two strategies:

  • ETS Adapter: Keeps the state on the server for minimal network overhead.
  • Browser Memory: Offloads the state to the client, allowing it to survive full server redeploys.

We go into more detail regarding the technical implementation and the distributed systems challenges in our blog post here: The Problem of Reconnects in Phoenix LiveView

Check out the demo and our repo - we’d love to hear your thoughts!
GitHub:

Demo:


Congratulations on the release. What happens if I open the same page in two browser tabs and the assigns change while I work in the two tabs independently? Will the stashed sets of assigns overwrite each other? Are different “instances” of the same LiveView treated separately?

I’ve tried something similar before; however, I’m concerned there could be a race condition between save and restore, perpetuating a corrupt state. The client can reconnect fast enough to spin up a new LiveView process before the old one is killed. So, for a brief moment, you could have two LV processes representing the same session, even from the same browser tab.


Exactly - stashed state is tied to a particular LiveView instance, so there is no interference between different browser tabs.

That’s a valid concern.

Fixing that is basically an ownership rule: only the “current” LiveView should be allowed to update the stash. One simple way is to treat the row as owned by a pid (or a small mount/generation id) and reject or ignore writes that don’t match the current owner - ideally with an atomic check so you don’t get read‑modify‑write races.
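For illustration, here’s roughly what I mean - a sketch assuming a `{key, owner_pid, assigns}` row layout and a table named `:live_stash` (all names are mine, not LiveStash’s internals). `:ets.select_replace/2` applies its match spec atomically inside ETS, so it gives you a compare-and-swap on the owner:

```elixir
defmodule StashGuard do
  @table :live_stash

  # Write the stash only if `owner` still owns the row.
  def put_if_owner(key, owner, assigns) do
    # The match head requires both the key AND the current owner;
    # the body builds the replacement tuple. select_replace/2 runs
    # this atomically, so there is no read-modify-write window.
    match_spec = [
      {{key, owner, :_}, [],
       [{{{:const, key}, {:const, owner}, {:const, assigns}}}]}
    ]

    case :ets.select_replace(@table, match_spec) do
      1 -> :ok
      0 -> {:error, :stale_writer}  # row is gone or owned by a newer LV
    end
  end
end
```

A reconnecting LiveView would first claim ownership of the row (a plain `:ets.insert/2` with its own pid), after which any late writes from the old process are rejected.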

Actually, we already store a pid in the ETS row, but it’s used for TTL/cleanup, not to block stale writers. Something similar should be done for the BrowserMemory adapter too.

To make an ETS table update transactional, you need to block readers to prevent them from reading partially updated data. It is a big can of worms to implement properly.

I’d even argue it’s easier, and most of the time better, to have a sidecar process that monitors the LiveView process and holds the data that needs to be persisted between reconnects. The sidecar should be registered through a registry so it can be found easily. It should also have a timeout after which, if there is no associated LiveView process, the sidecar exits.
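Roughly what I have in mind, assuming a unique-keys Registry named `Stash.Registry` under your supervision tree (all names illustrative):

```elixir
defmodule Stash.Sidecar do
  use GenServer

  @orphan_timeout :timer.seconds(30)

  def start_link({key, lv_pid}) do
    GenServer.start_link(__MODULE__, lv_pid,
      name: {:via, Registry, {Stash.Registry, key}}
    )
  end

  @impl true
  def init(lv_pid) do
    # Watch the owning LiveView; its death starts the orphan countdown.
    Process.monitor(lv_pid)
    {:ok, %{assigns: %{}, lv: lv_pid}}
  end

  @impl true
  def handle_call({:stash, assigns}, _from, state) do
    {:reply, :ok, %{state | assigns: assigns}}
  end

  def handle_call(:restore, {new_lv, _tag}, state) do
    # A reconnecting LiveView adopts this sidecar: monitor the new pid
    # and hand back the stashed assigns.
    Process.monitor(new_lv)
    {:reply, state.assigns, %{state | lv: new_lv}}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, pid, _}, %{lv: pid} = state) do
    # Current owner died; exit unless someone reconnects within the window.
    {:noreply, state, @orphan_timeout}
  end

  def handle_info({:DOWN, _ref, :process, _stale, _}, %{lv: lv} = state) do
    # A DOWN from a previous owner; keep the countdown running
    # only if the current owner is also gone.
    if Process.alive?(lv),
      do: {:noreply, state},
      else: {:noreply, state, @orphan_timeout}
  end

  def handle_info(:timeout, state) do
    {:stop, :normal, state}
  end
end
```

The reconnecting LiveView finds its sidecar with `Registry.lookup(Stash.Registry, key)` and calls `:restore` on it.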


This is exactly what I did in one project. The problem is that it’s hard to generalize: what do you use as the key to the registry? You need to consider different LVs, different instances of the same LV, and there is also the multiple-tab problem.

What? ETS writes are atomic and isolated; there’s no possibility of a partial read.

Per the ETS docs, all updates to single objects are guaranteed to be both atomic and isolated. But one process can update two objects, and another process can read in between the two writes.

In LiveView, state starts from a blank slate and all updates are serialized within the process, so everything should be consistent all the time. If that state is saved and restored without transactional protection, all hell breaks loose.
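To make that concrete, a contrived sketch assuming a per-key layout (table and values hypothetical - and, as clarified below, not how LiveStash actually stores things):

```elixir
:ets.new(:stash, [:named_table, :set, :public])
:ets.insert(:stash, {:k1, "old v1"})
:ets.insert(:stash, {:k2, "old v2"})

# Writer updates the two keys non-atomically:
:ets.insert(:stash, {:k1, "new v1"})
# A concurrent reader restoring at this instant sees {"new v1", "old v2"},
# a combination of assigns that never existed inside the LiveView process.
:ets.insert(:stash, {:k2, "new v2"})
```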

We actually considered the “sidecar process” approach early on but decided against it, since spawning an additional sidecar for every single active LiveView means doubling the process count per user. I understand that processes are cheap on the BEAM, but maintaining (potentially) thousands of extra GenServers just to act as a temporary cache introduces significant and unnecessary overhead.

Instead, sweeping stale tuples from a shared ETS table via a single background worker is cheaper than keeping those isolated processes alive purely to wait for a timeout.
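Conceptually, the sweeper amounts to something like this - a simplified sketch with a hypothetical `{key, owner_pid, stashed_at_ms, assigns}` row layout rather than our exact schema:

```elixir
defmodule Stash.Sweeper do
  use GenServer

  @table :live_stash
  @interval :timer.seconds(30)
  @ttl_ms :timer.minutes(5)

  def start_link(_), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(state) do
    Process.send_after(self(), :sweep, @interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:sweep, state) do
    cutoff = System.monotonic_time(:millisecond) - @ttl_ms

    # tab2list/1 copies the whole table; a real version would walk it
    # in chunks with :ets.select/3. Process.alive?/1 is fine here
    # because the table (and hence every owner pid) is node-local.
    for {key, owner, stashed_at, _assigns} <- :ets.tab2list(@table),
        not Process.alive?(owner),
        stashed_at < cutoff do
      :ets.delete(@table, key)
    end

    Process.send_after(self(), :sweep, @interval)
    {:noreply, state}
  end
end
```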

You’re right; however, this is not the case in LiveStash. If LiveStash saved each individual assign as a separate row in the ETS table, you would be correct. However, LiveStash sidesteps this issue by storing the entire state map as a single ETS object.

Wouldn’t it make sense to pass the {:heir, …} option to :ets.new/2 here?

The table is owned by a separate process that does nothing but hold it, so {:heir, ...} is not needed here.
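I.e. something along these lines (a minimal sketch, not the exact code):

```elixir
defmodule Stash.TableOwner do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  @impl true
  def init(state) do
    # :public plus the concurrency flags let LiveView processes read and
    # write the table directly; this process exists only to keep it alive,
    # so there is essentially nothing in it that can crash.
    :ets.new(:live_stash, [
      :named_table, :set, :public,
      read_concurrency: true, write_concurrency: true
    ])

    {:ok, state}
  end
end
```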

Did you perform any measurements? Statements like these are usually not backed by any data and are based solely on the stigma around spawning processes.

ETS is great if your cache is application-wide, but if each piece of data is tied to a particular long-lived process, tying the data’s lifetime to its owner’s lifetime is fault-tolerant, while a shared ETS table usually is not: a single crash of the table owner wipes the data of all owners.

Also, doubling the process count is the worst-case scenario. Maybe not every LiveView process needs an assigns backup. And the stashed assigns could be kept in a Registry, which is backed by ETS anyway.

As always, there are trade-offs.

No measurements were done; for this v0.1.0 the priority was simply validating the idea. We will revisit the architecture and run actual benchmarks later if the library proves useful.

When do you store it - on every assign call, or at terminate/2 time? The first is expensive and the second is likely too late (a new LV process can be spawned before terminate/2 is called on the old LV process).

Storing it at terminate/2 is indeed too late. The state is stored when the developer explicitly calls the stash_assigns/2 function.
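So a typical call site looks roughly like this (a sketch - the signature of stash_assigns/2 is assumed here, so check the docs for the real API):

```elixir
# Assumes stash_assigns/2 takes the socket plus the keys to persist -
# a guess at the call shape, not the documented API.
def handle_event("update_draft", %{"body" => body}, socket) do
  socket =
    socket
    |> assign(:draft, body)
    |> LiveStash.stash_assigns([:draft])

  {:noreply, socket}
end
```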

I see that you read out the ETS row, update the map from the record, and put it back. If the process has complex state amounting to megabytes and the developer is updating one or two small keys with stash_assign/2, that feels quite expensive.

Even more importantly, if the developer has:

```elixir
socket
|> assign(:k1, v1)
|> assign(:k2, v2)
|> stash_assign(:k3, v3)
```

and the LV then disconnects and reconnects, the new LV will have k3 but not k1 or k2. That could be an invalid state. IMHO, if you save the entire state at some point, you must save the entire state at all times, or you may restore an inconsistent state.

The same applies between nodes: if a node crashes or is redeployed you lose the state, or the client may reconnect to a different node.