Fighting `Internal Server Error` on machine suspend/resume cycle

Has anyone built a Phoenix / Ash / LiveView app that supports server suspension, without getting a nasty user experience in the process?

Every time our server suspends, whatever view I’m on turns into a big white page with:

Internal Server Error

Hitting refresh brings it back to life (sometimes you have to refresh twice), and it more or less picks up at a reasonable spot (not necessarily exactly where you were, but close enough).

The problem, of course, is the unpleasant “Internal Server Error.” I want that to go away.

I’m not even sure it’s possible, really. If it isn’t, we’ll just leave a machine running all the time. But I’m hoping someone has some experience with this.

Here’s a look at the logs as the app goes through a suspend/resume cycle. I’m not running debug level yet, so not a lot of info here:

2025-09-23T07:39:00Z app[2860103b762e38] iad [info]07:39:00.761 [error] Postgrex.Protocol (#PID<0.4659.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]07:39:00.765 request_id=GGfKbJC6d6JrX-QAAAjR [error] ** (DBConnection.ConnectionError) tcp recv (idle): closed
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]    (ecto_sql 3.13.2) lib/ecto/adapters/sql.ex:1098: Ecto.Adapters.SQL.raise_sql_call_error/1
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]    (ecto_sql 3.13.2) lib/ecto/adapters/sql.ex:996: Ecto.Adapters.SQL.execute/6
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]    (ecto 3.13.3) lib/ecto/repo/queryable.ex:241: Ecto.Repo.Queryable.execute/4
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]    (ecto 3.13.3) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]    (ecto 3.13.3) lib/ecto/repo/queryable.ex:145: Ecto.Repo.Queryable.exists?/3
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]    (ash_sql 0.2.93) lib/aggregate_query.ex:119: anonymous fn/5 in AshSql.AggregateQuery.add_single_aggs/5
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]    (elixir 1.18.4) lib/enum.ex:2546: Enum."-reduce/3-lists^foldl/2-0-"/3
2025-09-23T07:39:00Z app[2860103b762e38] iad [info]    (ash_sql 0.2.93) lib/aggregate_query.ex:81: AshSql.AggregateQuery.run_aggregate_query/4
2025-09-23T07:39:01Z app[2860103b762e38] iad [info]07:39:01.609 [error] Postgrex.Protocol (#PID<0.4660.0>) disconnected: ** (DBConnection.ConnectionError) tcp recv (idle): closed
2025-09-23T07:44:54Z proxy[2860103b762e38] iad [info]App waste-walk has excess capacity, autosuspending machine 2860103b762e38. 0 out of 1 machines left running (region=iad, process group=app)
2025-09-23T07:44:56Z app[2860103b762e38] iad [info]Virtual machine has been suspended
2025-09-23T07:58:46Z proxy[2860103b762e38] iad [info]Starting machine
2025-09-23T07:58:46Z app[2860103b762e38] iad [info]2025-09-23T07:58:46.782463043 [01K5SPF8VBK4G6FEJTKMM9PZ4V:main] Running Firecracker v1.12.1
2025-09-23T07:58:46Z app[2860103b762e38] iad [info]2025-09-23T07:58:46.782590762 [01K5SPF8VBK4G6FEJTKMM9PZ4V:main] Listening on API socket ("/fc.sock").
2025-09-23T07:58:46Z app[2860103b762e38] iad [info]2025-09-23T07:58:46.782800962 [01K5SPF8VBK4G6FEJTKMM9PZ4V:fc_api] API server started.
2025-09-23T07:58:46Z app[2860103b762e38] iad [info]2025-09-23T07:58:46.784474288 [01K5SPF8VBK4G6FEJTKMM9PZ4V:fc_api] The API server received a Get request on "/".
2025-09-23T07:58:46Z app[2860103b762e38] iad [info]2025-09-23T07:58:46.784495308 [01K5SPF8VBK4G6FEJTKMM9PZ4V:fc_api] The request was executed successfully. Status code: 200 OK.
2025-09-23T07:58:46Z app[2860103b762e38] iad [info]2025-09-23T07:58:46.784820917 [01K5SPF8VBK4G6FEJTKMM9PZ4V:fc_api] The API server received a Put request on "/logger" with body "{\"log_path\":\"logs.fifo\",\"level\":\"info\"}".
2025-09-23T07:58:47Z runner[2860103b762e38] iad [info]Machine started in 343ms
2025-09-23T07:58:47Z proxy[2860103b762e38] iad [info]machine started in 352.485279ms
2025-09-23T07:58:47Z proxy[2860103b762e38] iad [info]machine became reachable in 9.492925ms
2025-09-23T07:58:47Z app[2860103b762e38] iad [info]07:58:47.099 request_id=GGfZp_tRZwZy9CcAAAFC [info] GET /sign-in

I think the question is: why is the server suspending if there is still a socket connection open? :thinking:

Hmmm…

Well, my reasoning there would be… (btw, I’m using Fly.io and their autosuspend, and I don’t know exactly how they determine it’s ok to suspend…):

  1. I’m keeping the socket open (keep my browser open)…
  2. But, zero activity over the socket (is there a ‘keep alive ping’ on a socket? I’ve no idea)…
  3. So server autosuspend sees zilch traffic, zilch CPU, decides to passivate…?

Which potentially makes sense. If someone goes home at the end of the day, but leaves their browser open, shouldn’t the server eventually suspend? (Probably).

But all that reasoning brings me around to this: is there a way to control the timeout on the socket and then do something (redirect back to the login page, or something like that)? I’m not an LV expert, so I don’t know what options there are.

In my ideal (probably unreasonably so) world, there’d be a way for the server to suspend mid-stream, yet still recover when it wakes up at exactly the same spot, no user interruption. I suppose my second-best option would be, get a “hey, server is about to suspend…” event from somewhere and do something about it.

/edit/
Fly.io does have this interesting tidbit on their blog, but it’s not quite what I’m looking for. First, don’t really want to shut down the app (I want it to maintain state). Second, this probably wouldn’t work, since it’s looking for zero connections as a trigger, while I’m trying to gracefully suspend when there are (passive) connections…

/edit/
One more observation… for whatever reason, iOS seems to handle this better than Arc (Chrome) on desktop. Arc gets the “Internal Server Error” while iOS does an automatic refresh and reconnects gracefully. A few times I’ve gotten the “trying to reconnect” flash on iOS, but again, graceful. grrrrr. Why is Chrome being bad? :frowning:

There are heartbeats on the socket. They’re needed to detect connection issues where the connection thinks it’s fine, but no data makes it through.
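
To illustrate the heartbeat idea, here is a minimal sketch of how a client can detect a connection that looks open but no longer carries data. It is not the Phoenix client's actual code; `HeartbeatMonitor`, `tick`, and `ack` are hypothetical names, and the caller is assumed to invoke `tick` once per heartbeat interval.

```typescript
// Hypothetical heartbeat monitor: a sketch of the general technique,
// not the implementation used by the Phoenix JavaScript client.
type SendFn = (ref: number) => void;

class HeartbeatMonitor {
  alive = true;
  private pendingRef: number | null = null;
  private nextRef = 1;
  private send: SendFn;
  private onDead: () => void;

  constructor(send: SendFn, onDead: () => void) {
    this.send = send;
    this.onDead = onDead;
  }

  // Call once per heartbeat interval (e.g. from a timer).
  tick(): void {
    if (this.pendingRef !== null) {
      // The previous heartbeat was never acknowledged, so the
      // connection is treated as dead even though it never "closed".
      this.alive = false;
      this.pendingRef = null;
      this.onDead();
      return;
    }
    this.pendingRef = this.nextRef++;
    this.send(this.pendingRef);
  }

  // Call when the server replies to a heartbeat.
  ack(ref: number): void {
    if (ref === this.pendingRef) {
      this.pendingRef = null;
      this.alive = true;
    }
  }
}
```

A missed acknowledgement is what would trigger the reconnect path, so `onDead` is the natural place to close the socket and schedule a retry.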

I have a feeling LiveView + auto-suspending the server will bring you a lot of headaches, probably not a good fit.

One of the hardest parts of working with LiveView is managing what happens on disconnection and reconnection, how the UI is affected, ensuring a seamless user experience.

Intentionally disconnecting the server will lead to lots of such reconnection states, which either means you’re going to learn all the tricks to make it work great, or users will have a bad experience with unexpected page behavior.

The “live” in LiveView is the assumption that you want a realtime connection between server and client, such that either the server or the client can send events to the other at any time. If there’s no connection between them, then the server cannot initiate a reconnection on its own.

Thanks… appreciate the feedback. For now, I’ll make sure at least 1 server stays awake.

Maybe long-term… since our app has periodic usage cycles with long pauses in-between, I’ll look into a possible solution. I think it would have to be “driven” by the app though – meaning, if the app decides “hey, I’ve been idle a while,” maybe it could trigger passivation.
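
As a rough sketch of that “app decides it has been idle” idea: the logic is just a last-activity timestamp and a threshold. It is written in TypeScript only to keep it self-contained; in a Phoenix app the equivalent would presumably live in a server-side process. `IdleTracker`, `runIdleCheck`, and the `onIdle` callback are hypothetical names, and what `onIdle` should actually do (warn connected clients, ask the platform to suspend) is left open because it depends on hooks not covered in this thread.

```typescript
// Hypothetical idle tracker: times are passed in explicitly so the
// logic does not depend on a real clock.
class IdleTracker {
  private lastActivityMs: number;
  private readonly idleAfterMs: number;

  constructor(idleAfterMs: number, nowMs: number) {
    this.idleAfterMs = idleAfterMs;
    this.lastActivityMs = nowMs;
  }

  // Record user activity (a request, a LiveView event, ...).
  touch(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // True once no activity has been seen for at least idleAfterMs.
  isIdle(nowMs: number): boolean {
    return nowMs - this.lastActivityMs >= this.idleAfterMs;
  }
}

// Run periodically; calls onIdle and returns true when the app is idle.
function runIdleCheck(
  tracker: IdleTracker,
  nowMs: number,
  onIdle: () => void
): boolean {
  if (tracker.isIdle(nowMs)) {
    onIdle();
    return true;
  }
  return false;
}
```

Heartbeats should not count as activity here, otherwise an open but unused browser tab would keep the app awake forever.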

Either that or talk with Fly about why they are passivating a server with active websockets… :wink:

Hey @zac

If at some point in time you go from N servers down to 1 (or really just any of the servers suspend), then a fraction of your connected users will be impacted.

I understand you’re only considering this idling behavior as a cost saving mechanism? If that’s the case, consider whether a VPS on Hetzner, Digital Ocean, etc would serve your needs at a lower price point. I know choosing a vendor here is not trivial, and will affect deployments, existing contracts and business relationships, …

Yeah, I would be curious to hear what their support would suggest. AFAIK they are users of Elixir and Phoenix LiveView, so there must be in-house expertise to orient you.