Tips for building resilient frontend apps that use Phoenix channels for backend communication

When you develop an app which communicates with the backend over HTTP requests, handling unsuccessful requests is straightforward: you send a request and you get a response. Based on the response you can detect whether your request was valid or if the server crashed as a result of it.

With Phoenix channels, it’s a bit different. You still send a message and get a response back, but if your message crashes the channel, you don’t get the response and your only hope is the onError hook, which most probably lives in a different place than the code which sent the message.

As of now, our frontend app keeps the knowledge about the state of the channels it uses (whether the channels have been joined or if they crashed). If one of the channels crashes, the app replaces the whole screen with a big spinner which makes the user wait until the connection is reestablished.

The app has to do that because the form-submission library is promise-based. If the message sent as a result of submitting the form crashes the server, the promise for sending the form data never resolves. In that case, we have to destroy the form component to reset its state (otherwise it would be stuck in the “submitting” state).

Now, I’m not particularly fond of this solution. Mainly because the onError hook is called not only when the channel crashes, but also when the connection is abruptly stopped – which happens very often, especially on mobile: you open the app in the browser, close the browser, and open it again.

In short, I’d rather have my app deal with errors when they happen as a result of user action (like in the HTTP example) than have one big woolly error handler (the spinner) for the whole app. I don’t want to disrupt the user experience if the connection is temporarily dropped, yet I still want to handle server failures precisely, but I don’t quite know how to approach that with the available tools.

Do you have any tips on how to design frontend apps to handle such error cases? Should I change my thinking and get over the fact that the semantics of HTTP requests and Phoenix channels are vastly different? :wink:

I already read the docs about channels and the source code of the frontend client.

3 Likes

When you push to the channel you will receive ‘ok’, ‘error’ or ‘timeout’, much like HTTP, so that API call in itself is pretty similar:

callApi(apiName: string, body: any): Observable<any> {
  return Observable.create((observer: Observer<any>) => {
    if (this.channel) {
      this.channel.push(apiName, { body: body || {} }, 5000)
        .receive('ok', msg => observer.next(msg))
        .receive('error', resp => observer.error(resp))
        .receive('timeout', () => observer.error('timeout'));
    } else {
      observer.error('no_channel');
    }
  });
}
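Since the original poster mentions using promises rather than observables, the same push pattern can be wrapped in a promise. Below is a minimal, self-contained sketch: `PushLike` is a reduced stand-in for the object returned by `channel.push(...)` in the Phoenix JS client, and `fakePush` is a made-up test double so the sketch runs without a real channel – neither is part of the Phoenix API.

```typescript
// Reduced shape of the object returned by channel.push(...) in the
// Phoenix JS client (an assumption for this demo; the real object has
// more methods).
interface PushLike {
  receive(status: string, cb: (resp?: any) => void): PushLike;
}

// Wrap a push into a promise: 'ok' resolves, 'error'/'timeout' reject.
function pushAsPromise(push: PushLike): Promise<any> {
  return new Promise((resolve, reject) => {
    push
      .receive("ok", resp => resolve(resp))
      .receive("error", resp => reject({ kind: "error", resp }))
      .receive("timeout", () => reject({ kind: "timeout" }));
  });
}

// A fake push that replies asynchronously, so the sketch is runnable
// without a real channel.
function fakePush(status: string, resp?: any): PushLike {
  const cbs: { [s: string]: (resp?: any) => void } = {};
  const p: PushLike = {
    receive(s, cb) {
      cbs[s] = cb;
      return p;
    },
  };
  setTimeout(() => cbs[status] && cbs[status](resp), 0);
  return p;
}

pushAsPromise(fakePush("ok", { id: 1 })).then(r => console.log(r.id)); // prints 1
```

With this wrapper, the form-submission code can `await` the push and handle ‘error’ and ‘timeout’ at the call site, just like a failed HTTP request.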

How you handle the errors is also similar and depends on the UX. For GETs I usually auto-retry on error/timeout and show non-blocking UI (an inline spinner or whatnot), but the user can still navigate away (cancelling the retries in that case, of course). For POSTs I show an alert that there was an error and that the user should retry.

Handling the channels should be automatic in my experience – I don’t handle onError for the channel, and the same goes for the socket. Once you are connected, they should reconnect if anything happens. If I crash the channel, the client just rejoins almost immediately.

It of course depends on your app – for a live game or a data stream you would start to handle the channel-down scenario – but for these ‘API calls’ over channels the above works fine, and it is handled identically to HTTP requests.

I usually build it so you can easily swap an API call and choose to run it over HTTP or WebSockets – just imagine the above function as callHttpApi; it would look pretty much the same.

So you can use channels in a very similar way as http requests.

Of course the real fun starts when you push data from the server, but all that depends on the client implementation and how data is used/stored/rerendered etc.

1 Like

Are you processing these forms directly in the channel process?

Right, but if the message you pushed crashes the server, you won’t get an ‘error’ reply; instead the onError handler is fired, and after a few seconds you get the ‘timeout’ response. However, receiving a timeout doesn’t necessarily mean that the channel has crashed, so you couldn’t really return an error from the observer in that case to simulate an HTTP 500 response (or maybe you could – I’m not fully familiar with observers, but anyway I use promises in my implementation ;>).

This got me thinking – what if the channel is down and the user submits the form? Would it return an error immediately or would the frontend Phoenix client buffer the message and send it once the channel reconnects as described in the docs? It seems like it’d do the latter, but I don’t know how “a crashed channel” relates to not having an available connection.

1 Like

I’m not entirely familiar with Phoenix, as I’m not the person who writes the backend. However, from what I can see, the submitted data is processed inside the handle_event function in web/channels/foo.

In such cases I tend to split these kinds of things into two pieces: the part that does the communication to the client, and the part that does the business logic.

I put each in its own process, with the idea that if there is any failure in the communication process (the channel, in this case), it must mean something went quite wrong with the communication link itself (e.g. the network) and there is not much that can be done about that. The client must always be prepared for such events, so in case of failure, that’s OK.

Then the business logic part, which is where bugs and recoverable errors are more common, runs in a separate process, isolated from the communications part.

The business process (in your case the form processor) is then monitored by the communications process (using the process monitoring that comes bundled with OTP), so if anything goes wrong there, the communications process can report back to the client or, if appropriate, try again. Often I allow some sane number of failures, such as 3, before completely giving up. This also gives you a place to log the error safely, perhaps even storing the raw request for later examination (e.g. for bug-fixing purposes).

This also has several nice side effects: the business logic process can operate sequentially/synchronously, and the comms process can keep updating the client with pings to keep the connection alive even when processing takes a long while. The business logic process can fire off progress messages to the comms process, which can pass them on appropriately. The communications process can handle multiple requests in parallel (by spawning more business logic processes). Etc., etc.

Sooo … tl;dr -> consider splitting your business logic out of the channel itself and running it in its own process :slight_smile:
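A sketch of this split for the form case discussed above. The module and message names (MyAppWeb.FormChannel, FormProcessor, "submit_form", :form_done) are hypothetical; socket_ref/1, reply/2, Task.start/1 and Process.monitor/1 are the real building blocks. The channel spawns an unlinked task for the business logic and monitors it, so a crash there never takes the channel down, and the client gets a proper :error reply instead of a silent timeout.

```elixir
defmodule MyAppWeb.FormChannel do
  use Phoenix.Channel

  def join(_topic, _params, socket), do: {:ok, socket}

  def handle_in("submit_form", params, socket) do
    # Keep a reference to the push so we can reply asynchronously later.
    caller = socket_ref(socket)
    channel = self()

    # Unlinked task: a crash here will NOT propagate to the channel.
    {:ok, pid} =
      Task.start(fn ->
        # FormProcessor.process/1 is a placeholder for the business logic.
        send(channel, {:form_done, FormProcessor.process(params)})
      end)

    Process.monitor(pid)
    {:noreply, assign(socket, :pending, caller)}
  end

  # Business logic finished: reply to the push that is still waiting.
  def handle_info({:form_done, result}, socket) do
    reply(socket.assigns.pending, {:ok, %{result: result}})
    {:noreply, socket}
  end

  # The task died abnormally: report an error instead of letting the
  # client's push time out.
  def handle_info({:DOWN, _ref, :process, _pid, reason}, socket)
      when reason != :normal do
    reply(socket.assigns.pending, {:error, %{reason: inspect(reason)}})
    {:noreply, socket}
  end

  def handle_info({:DOWN, _ref, :process, _pid, :normal}, socket) do
    {:noreply, socket}
  end
end
```

This sketch tracks only one pending push for brevity; a real channel handling concurrent submits would key pending callers by the task’s pid or monitor ref.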

3 Likes

Right, but if the message you pushed crashes the server, you won’t get an ‘error’ reply; instead the onError handler is fired, and after a few seconds you get the ‘timeout’ response. However, receiving a timeout doesn’t necessarily mean that the channel has crashed, so you couldn’t really return an error from the observer in that case to simulate an HTTP 500 response (or maybe you could – I’m not fully familiar with observers, but anyway I use promises in my implementation ;>).

Some basics: the entire server should never crash; parts of it might, depending on the supervision tree, which will also bring crashed parts back up (in the usual configuration).

As soon as the channel crashes, it comes back up, so the channel is never permanently down (well, actually, after a few tries the supervisor might give up on it, depending on your config), but then the client will request a join and off it goes again (I believe).

As the channel works as your transport, I would not do crashy stuff there – spawn a process, and if that process crashes, the channel can return an error to the client.

So even though Elixir/BEAM is about letting stuff crash, I would be somewhat protective of the channel and make sure it’s the spawned processes that are crashing (and use try/rescue etc.).

This got me thinking – what if the channel is down and the user submits the form? Would it return an error immediately or would the frontend Phoenix client buffer the message and send it once the channel reconnects as described in the docs? It seems like it’d do the latter, but I don’t know how “a crashed channel” relates to not having an available connection.

This is a real edge case – ‘channel down’ is a weird state, as stated above – but say we have been connected and are now offline, and we push a message: the pushed message will be queued up for the duration of the timeout you set in the push call, this.channel.push(apiName, { body: body || {} }, 5000) (or the default of 10000 ms). So if there is a rejoin within those 5 seconds, the message will be sent to the channel; otherwise you get a ‘timeout’.

So it all depends on your use case – you can easily indicate socket/channel status in the client, and even prevent the client from making calls unless things (appear to be) up, etc.
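Indicating that status can be as simple as a tiny tracker the UI consults before pushing. The class below is made up for illustration; in a real app you would feed it from the Phoenix JS client’s socket.onOpen/onClose callbacks and the channel’s join()/onError hooks.

```typescript
// Sketch: a minimal connection-status gate (illustrative, not part of
// the Phoenix JS client).
class ChannelStatus {
  private socketOpen = false;
  private joined = false;

  setSocketOpen(open: boolean): void {
    this.socketOpen = open;
    if (!open) this.joined = false; // no socket means no joined channel
  }

  setJoined(joined: boolean): void {
    this.joined = joined;
  }

  // "Things (appear) up": both the socket and the channel look healthy.
  appearsUp(): boolean {
    return this.socketOpen && this.joined;
  }
}

const status = new ChannelStatus();
status.setSocketOpen(true);
status.setJoined(true);
console.log(status.appearsUp()); // prints true
status.setSocketOpen(false);
console.log(status.appearsUp()); // prints false
```

The gate is intentionally pessimistic: a closed socket clears the joined flag, so the UI can disable submit buttons until the rejoin completes.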

but I don’t believe having the channel crash on purpose is a good pattern.

But I would not be overprotective either – say the entire system runs on Postgres and the DB is down; then by all means let some Repo (DB) calls crash the channel as well. In such a case, all bets are off.

It all depends on what guarantees your system requires.

1 Like

Thanks, pushing the business logic to a separate channel (edit: a process, not a channel) sounds very interesting!

It won’t help with the case of a mobile browser disconnecting from the app, but that’s where outlog’s advice comes in – the client should be able to rejoin the channel.

Yeah, that would be a separate process – usually, here in Elixir land, a GenServer.

All depends on what you are building.

I would worry less about the channel crashing, and more about the myriad of offline, reconnect and rejoin scenarios – and figure out suitable logic and client code for all of those. Also channel logic for when a user leaves the channel/browser and reopens it, etc…

  def handle_info({:EXIT, _pid, reason}, socket) do
    IO.puts "#{inspect(reason)} LEFT: #{socket.assigns.user_id}"
    {:stop, reason, socket}
  end

  def terminate(reason, socket) do
    IO.inspect reason
    IO.puts "terminate: #{socket.assigns.user_id}"
    {:ok, socket}
  end

feel free to ping back as you get further in the development.

Are channel crashes common in your app? I normally treat a channel crash like an HTTP 500, it means there’s a problem with the server code that should be fixed.

They are not. However, my point is that you can’t really treat them the same as HTTP 500 responses, because the handler for a channel crash usually lives far away from the code which sent the faulty message to the backend.

Observables should be able to handle that case, and I really suggest you give RxJS a look: http://reactivex.io/

If you make an observable from channel.onError and make the request an observable, you can ‘race’ them: if an onError arrives during the race, you get it first and can then handle it – cancel the pushed message, tell the user the channel is offline, etc.
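The same race can be sketched with plain promises, since the original poster’s implementation is promise-based. raceWithChannelError and the demo promises below are illustrative names – only Promise.race itself is standard.

```typescript
// Race the push reply against a channel-error signal, so a crash
// surfaces at the call site instead of in a faraway onError hook.
function raceWithChannelError<T>(
  reply: Promise<T>,
  channelError: Promise<never>
): Promise<T> {
  // Whichever settles first wins; a channel error rejects the call.
  return Promise.race([reply, channelError]);
}

// Demo with stand-in promises: the reply arrives, the channel stays up.
const okReply = new Promise<string>(resolve =>
  setTimeout(() => resolve("saved"), 10)
);
const healthyChannel = new Promise<never>(() => {}); // never settles

raceWithChannelError(okReply, healthyChannel).then(r => console.log(r)); // prints "saved"
```

In a real app the channelError promise would be created from the channel’s onError callback, and you would recreate it after each successful rejoin so a stale error doesn’t reject future calls.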

I’m just diving into it myself, and the learning curve is quite steep – but RxJS is the tool for handling the kind of ‘event’ streams that a websocket/channel is.

I’m even using https://redux-observable.js.org to handle client-side state and the different events coming in (and especially all the side effects of these). This also gives a nice “internal API” of events (in one place, not spread out all over the code), and you can match them on the server.

It is a bit complex, so will take some time to pick up, I’m still trying to master it all.

This is interesting. I would love to see some code / read blog post… :wink:

2 Likes

I was thinking about the exact same solution, but using redux-saga. However, it requires a big upfront rewrite and I can’t roll it out incrementally, as most of the backend communication in the app is already based on redux-thunk.

It seems like most of the libraries out there kind of assume this request-response model of HTTP requests. This can be easily modeled by callbacks and promises, but Phoenix channels – not so much when it comes to error handling.

1 Like

Redux-saga would also get the job done, yeah.

redux-observable is FP, and RxJS is a first-class citizen on other platforms like Angular/Ionic – that’s why I’m putting in the effort, as I can use it outside of React, and you know, FP ;-).

I read somewhere that redux-saga could be viewed as an ‘interim solution’ to these problems while redux-observable was ‘the solution’ – but that might just be biased BS. I’m in no way expert enough to weigh in on this; either should be fine.