How to handle failed server launch in a supervision tree?

I have a GenServer that is tied to a TCP connection; if the connection fails, init/1 fails. This is a problem because if the other end of the TCP connection (a config-assigned IP address) is down, it prevents the entire application from starting (the crash propagates up the supervision tree). I suspect it will also take down the supervision tree if the connection is dropped for an extended time (say, while I’m doing maintenance on that node).

Is there a way to handle slow, scheduled retries at the supervisor level, or should I move the connection attempts outside of init/1, or is there some other best practice that others would recommend to handle connection retries?

If the possibility that the connection fails is something that your system has to handle during normal operations, I’d move the connection logic outside of init/1, and make the GenServer logic explicitly deal with the “unconnected” state (for example by trying periodically to re-establish the connection, and responding with errors to calls happening while the connection is down).
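
A minimal sketch of that idea, in case it helps. The module name TcpClient, the send_data/2 function, the :host/:port options, and the 5-second retry interval are all made up for illustration; only GenServer, Process.send_after/3 and :gen_tcp are real APIs. init/1 never fails: it just queues a :connect message, and the server treats socket: nil as its "unconnected" state.

```elixir
defmodule TcpClient do
  use GenServer

  # Hypothetical retry interval; pick whatever "infrequent" means for you.
  @retry_ms 5_000

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Hypothetical client API: returns {:error, :not_connected} while down.
  def send_data(server, data), do: GenServer.call(server, {:send, data})

  @impl true
  def init(opts) do
    # :host must be a charlist or an IP tuple such as {127, 0, 0, 1}.
    state = %{
      host: Keyword.fetch!(opts, :host),
      port: Keyword.fetch!(opts, :port),
      socket: nil
    }

    # Do not connect here; queue the first attempt and return right away.
    send(self(), :connect)
    {:ok, state}
  end

  @impl true
  def handle_info(:connect, %{socket: nil} = state) do
    case :gen_tcp.connect(state.host, state.port, [:binary, active: true], 1_000) do
      {:ok, socket} ->
        {:noreply, %{state | socket: socket}}

      {:error, _reason} ->
        # Stay alive in the unconnected state and try again later.
        Process.send_after(self(), :connect, @retry_ms)
        {:noreply, state}
    end
  end

  # Already connected: ignore a stale :connect message.
  def handle_info(:connect, state), do: {:noreply, state}

  # The peer went away: drop the socket and schedule a reconnect.
  def handle_info({:tcp_closed, socket}, %{socket: socket} = state) do
    Process.send_after(self(), :connect, @retry_ms)
    {:noreply, %{state | socket: nil}}
  end

  def handle_info(_msg, state), do: {:noreply, state}

  @impl true
  def handle_call({:send, _data}, _from, %{socket: nil} = state) do
    {:reply, {:error, :not_connected}, state}
  end

  def handle_call({:send, data}, _from, %{socket: socket} = state) do
    {:reply, :gen_tcp.send(socket, data), state}
  end
end
```

Since the process never crashes on a failed connection, the supervisor's restart intensity is not consumed while the remote node is down for maintenance.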

Like @lucaong said, it’s better to handle connecting outside your init/1 function. You could return {:continue, term} from init/1 and connect in handle_continue/2, which will not block the supervision tree from starting up (a good strategy when initialization of a server is expected to be slow), or you could use the Connection library (https://github.com/fishcakez/connection), which makes connection and reconnection handling logic really explicit.
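
Roughly, the handle_continue/2 variant could look like the sketch below. ContinueClient, status/1, the :host/:port options and the retry interval are hypothetical names chosen for the example; the {:continue, term} return value and the handle_continue/2 callback are part of GenServer (Elixir 1.7+ on OTP 21+). Handling of a dropped connection (:tcp_closed) is left out to keep it short.

```elixir
defmodule ContinueClient do
  use GenServer

  # Hypothetical retry interval in milliseconds.
  @retry_ms 5_000

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Hypothetical helper to inspect the connection state.
  def status(server), do: GenServer.call(server, :status)

  @impl true
  def init(opts) do
    # init/1 returns immediately, so the supervisor can carry on.
    # handle_continue/2 runs before any other message is processed.
    {:ok, %{opts: opts, socket: nil}, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state) do
    # :host must be a charlist or an IP tuple such as {127, 0, 0, 1}.
    host = Keyword.fetch!(state.opts, :host)
    port = Keyword.fetch!(state.opts, :port)

    case :gen_tcp.connect(host, port, [:binary, active: true], 1_000) do
      {:ok, socket} ->
        {:noreply, %{state | socket: socket}}

      {:error, _reason} ->
        # A failed attempt is not a crash; schedule another one.
        Process.send_after(self(), :retry, @retry_ms)
        {:noreply, state}
    end
  end

  @impl true
  def handle_info(:retry, %{socket: nil} = state) do
    {:noreply, state, {:continue, :connect}}
  end

  def handle_info(_msg, state), do: {:noreply, state}

  @impl true
  def handle_call(:status, _from, state) do
    {:reply, if(state.socket, do: :connected, else: :disconnected), state}
  end
end
```

The Connection library wraps the same pattern in dedicated connect/disconnect callbacks with backoff return values, so it is worth a look if the reconnect logic grows beyond this.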

Sorry, I misspoke: I didn’t mean slow, I meant infrequent. Putting it in init/1 isn’t a problem in the ‘blocking-the-caller’ sense.

PS. That library is fantastic, thanks! I might incorporate it.