How to handle failed server launch in a supervision tree?

I have a GenServer that is tied to a TCP connection; if the connection attempt fails, init/1 fails. This is a problem: if the other end of the TCP connection (a config-assigned IP address) is down, the failure crashes up the supervision tree and prevents the entire application from starting. I suspect it will also bring down the supervision tree if the connection is dropped for an extended time (say, while I’m doing maintenance on that node).

Is there a way to handle slow, scheduled retries at the supervisor level, should I move the connection attempts out of init/1, or is there some other practice people would recommend for handling connection retries?

If a failed connection is something your system has to handle during normal operation, I’d move the connection logic out of init/1 and make the GenServer explicitly deal with the “unconnected” state (for example, by periodically trying to re-establish the connection and replying with errors to calls made while the connection is down).
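A minimal sketch of that approach, assuming plain :gen_tcp and a GenServer with no extra libraries. The module name ReconnectingClient, the send_data/2 function and the :host, :port and :retry_ms options are hypothetical, chosen for illustration. init/1 never touches the network, so it always succeeds; the connection attempt happens in a message the process sends to itself, failed attempts are retried on a timer with Process.send_after/3, and calls made while disconnected get {:error, :disconnected}.

```elixir
defmodule ReconnectingClient do
  # Hypothetical module: host, port and retry interval are illustrative options.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Returns :ok, or {:error, :disconnected} while there is no socket.
  def send_data(pid, data), do: GenServer.call(pid, {:send, data})

  @impl true
  def init(opts) do
    state = %{
      host: Keyword.fetch!(opts, :host),
      port: Keyword.fetch!(opts, :port),
      retry_ms: Keyword.get(opts, :retry_ms, 60_000),
      socket: nil
    }

    # No network I/O here: init/1 always succeeds, so the supervisor
    # (and the rest of the application) can start even if the peer is down.
    send(self(), :connect)
    {:ok, state}
  end

  @impl true
  def handle_info(:connect, state) do
    case :gen_tcp.connect(state.host, state.port, [:binary, active: true], 5_000) do
      {:ok, socket} ->
        {:noreply, %{state | socket: socket}}

      {:error, _reason} ->
        # Peer is unreachable: stay alive and try again later.
        Process.send_after(self(), :connect, state.retry_ms)
        {:noreply, state}
    end
  end

  # The connection dropped (e.g. the remote node is under maintenance).
  def handle_info({:tcp_closed, socket}, %{socket: socket} = state) do
    Process.send_after(self(), :connect, state.retry_ms)
    {:noreply, %{state | socket: nil}}
  end

  # Incoming {:tcp, socket, data} and any other messages are ignored in this sketch.
  def handle_info(_msg, state), do: {:noreply, state}

  @impl true
  def handle_call({:send, _data}, _from, %{socket: nil} = state) do
    {:reply, {:error, :disconnected}, state}
  end

  def handle_call({:send, data}, _from, state) do
    {:reply, :gen_tcp.send(state.socket, data), state}
  end
end
```

With a long :retry_ms (one minute by default here, an arbitrary choice), retries stay infrequent, and because the process never crashes on a failed connection, no supervisor restarts are consumed while the remote node is down.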


As @lucaong said, it’s better to handle connecting outside your init/1 function. You could do that by returning a :continue tuple from init/1 and connecting in handle_continue/2, which does not block the supervision tree from starting up (a good strategy when a server’s initialization is expected to be slow). Alternatively, you could use the Connection library, which makes connection and reconnection handling logic really explicit.
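A rough sketch of the handle_continue/2 variant using only the standard library (it does not use the Connection library). ContinueClient, connected?/1 and the :host, :port and :retry_ms options are hypothetical names for illustration. init/1 returns {:ok, state, {:continue, :connect}}, so start_link/1 returns immediately and the connection attempt runs before any other message is handled; on failure, a timer message re-enters the same continue step.

```elixir
defmodule ContinueClient do
  # Hypothetical module: names and options are illustrative only.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Hypothetical helper for inspecting the connection state.
  def connected?(pid), do: GenServer.call(pid, :connected?)

  @impl true
  def init(opts) do
    state = %{
      host: Keyword.fetch!(opts, :host),
      port: Keyword.fetch!(opts, :port),
      retry_ms: Keyword.get(opts, :retry_ms, 60_000),
      socket: nil
    }

    # start_link/1 returns as soon as init/1 does; handle_continue/2
    # runs next, before any other message is processed.
    {:ok, state, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state) do
    case :gen_tcp.connect(state.host, state.port, [:binary, active: true], 5_000) do
      {:ok, socket} ->
        {:noreply, %{state | socket: socket}}

      {:error, _reason} ->
        # Returning {:stop, ...} here would just move the crash out of
        # init/1; schedule another attempt instead.
        Process.send_after(self(), :retry, state.retry_ms)
        {:noreply, state}
    end
  end

  @impl true
  def handle_info(:retry, state), do: {:noreply, state, {:continue, :connect}}

  # Peer closed the connection: try to reconnect right away.
  def handle_info({:tcp_closed, socket}, %{socket: socket} = state) do
    {:noreply, %{state | socket: nil}, {:continue, :connect}}
  end

  # Incoming data and other messages are ignored in this sketch.
  def handle_info(_msg, state), do: {:noreply, state}

  @impl true
  def handle_call(:connected?, _from, state), do: {:reply, state.socket != nil, state}
end
```

handle_continue/2 on its own only keeps start_link/1 from waiting on the network; it is the retry timer in the error branch that keeps a down remote node from crashing the process and, through repeated restarts, its supervisor.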


Sorry, I misspoke: I didn’t mean slow, I meant infrequent. Putting the connection in init/1 isn’t a problem in the ‘blocking-the-caller’ sense.

P.S. That library is fantastic, thanks! I might incorporate it.