GenServer error recovery

tdejager · April 4, 2016, 12:41pm

Hi everybody!

Just a quick question for people who already have experience with OTP and GenServers, especially with error recovery. My question is about recovering GenServers and, to an extent, Erlang processes in general.

I know that one of the philosophies of Erlang and Elixir is ‘let it crash’. The Erlang ecosystem provides wonderful tools for monitoring and restarting processes. But I have run into some real-world questions regarding the design of the system.

Real world examples

For example, in a Phoenix application I have a custom Mailer GenServer that sends an email when a user completes a specific action. This can be done with a cast to the GenServer, but say the mailer process crashes. How do I restart the GenServer with the buffer of mails that still need to be sent? Would you guys persist these to a database or ETS?

Another example is a scheduler in the same application that schedules participants in a tournament in a round-robin fashion. It does this in a POST request to the server, and it saves its state to a MySQL database. What if this scheduling fails (e.g. because of a database error)? Would you consider doing a retry in the same request, or maybe retry the scheduling when the GenServer is restarted? Or just show an error message to the user?

What I am looking for is how to resume these processes and make them more robust. Anyone willing to share their insights?

Thanks!

NobbZ · April 4, 2016, 1:17pm

I have not done it myself so far, but one cause of the crash could be faulty state, so I would advise against restoring the old state after a restart.

Anyway, one of the following SO threads might help you:

Edit: Of course, both are about Erlang, so you might need to adapt the proposed solutions a bit to make them more Elixir-style.

rafadc · April 4, 2016, 2:37pm

I am far from being an Elixir expert. Anyway, the first thing I can think of is to send one message to the GenServer per mail you want to send. If the mailer crashes, I think the supervisor can spawn a new one and keep the current mailbox (the process’ mailbox, I mean). You will miss the failed emails though.

Anyway I’d have to test this.

NobbZ · April 4, 2016, 3:57pm

According to the Erlang manual, a process’s mailbox is destroyed together with the process.
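This is easy to check in `iex`. The snippet below is only a minimal sketch: the email addresses are made up, and the sleeping process stands in for a mailer that has not yet handled its messages.

```elixir
# A process that never reads its mailbox, so sent messages pile up.
pid = spawn(fn -> Process.sleep(:infinity) end)

send(pid, {:mail, "a@example.com"})
send(pid, {:mail, "b@example.com"})

# The messages now sit in the mailbox of that one process.
{:message_queue_len, queued} = Process.info(pid, :message_queue_len)

# Kill the process and wait for the :DOWN message so we know it is gone.
ref = Process.monitor(pid)
Process.exit(pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# Process.info/1 returns nil for a dead process: the mailbox went with it.
info_after = Process.info(pid)
```

A process restarted by a supervisor gets a new pid and starts with an empty mailbox, so anything that was still queued is lost.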

rafadc · April 6, 2016, 2:10pm

You are right. It looks like the recommended way is one of two things. Either have a separate process that only stores the messages and does not execute the dangerous operation itself (that is done by another process), or hold the information in a message queue or the like.
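A rough sketch of the first option, not something I have run in production. `MailQueue`, `MailWorker` and the `deliver` function are names I made up for illustration; the real mail-sending code would be passed in as `deliver`.

```elixir
defmodule MailQueue do
  use GenServer

  # Holds pending mails only; it never performs the risky send itself.
  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, :queue.new(), name: __MODULE__)
  end

  def enqueue(mail), do: GenServer.cast(__MODULE__, {:enqueue, mail})
  def peek, do: GenServer.call(__MODULE__, :peek)
  def ack(mail), do: GenServer.call(__MODULE__, {:ack, mail})

  @impl true
  def init(queue), do: {:ok, queue}

  @impl true
  def handle_cast({:enqueue, mail}, queue) do
    {:noreply, :queue.in(mail, queue)}
  end

  @impl true
  def handle_call(:peek, _from, queue) do
    case :queue.peek(queue) do
      {:value, mail} -> {:reply, {:ok, mail}, queue}
      :empty -> {:reply, :empty, queue}
    end
  end

  def handle_call({:ack, mail}, _from, queue) do
    # Drop the head only if it is the mail being acknowledged.
    case :queue.out(queue) do
      {{:value, ^mail}, rest} -> {:reply, :ok, rest}
      _ -> {:reply, :ignored, queue}
    end
  end
end

defmodule MailWorker do
  # Runs the dangerous operation in the caller's process. If `deliver`
  # raises, the mail is never acknowledged and stays at the head of the queue.
  def deliver_next(deliver) do
    case MailQueue.peek() do
      {:ok, mail} ->
        deliver.(mail)
        MailQueue.ack(mail)

      :empty ->
        :empty
    end
  end
end
```

With this split, a crash while sending takes down only the process calling `MailWorker.deliver_next/1`, and the next attempt sees the same mail again. Two caveats: a crash between sending and `ack` means the mail goes out twice, and the queue lives in memory, so it is still lost if the `MailQueue` process itself or the whole node goes down. For that you would need the database or ETS idea from the first post.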

tdejager · April 6, 2016, 5:26pm

Thanks for the reactions. That sounds like a good idea! Have you found a reference that recommends this approach?

rafadc · April 7, 2016, 3:10pm

I read it on an Erlang mailing list, and it makes sense:

http://erlang.org/pipermail/erlang-questions/2014-July/080270.html

To be honest, I was trying to implement a demo but I had no time.