alexandru-calinoiu
How to retry failed Event Handlers in Commanded
I have a production app that uses Commanded. The event handlers are started by a supervisor with the default restart limits (max restarts 3, max seconds 5).
The problem is that if one of the event handlers restarts more than 3 times in 5 seconds, it takes the entire app down with it: the supervisor gives up, kills all the other event handlers, and the failure propagates all the way up to the web server.
What are my options to solve this issue?
Most Liked
slashdotdash
You should implement the error/3 callback function in your event handlers to deal with problematic events, so that the event handler does not restart on error.
The example error handling in the docs shows a simple strategy: retry a fixed number of times, then log and skip an event that cannot be handled.
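A minimal sketch of that retry-then-skip strategy, modelled on the pattern in the Commanded docs. The names ExampleApp, AnEvent, and the limit of 3 failures are hypothetical, and the `application:` option assumes a Commanded version that requires it (1.0 or later); check the callback return values against the version you run.

```elixir
# Hypothetical event struct, used only for illustration.
defmodule AnEvent do
  defstruct [:id]
end

defmodule ExampleHandler do
  # ExampleApp is assumed to be your Commanded application module,
  # defined elsewhere with `use Commanded.Application`.
  use Commanded.Event.Handler,
    application: ExampleApp,
    name: __MODULE__

  require Logger

  alias Commanded.Event.FailureContext

  # Assumed limit; tune it for your app.
  @max_failures 3

  # Simulate an event that cannot be handled.
  def handle(%AnEvent{}, _metadata) do
    {:error, :failed}
  end

  # Called by Commanded when handle/2 returns an error.
  # The context map is carried across retries of the same event.
  def error({:error, _reason}, %AnEvent{} = event, %FailureContext{context: context}) do
    context = Map.update(context, :failures, 1, &(&1 + 1))

    if context.failures >= @max_failures do
      # Give up on this event: log it and move on to the next one.
      Logger.warning("Skipping event after too many failures: " <> inspect(event))
      :skip
    else
      # Try the same event again, keeping the failure count.
      {:retry, context}
    end
  end
end
```

Because the handler returns :skip or {:retry, context} instead of crashing, the supervisor's restart limit is never hit for this event.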
xpg
We ended up solving this by setting up a supervisor that allows our event handlers to crash, without restarting them. At runtime, we can query the supervisor about which event handlers / projectors are stopped and not restarted.
Our operations team is notified of this, so that they can investigate the situation.
The reason we did not go with skipping events is that we are wary of continuing to handle events when there is one we cannot handle.
The only time we have needed this, it worked like a charm: an event was introduced by accident and was unhandled in a projector. The projector got stuck and died. The rest of the system kept running, and operations were aware of the issue. We could then deal with the situation (in this case by blanking out the event) and restart the projector.
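A rough sketch of how such a supervisor could look, not necessarily the setup described above. It assumes each handler module exposes child_spec/1 (as modules using Commanded.Event.Handler do), and the HandlerSupervisor name and stopped/1 helper are hypothetical. Children are started with restart: :temporary, so a crashed handler is never restarted and does not count toward the restart limit.

```elixir
defmodule HandlerSupervisor do
  use Supervisor

  # `handlers` is a list of handler/projector modules to supervise.
  def start_link(handlers) do
    Supervisor.start_link(__MODULE__, handlers, name: __MODULE__)
  end

  @impl true
  def init(handlers) do
    children =
      for handler <- handlers do
        # :temporary children are never restarted, whatever the exit reason.
        Supervisor.child_spec(handler, id: handler, restart: :temporary)
      end

    Supervisor.init(children, strategy: :one_for_one)
  end

  # Returns the handlers from `expected` that are no longer running.
  # A terminated :temporary child is dropped from the supervisor,
  # so anything missing from which_children/1 has stopped.
  def stopped(expected) do
    running =
      for {id, pid, _type, _modules} <- Supervisor.which_children(__MODULE__),
          is_pid(pid),
          do: id

    expected -- running
  end
end
```

A monitoring job could call HandlerSupervisor.stopped/1 periodically with the full handler list and alert operations when the result is not empty. Once the bad event is dealt with, the handler can be started again with Supervisor.start_child/2.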