How to Handle Database Issues in a GenServer and the Impact of Max Restarts in an Elixir Supervisor

aungmyooo2k17 · May 26, 2023, 8:04am

I have an application that displays messages on a website. These messages are shown after performing some calculations using my service. My main question is related to the behavior inside the GenServer. In the GenServer, I make a query to retrieve messages with certain conditions from the database. However, if there is a problem with the database connection or if the database goes down, the message query function may raise an exception.

To handle such situations, I have set the max_restarts option to a very high value (e.g., 100000000) in the supervisor configuration. This means that even if the GenServer restarts many times due to errors, the supervisor will keep restarting it. This approach helps me handle database connection issues or outages.
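For readers unfamiliar with the option, here is a minimal sketch of where it lives, assuming a module-based supervisor. `MyApp.Supervisor` and the `Agent` standing in for the real worker are hypothetical names, not the poster's code. Note that `max_restarts` is counted within a window of `max_seconds` (the defaults are 3 restarts in 5 seconds); if the limit is exceeded, the supervisor shuts down its children and exits itself.

```elixir
defmodule MyApp.Supervisor do
  use Supervisor

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    children = [
      # Hypothetical placeholder for the GenServer that queries the database.
      %{id: :message_worker, start: {Agent, :start_link, [fn -> %{} end]}}
    ]

    # Tolerate up to 100_000_000 restarts within any 5-second window
    # before the supervisor itself gives up.
    Supervisor.init(children,
      strategy: :one_for_one,
      max_restarts: 100_000_000,
      max_seconds: 5
    )
  end
end
```

With a limit this high the supervisor effectively never gives up, so a worker that crashes immediately on start will be restarted in a tight loop.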

However, I would like to know the impact of setting such a high value for max_restarts in the supervisor on system performance. Does it affect the system’s overall performance? Additionally, I’m curious to understand what happens to the previous state’s data when the supervisor restarts the GenServer. Furthermore, I have limited knowledge about the inner workings of the Elixir garbage collection process and the Erlang virtual machine (VM).

Please help me clarify these questions as I do not have much experience in low-level system details or extensive knowledge about the Erlang VM.

josevalim · May 26, 2023, 8:07am

There is no performance impact whatsoever. The worst that can happen is that, if you have a bug in your database connection code that causes it to crash, you can restart, trigger the bug again, crash, restart, trigger the bug, crash, forever. It won’t be different than doing a try/catch and trying again in another language.

When a process dies, including a GenServer, all state is lost unless you stored it elsewhere.
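A small sketch of that state loss, assuming a hypothetical `Counter` GenServer under a `:one_for_one` supervisor. The demo kills the process and reads the value again once the supervisor has started a replacement; the new process begins from whatever `init/1` returns.

```elixir
defmodule Counter do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)
  def increment, do: GenServer.call(__MODULE__, :increment)
  def value, do: GenServer.call(__MODULE__, :value)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

defmodule StateLossDemo do
  # Returns {value_before_crash, value_after_restart}.
  def run do
    {:ok, _sup} = Supervisor.start_link([Counter], strategy: :one_for_one)

    Counter.increment()
    Counter.increment()
    before_crash = Counter.value()

    # Simulate an unexpected crash and wait for the supervisor to restart it.
    old_pid = Process.whereis(Counter)
    Process.exit(old_pid, :kill)
    await_restart(old_pid)

    {before_crash, Counter.value()}
  end

  # Polls until a different pid is registered under the Counter name.
  defp await_restart(old_pid) do
    case Process.whereis(Counter) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_restart(old_pid)
    end
  end
end
```

Anything that must survive a restart has to live outside the process, for example in the database, or in an ETS table or another process that outlives it.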

al2o3cr · May 26, 2023, 1:04pm

Preface: the following is general advice and I obviously haven’t seen any of your specific code. Mostly leaving this for other new devs that get here via search.

One common “bad pattern” new users of GenServer can fall into is the “calculation server” - an example is featured in the GenServer docs, “When (not) to use a GenServer”.

One symptom of this pattern is a GenServer that doesn’t use or care about its state, for instance one that restarts over and over again.

This pattern isn’t always incorrect, since sometimes the side-effects like “only one process per node can do this thing at a time” are desirable features. But usually it means a process is being used where a module would suffice.
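As a rough illustration of "a module would suffice", here is a sketch of the same kind of work written as a plain function. The `Messages` module, the `published` field, and the `fetch_fun` argument (a stand-in for the real database query) are all hypothetical.

```elixir
defmodule Messages do
  # No process and no state: the caller runs the query and the calculation
  # in its own process. `fetch_fun` is a hypothetical zero-arity function
  # standing in for the database query.
  def visible(fetch_fun) when is_function(fetch_fun, 0) do
    fetch_fun.()
    |> Enum.filter(fn message -> message.published end)
    |> Enum.map(fn message -> message.body end)
  end
end

# Usage with an in-memory stand-in for the database:
fetch = fn ->
  [%{body: "hello", published: true}, %{body: "draft", published: false}]
end

Messages.visible(fetch)
```

If the query raises here, the exception surfaces in the calling process, which can decide what to do with it, and no supervisor restart is involved.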

aungmyooo2k17 · June 6, 2023, 3:37am

@json @al2o3cr
After a massive number of retry attempts, we encountered performance issues. To address this problem, I would like to introduce a delay of at least 10 seconds before attempting the next restart.

The unexpected error occurs in a GenServer process, which is handled by a DynamicSupervisor. The DynamicSupervisor, in turn, is handled by a Supervisor.

After conducting further research, I was unable to find any built-in options to set a wait time in the DynamicSupervisor and GenServer modules. However, I am still determined to find a solution and am exploring the possibility of implementing a hack to introduce a delay. I am also seeking assistance and guidance to achieve this goal.

aungmyooo2k17 · June 6, 2023, 7:49am

Finally I found this hack from this post

SirWerto · June 6, 2023, 11:47am

Hello

Consider implementing a retry mechanism in your GenServer instead of in the Supervisor. The Supervisor was designed to deal with unexpected situations that cause your processes to die.

In this case, you are using the Supervisor’s features as a retry mechanism, which is causing you a lot of extra trouble and workarounds.
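One way this could look, as a minimal sketch rather than a definitive implementation: the GenServer rescues the failed query, keeps its last known messages, and schedules another attempt with `Process.send_after/3`. The `MessageServer` module, the `:fetch_fun` and `:retry_delay` options, and the 10-second default are assumptions chosen to match the delay asked for earlier in the thread.

```elixir
defmodule MessageServer do
  use GenServer
  require Logger

  @default_retry_delay_ms 10_000

  # `:fetch_fun` is a hypothetical zero-arity function standing in for the
  # database query; `:retry_delay` is the wait in milliseconds between attempts.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.get(opts, :name, __MODULE__))
  end

  def messages(server \\ __MODULE__), do: GenServer.call(server, :messages)

  @impl true
  def init(opts) do
    state = %{
      fetch_fun: Keyword.fetch!(opts, :fetch_fun),
      retry_delay: Keyword.get(opts, :retry_delay, @default_retry_delay_ms),
      messages: []
    }

    # Load after init returns so a slow or failing database does not block startup.
    {:ok, state, {:continue, :load}}
  end

  @impl true
  def handle_continue(:load, state), do: {:noreply, load(state)}

  @impl true
  def handle_info(:retry, state), do: {:noreply, load(state)}

  @impl true
  def handle_call(:messages, _from, state), do: {:reply, state.messages, state}

  # Runs the query. On a raised exception, keeps the old state and schedules
  # a retry instead of crashing. Exits and throws are not caught here.
  defp load(state) do
    %{state | messages: state.fetch_fun.()}
  rescue
    error ->
      Logger.warning("message query failed: #{Exception.message(error)}")
      Process.send_after(self(), :retry, state.retry_delay)
      state
  end
end
```

The expected failure (database unavailable) is handled inside the process with a controlled delay, while genuinely unexpected crashes are still left to the supervisor, which can then keep a modest `max_restarts`.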

@ferd has a great example of this, but I haven’t been able to find it at the moment.