What would be best practices for setting up a GenServer in this scenario

LostKobrakai · October 26, 2019, 3:25pm

This might be a valid strategy if you’re fine with ceasing any service when the dependency is down. For many services the db being down equals “we can’t do anything”. A cache that can’t be populated can fall into that category, but it’s at least as likely that it doesn’t need to.

For anything that is not as critical as your primary db, you’ll likely want to do exactly the opposite of what you suggested. Start your service, have means to detect/handle unavailability of the external service, and still serve the parts that do work, while other parts might not work.

There was a good talk concerning that by @keathley

blisscs · October 26, 2019, 3:55pm

I disagree a bit with the idea that the init callback should not be expensive.
I think in the general case it is good to keep your init callback cheap, but not in all cases.

–
From the discussion above, I do agree that we should avoid anything in the init callback that could fail the GenServer [not saying expensive here], as recommended by @LostKobrakai and you.

But in some cases we may need to do expensive calls during init just to avoid race conditions. In theory, a message could be sent to the process before a separate expensive initialization step has run on the server, so later calls could arrive ahead of it (mentioned by @hamiltop in the discussion link I posted above).

In that case I think we should do the expensive initialization, whether we do it synchronously or asynchronously (asynchronous initialization was mentioned by @michalmuskala in the discussion link I posted above).

Apologies for tagging people from that Google Groups discussion: @sasajuric, @whatyouhide, @ericmj, @josevalim. I would like to hear their thoughts on this topic, and also to know what the outcome of that discussion was.

sasajuric · October 26, 2019, 5:48pm

I only briefly skimmed this thread, so apologies if I repeat things which have already been mentioned.

When it comes to crashes, I think that potentially failing actions (e.g. accessing db or external service via network) should typically not be done in init/1. @ferd wrote a great post about it a while ago.

As for long-running init callbacks, I personally tend to avoid them. It’s worth keeping in mind that the parent process (usually a supervisor) is blocked while the child’s init is running. If this is happening during the app start, the app boot time is prolonged. Even if the process is started after the app is started (e.g. the child is started under a dynamic supervisor), there can be some undesired consequences, because the parent supervisor can’t do other actions, such as starting other children or handling child crashes. In addition, while the new process is initializing, System.stop might take longer, and it’s more likely that some processes will have to be brutally killed during the polite system termination.

So in general, I try to avoid having long-running process inits. I don’t like to speak in absolutes, so I’ll leave room for some exceptional situations where a long-running init makes sense, but I can’t recall the last time I deliberately reached for this approach.
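To make the blocking concrete, here is a minimal sketch (the `SlowInit` and `InitTiming` modules are hypothetical names, and `Process.sleep/1` stands in for real expensive work). It measures how long the process calling `start_link` is held up; a supervisor starting this child would be held up for the same duration.

```elixir
defmodule SlowInit do
  use GenServer

  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms) do
    # Stand-in for expensive work. start_link/1 does not return to the
    # parent until this callback has finished.
    Process.sleep(delay_ms)
    {:ok, :ready}
  end
end

defmodule InitTiming do
  # Returns how many milliseconds start_link/1 blocked the calling process.
  def measure(delay_ms) do
    started = System.monotonic_time(:millisecond)
    {:ok, pid} = SlowInit.start_link(delay_ms)
    elapsed = System.monotonic_time(:millisecond) - started
    GenServer.stop(pid)
    elapsed
  end
end
```

Calling `InitTiming.measure(200)` should report roughly the full delay, since the parent waits for `init/1` to return before it can do anything else.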

zacksiri · October 27, 2019, 4:26am

You may disagree. I just know from my personal experience and real world apps I built, what works and what doesn’t work. Ultimately it doesn’t really matter what I believe. It’s what is best to build a system that works and can handle failure gracefully. Which is my purpose for working with elixir based systems.

I also never mentioned that your call or cast cannot be expensive. I only mentioned init because you want to ensure that your process can start with least friction and do what it needs to do. When your process is alive it can manage itself to do tasks from call / cast / handle unexpected results / handle failures etc…

If your call is expensive your GenServer will block, which is why GenServer calls time out after 5000 ms by default. They time out so that callers don’t wait forever and your system doesn’t end up in a deadlock.
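A small sketch of that timeout behaviour, assuming hypothetical `SlowServer` and `CallDemo` modules. Note that the timeout is enforced on the caller's side: the caller exits with a `:timeout` reason, while the server keeps working on the request until it finishes.

```elixir
defmodule SlowServer do
  use GenServer

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:work, ms}, _from, state) do
    # Stand-in for an expensive call; the server handles nothing else meanwhile.
    Process.sleep(ms)
    {:reply, :done, state}
  end
end

defmodule CallDemo do
  # Starts an unlinked server and calls it with an explicit timeout.
  # The third argument to GenServer.call/3 defaults to 5000 ms.
  def call_with_timeout(work_ms, timeout_ms) do
    {:ok, pid} = GenServer.start(SlowServer, :ok)

    try do
      GenServer.call(pid, {:work, work_ms}, timeout_ms)
    catch
      # The caller exits on timeout; here we turn that into a return value.
      :exit, {:timeout, _} -> {:error, :timeout}
    end
  end
end
```

For example, `CallDemo.call_with_timeout(300, 50)` gives up after 50 ms, whereas `CallDemo.call_with_timeout(10, 1000)` gets its reply.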

Also, one more thing about ‘race conditions’: the definition of a race condition involves ‘undesired’ output.

Stopping something is not ‘undesired’. In fact, stopping or pausing something can be a good thing because it protects you from the ‘undesired’. As long as your system can pause, and is smart enough to stop when it knows it can’t provide the correct output, that is a good thing.

‘Undesired’ is when you do 1+1 and you get 3 instead of 2. If your system returns {:error, :could_not_compute}, that is a desired result, because it’s telling you it can’t finish something, so you know there is a problem in your system which you can rectify. Compare that with returning the wrong, ‘undesired’ result down the chain: your system ‘seems’ like it’s working when in fact it is producing ‘undesired’ consequences that cause problems for the bottom line. Because it ‘seems’ to be working correctly, it’s extremely difficult to debug and fix. That is the problem with race conditions. I believe understanding the background for why race conditions happen and why we need to avoid them is also important in this discussion, as it is mentioned quite a few times.

blisscs · October 27, 2019, 5:07am

Thank you @zacksiri, @LostKobrakai, @chrisza4, and @sasajuric

for pointing out the false belief in my thinking: that a GenServer should never receive a call before it is fully initialised (initialization here covers every case, including expensive init and other init steps that can fail), in order to protect the GenServer from race conditions (for example, a call to fetch from the cache before the cache has been warmed up, where warming means fetching from an unreliable source), and that putting this GenServer under its own new supervisor would resolve the GenServer failing.

And the best practice for this should be that the GenServer replies that it’s not ready yet to process the request and asks the caller to retry.
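A minimal sketch of that idea, assuming a hypothetical `WarmCache` module in which whatever loads the data (a Task, a retry loop, and so on) reports back through a `warmed/2` function. The names and the `{:error, :not_ready}` reply shape are illustrative choices, not an established convention.

```elixir
defmodule WarmCache do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  def fetch(server, key), do: GenServer.call(server, {:fetch, key})

  # Hypothetical hook: the loader calls this once the data is available.
  def warmed(server, data), do: GenServer.cast(server, {:warmed, data})

  @impl true
  def init(:ok) do
    # Cheap init that cannot fail; the cache starts out cold.
    {:ok, %{ready?: false, data: %{}}}
  end

  @impl true
  def handle_call({:fetch, _key}, _from, %{ready?: false} = state) do
    # Not warmed up yet: tell the caller so it can retry later.
    {:reply, {:error, :not_ready}, state}
  end

  def handle_call({:fetch, key}, _from, state) do
    # Map.fetch/2 returns {:ok, value} or :error.
    {:reply, Map.fetch(state.data, key), state}
  end

  @impl true
  def handle_cast({:warmed, data}, state) do
    {:noreply, %{state | ready?: true, data: data}}
  end
end
```

Before `warmed/2` has been called, `WarmCache.fetch(pid, :a)` returns `{:error, :not_ready}`; afterwards it returns whatever is stored under the key.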

Even so, I still believe there will be some use cases where we have to do expensive initialization. In that case, a custom pattern such as post_init, which was mentioned in the link https://groups.google.com/forum/m/#!topic/elixir-lang-core/fLdVQDZcFo0, would be very useful.

Much appreciated.

sasajuric · October 27, 2019, 7:38am

You don’t need to use any hacks, since these scenarios are supported on Erlang/OTP 21+ via handle_continue.

From init/1 you can return {:ok, state, {:continue, some_term}}, which will ensure that handle_continue is the first thing invoked when the server enters the loop.
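A minimal sketch of this, assuming a hypothetical `ContinueInit` module that takes a zero-arity `loader` function standing in for the expensive work:

```elixir
defmodule ContinueInit do
  use GenServer

  def start_link(loader), do: GenServer.start_link(__MODULE__, loader)

  def get(server), do: GenServer.call(server, :get)

  @impl true
  def init(loader) do
    # Return right away so the parent (e.g. a supervisor) is not blocked.
    {:ok, %{data: nil}, {:continue, {:load, loader}}}
  end

  @impl true
  def handle_continue({:load, loader}, state) do
    # Invoked before any message in the mailbox is processed, so no call
    # can observe the state with data still set to nil.
    {:noreply, %{state | data: loader.()}}
  end

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state.data, state}
end
```

Even if a caller sends `:get` immediately after `start_link/1` returns, the reply is the loaded value, because the call is only handled after `handle_continue/2` has finished. If the loader raises, the process still crashes, but it does so after `init/1` has returned.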