Valid GenServer usage?

Hi,

I have written a system that allows a business to enter basic details of a purchase. So far it is a simple CRUD system. Currently if the save operation works, then the user sees “Success”, otherwise they see “Failed” in a modal dialog. All pretty standard.

I have now been asked to sync data to a 3rd party service that takes care of warranty calls on the products the business sells.

Traditionally I would probably have written to the database and then made a call to the 3rd party to sync the data, using their response to determine whether the data made it to both the database and the 3rd party.

But I think I should probably approach this another way.

I was thinking:

  • Add a boolean column to the order table to record whether it has been synced to the 3rd party via a POST request (roughly sketched in the migration below)
  • Create GenServer that keeps a lookout for orders that haven’t been synced
  • Over time, the GenServer would sync them to the 3rd party and update the boolean column to “true”…
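Roughly, the migration for that boolean column might look something like this (the table, column and repo module names are just placeholders at this stage):

defmodule InterimCrm.Repo.Migrations.AddSyncFlagToOrders do
  use Ecto.Migration

  def change do
    alter table(:orders) do
      # defaults to false so existing rows count as "not yet synced"
      add :synced_to_third_party, :boolean, default: false, null: false
    end
  end
end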

I played with GenServers when I first started learning Elixir, but having only written pretty flat web applications since, I haven’t really come across a need for them.

My thinking with an approach like this is that the front-end user doesn’t care whether the order has been sent to the 3rd party; all they care about is whether it has been saved to the database - that the CRUD operation has completed successfully.

So I don’t see any reason why the above would not work, but seeing as I haven’t visited GenServers in a little while, I guess my question is… am I on the right path with this, or is there a better approach in Elixir for this job? My memory is telling me that there is something a little more suitable than a GenServer… but I can’t recall what that might be.

Cheers,
Nathan

1 Like

While you could write a custom GenServer to do this, you have a problem that can be solved with higher level libraries. The custom GenServer would be educational for you as you work through the various corner cases, but for a real production application I’d recommend one of these two approaches:

  1. use quantum (a cron-like scheduler) to check for the orders that need to be synced, on some interval that makes sense for your app (every minute? every hour?) - see the rough sketch below

  2. use ecto_job for a persistent queue of records that mean “sync order id X” and a worker pool if quicker synchronizations are needed.
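For option 1, the Quantum wiring is roughly this (app, module and function names are purely illustrative, and the cron expression is whatever interval suits you):

# config/config.exs
config :my_app, MyApp.Scheduler,
  jobs: [
    # once an hour, call a function that pushes any unsynced orders to the 3rd party
    {"0 * * * *", {MyApp.Orders, :sync_unsynced, []}}
  ]

# lib/my_app/scheduler.ex
defmodule MyApp.Scheduler do
  use Quantum, otp_app: :my_app
end

Then add MyApp.Scheduler to your application’s supervision tree and Quantum will run the job on schedule.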

4 Likes

Thanks Greg,

Appreciate the response.

Yes, I implemented it yesterday with a GenServer and it was an educational exercise.

My use case is really quite simple. I currently work for a small company of 10 employees; we would be lucky to put 30-40 orders per day through the system. Trivial, I know…

Prior to this, every employee had their own spreadsheet with the same column names, which was emailed to the boss once a week, who copied and pasted it into a “master” spreadsheet (which usually ended up with duplicates and data entry errors) :slight_smile:

So at least we are out of Excel land now…

Sounds like I just need to run quantum at the end of each day… I will check it out…

Looking at how simple my implementation was with GenServer though, I do wonder whether I need to bring another dependency into my project. Although I don’t see this project growing too big (famous last words?) as we are planning on moving to a fully fledged CRM next year.

1 Like

It is never too late to change it when you need it. I would put it off until that time.

2 Likes

Yes. I agree with @tty here. If you’ve got something working and your workload is that low (30-40 per day) then go ahead with what you’ve got. You can always refactor later if it makes sense.

1 Like

Thanks y’all… Yes, I am running with my tiny GenServer for now…

1 Like

I did something similar about a year ago. What I ended up with was a very bare-bones GenServer which, upon processing a batch of records (100), would check whether there are more - if so, it reschedules itself (via Process.send_after) after 10 seconds. And if there are currently no records to process, it reschedules itself to check again after 1 hour.
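In rough code it was this shape (module and helper names here are made up for illustration - fetch_batch/1 and process/1 stand in for the real DB query and per-record work):

defmodule BatchSyncer do
  use GenServer

  @batch_size 100
  @busy_interval :timer.seconds(10)
  @idle_interval :timer.hours(1)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    # kick off the first run immediately
    send(self(), :work)
    {:ok, nil}
  end

  @impl true
  def handle_info(:work, state) do
    case fetch_batch(@batch_size) do
      [] ->
        # nothing to do right now - check again in an hour
        Process.send_after(self(), :work, @idle_interval)

      records ->
        Enum.each(records, &process/1)
        # more records may remain - check again shortly
        Process.send_after(self(), :work, @busy_interval)
    end

    {:noreply, state}
  end

  # placeholders for the real database query and per-record processing
  defp fetch_batch(_limit), do: []
  defp process(_record), do: :ok
end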

Offensively simple but worked perfectly – without a single mistake – for half a year before my contract ended.

3 Likes

There are situations where Process.send_after isn’t actually needed. You could make use of the often-forgotten timeout value that GenServer callbacks can return. Basically, if the GenServer does not receive any message within timeout milliseconds, it sends itself a :timeout message, which is handled in handle_info/2.

Process.send_after is more refined, though, as you can control the message that is sent.

Also, one thing to keep in mind with timeout is that you need to remember to always return the timeout from all of your GenServer callbacks, otherwise no timeout will be scheduled (speaking a little loosely here).
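A minimal sketch of the timeout approach (module name made up for illustration):

defmodule TimeoutTicker do
  use GenServer

  @timeout :timer.seconds(10)

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    # the third tuple element schedules a :timeout message if no other
    # message arrives within @timeout milliseconds
    {:ok, %{}, @timeout}
  end

  @impl true
  def handle_info(:timeout, state) do
    do_work()
    # the timeout must be returned again from every callback,
    # otherwise no further :timeout message is scheduled
    {:noreply, state, @timeout}
  end

  defp do_work, do: :ok
end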

2 Likes

If a single process is used to both schedule and execute the job, then there are two possible issues.

First, an unhandled exception while running the job will take down the scheduler too. While the parent supervisor will handle the crash by restarting, too many crashes in a short interval might have larger consequences on the system. In addition, a crash will cause an execution offset (because after a restart, the scheduler starts at time zero).

Second, if the job execution takes longer than the interval, then the next message will be processed later, and the execution will drift. If the job gets stuck forever, then the scheduling stops forever too.

Perhaps none of this is a problem in the current version, but every time the code is revisited and changed, the developer (who may not be the original author) has to be aware of the limitations and pitfalls of the current implementation.

This is why I prefer to run the job in a separate process from the scheduler. However, doing this introduces more complexity. Now we either need another supervisor for the job process, or we will run it as a direct child of the scheduler process.

In the former approach, the parent supervisor should be rest_for_one or one_for_all, to make sure that if the scheduler terminates, the currently running job is taken down too. In the latter approach, the scheduler needs to trap exits, and also needs to make sure that the child is properly terminated before the scheduler process dies. In both cases, the scheduler needs to keep track of whether the last started job is still running.
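To make the former approach concrete, here’s a rough sketch (names are illustrative; MyApp.JobSupervisor is a Task.Supervisor started elsewhere in the tree, and as noted above the parent supervisor should be rest_for_one or one_for_all):

defmodule MyApp.SyncScheduler do
  use GenServer

  @interval :timer.seconds(10)

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok) do
    Process.send_after(self(), :tick, @interval)
    {:ok, %{job_ref: nil}}
  end

  @impl true
  def handle_info(:tick, %{job_ref: nil} = state) do
    # no job currently running - start one in a separate, supervised process
    {:ok, pid} = Task.Supervisor.start_child(MyApp.JobSupervisor, &run_job/0)
    Process.send_after(self(), :tick, @interval)
    {:noreply, %{state | job_ref: Process.monitor(pid)}}
  end

  def handle_info(:tick, state) do
    # previous job still running - keep ticking, but don't start another one
    Process.send_after(self(), :tick, @interval)
    {:noreply, state}
  end

  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{job_ref: ref} = state) do
    # the job finished (normally or by crashing); either way the scheduler keeps ticking
    {:noreply, %{state | job_ref: nil}}
  end

  defp run_job do
    # placeholder for the actual work
    :ok
  end
end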

As you can see, there are some fine-print details here, which is why I wouldn’t suggest hand-rolling periodic execution for production code. I did it a few times, and every time there were some subtle issues discovered during the code review.

Most of the available libraries handle at least some of these concerns. Quantum is probably the most popular choice in the Elixir ecosystem, so we can assume there is less chance of subtle bugs happening with it. For a couple of reasons I wasn’t quite happy with Quantum, so I wrote an alternative called Periodic (a part of the parent library), which makes it IMO reasonably simple to run a periodic job. You basically need to provide a {Periodic, run: mfa_or_zero_arity_lambda, every: interval} child spec under the desired supervisor and you’re good to go. We use this in our system in a couple of places, and it’s been working well for us so far.

3 Likes

Thanks…

@sasajuric funny you mention the job taking longer than the period for checking. I was thinking about this yesterday, but I didn’t think this would be an issue for me, as I am making a synchronous call to our 3rd party API… so it will wait for a response before starting the timer again.

I think my code is so simple that it would never fall into this problem… would it?

generate_cooltrax_request() gets an unsynced record from the database, builds the request body, POSTs it to the 3rd party and handles the response… currently just a basic {:ok} or {:error}

Here is my code so far:

defmodule InterimCrm.Data.CooltraxServer do
  use GenServer

  def start_link do
    GenServer.start_link(__MODULE__, %{})
  end

  @impl true
  def init(_) do
    # schedule the first check as soon as the server starts
    check_orders()
    {:ok, []}
  end

  @impl true
  def handle_info(:find_unsynced_order_products, state) do
    order_product = InterimCrm.Data.get_unsynced_cooltrax_order_product!()

    case InterimCrm.Data.Cooltrax.generate_cooltrax_request(order_product) do
      {:ok} -> InterimCrm.Data.update_order_product(order_product, %{cooltrax_sync: true})
      # generate_cooltrax_request/1 can return {:error} - more detailed errors to come
      {:error} -> nil
    end

    # schedule the next check
    check_orders()
    {:noreply, state}
  end

  defp check_orders do
    Process.send_after(self(), :find_unsynced_order_products, 10_000)
  end
end

If I were processing larger volumes of data, I think a safer approach would be to wrap all of this in another Supervisor that spawned a dedicated CooltraxServer to handle one particular order… This way check_orders() could be called on a regular basis without waiting for a CooltraxServer to finish…? But I’m not sure I need this level of abstraction yet…

Never is a very bold claim :slight_smile: To make such a claim you have to understand the entire function invocation chain, including the code of all the dependencies, and you need to consider every possible input. Even if you can make such a claim today, it doesn’t necessarily hold in the future. If, for example, another developer modifies the code of InterimCrm.Data, the assumptions might not hold anymore.

Therefore, while your code probably doesn’t suffer from such problems today, I find it fragile with respect to any modifications.

Also, I wouldn’t say that this scenario is simple. Your example involves a database and a 3rd party server, so you’re dealing with a couple of external dependencies which might disconnect, or become slow for reasons outside of your control. Even the database driver library might have some subtle bugs. For example, about a year ago we experienced a case where postgrex ended up in an infinite loop.

In my view, regular ticking shouldn’t be affected by such problems. Separating the scheduler from the execution (i.e. running them in separate processes) will ensure that the scheduler always ticks normally, and so as soon as the circumstances allow it, some future job instance will succeed.

Therefore, I would always execute the job in a separate process. Using e.g. Periodic could make the implementation reasonably trivial. You basically need to include something like the following into the list of children of some supervisor:

{Periodic,
  run: &InterimCrm.Data.find_unsynced_order_products/0,
  every: :timer.seconds(10),
  timeout: :timer.seconds(10),
  overlap?: false
}

For good measure I also configured a timeout to make sure that the job isn’t running indefinitely.

So compared to the hand-rolled solution, this approach has the following benefits:

  • less LOC
  • the intent is stated in a clearer way
  • better fault-tolerance (because latency/failure of the job doesn’t affect the scheduler)
  • timeout regulation
  • scheduling logic is tested

The only downside I can think of is that the interval in Periodic specifies the time between two consecutive starts. In contrast, the interval in your solution specifies the time between the completion of the previous job and the start of the next job. The latter is currently not supported by Periodic, but adding this feature should be simple enough.

To summarize, if you want to execute some function every X in production code, I’d advise using an existing 3rd party solution. As the author, I’m obviously partial to Periodic, but Quantum is also a valid option. Whichever you choose, it will likely require fewer LOC, and will probably have better fault-tolerance compared to a hand-rolled solution.

There will occasionally be cases where such abstractions won’t be a good fit. For example, if a GenServer is managing some state which has to be periodically cleaned up, I’d hand-roll the solution, e.g. with Process.send_after. As far as I can tell, you don’t have such a scenario, so I’d just advise using a library.

2 Likes

Hey Sasa,

Thanks for the detailed response… yes, true, I have been bitten by the “never word” before… you would think I would have learnt!

Could you give me an example of how it is fragile regarding modifications? Modifying it to use a 3rd party library would not be difficult, considering I have separated the POST request building into dedicated functions - as you demonstrated with your Periodic example…

Right - I think this is what I was getting at by adding a supervisor around my 3rd party requests… but of course this has already been thought of and built into the 3rd party libraries you are talking about.

The only reason I have it set up like this is that I did not want overlapping requests being sent to the 3rd party API, because I am only doing one order at a time - just in case a request to the 3rd party took longer than the interval.

Thanks for your help - lots to think about as always! See… before I was using the word never and now I am using always!! :slight_smile:

@sasajuric You are very right of course, I just shared an anecdotal tidbit because each of us is sometimes in a huge rush and needs a solution yesterday (the fact that we always end up regretting not giving such a task one more hour is an irony that doesn’t escape me).

Thinking about it, I’d settle either for your library or just use a DynamicSupervisor + a few monitors.

1 Like