Should I use a GenServer?

silviurosu · March 20, 2019, 3:57pm

Background:
I have a restaurant with multiple delivery rules (circles, polygons, addresses, etc.). The list of rules can be large with many points so I would not want to load it from DB for every computation.
Against this set of rules I need to check whether a selected address matches any of these delivery rules.

Current implementation:
I have a GenServer that holds the delivery rules in its state. Each “delivers?” call runs inside this process and returns true/false. The state is refreshed periodically by checking the updated_at column in the DB; it does not need to be up to date instantly. It works fine, but I also do not have many calls to this process yet.

I have started to realize that this can become a bottleneck. What if I get hundreds of calls to this process and the computation takes time?
I am trying to redesign this part to be safe in case I need to scale. I have some ideas and I do not know which is best:

1. Leave the current process as is and do the computations inside it. It may be fast enough that it never becomes a bottleneck.
2. Store the delivery rules in an ETS table and load them in a plain module to do the computation in the user process. Maybe also have a dedicated process that updates the ETS table if the rules have changed.
3. Use a GenServer to store the delivery rules. When I need to do a computation, copy the state from the GenServer and do the computation inside the user process.
4. Do the computations inside the GenServer via Task.async/await, but this is similar to nr 3 above.
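The ETS idea above could look roughly like the sketch below. The module name `DeliveryRules`, the table name, and the `{:circle, center, radius}` rule shape are assumptions made for illustration, not anything from the real application. One process owns the table and handles writes, while `delivers?/2` runs entirely in the calling process. Note that `:ets.lookup/2` copies the stored term into the caller's heap, so very large rule lists still cost a copy per call.

```elixir
defmodule DeliveryRules do
  # Hypothetical module: rules live in a named ETS table, checks run in the caller.
  use GenServer

  @table :delivery_rules

  def start_link(rules) do
    GenServer.start_link(__MODULE__, rules, name: __MODULE__)
  end

  # Runs in the calling process; it only reads ETS and never messages the GenServer.
  def delivers?(restaurant_id, point) do
    case :ets.lookup(@table, restaurant_id) do
      [{^restaurant_id, rules}] -> Enum.any?(rules, &match_rule?(&1, point))
      [] -> false
    end
  end

  # Replace the stored rules, e.g. after a periodic updated_at check.
  def put_rules(restaurant_id, rules) do
    GenServer.call(__MODULE__, {:put, restaurant_id, rules})
  end

  @impl true
  def init(rules) do
    # :protected means only the owner writes, while any process may read.
    :ets.new(@table, [:named_table, :set, :protected, read_concurrency: true])
    Enum.each(rules, fn {id, r} -> :ets.insert(@table, {id, r}) end)
    {:ok, nil}
  end

  @impl true
  def handle_call({:put, id, rules}, _from, state) do
    :ets.insert(@table, {id, rules})
    {:reply, :ok, state}
  end

  # Only circles are handled in this sketch; polygons etc. would add clauses.
  defp match_rule?({:circle, {cx, cy}, r}, {x, y}) do
    (x - cx) * (x - cx) + (y - cy) * (y - cy) <= r * r
  end
end
```

After `DeliveryRules.start_link(%{1 => [{:circle, {0.0, 0.0}, 5.0}]})`, a call such as `DeliveryRules.delivers?(1, {3.0, 4.0})` is evaluated in the caller, so many callers can check addresses concurrently.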

What is the best path to choose for this usecase?

Fl4m3Ph03n1x · March 20, 2019, 4:41pm

Using an ETS table instead of a GenServer is a well-established pattern. However, in this case I agree with the author of Elixir in Action, who argues that you should start with a GenServer and consider moving to an ETS table only after you have stress tested and benchmarked.

So the question is: which stress tests and benchmarks have you done? What throughput can your GenServer actually handle, and when will you hit that ceiling? Moving to an ETS approach is an act of optimization, and optimization doesn’t exist without benchmarks.
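As a starting point, a rough throughput check can be done with nothing but the standard library. The sketch below is only an assumption of how one might do it: `ThroughputSketch` and its arguments are hypothetical names, and a dedicated benchmarking tool would give more trustworthy numbers. It fires `total` calls from up to `concurrency` concurrent tasks and reports calls per second.

```elixir
defmodule ThroughputSketch do
  # Hypothetical helper: `fun` is whatever call hits the process under test.
  def measure(fun, total, concurrency) do
    {micros, :ok} =
      :timer.tc(fn ->
        1..total
        |> Task.async_stream(fn _ -> fun.() end, max_concurrency: concurrency)
        |> Stream.run()
      end)

    # :timer.tc/1 reports microseconds; convert to calls per second.
    total / max(micros, 1) * 1_000_000
  end
end
```

For example, with `{:ok, agent} = Agent.start_link(fn -> 0 end)`, calling `ThroughputSketch.measure(fn -> Agent.get(agent, & &1) end, 10_000, 100)` gives a ballpark figure for a trivial single-process read. Replacing the function with the real “delivers?” call shows where the ceiling is.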

Setting this aside and assuming that this GenServer will eventually be a bottleneck (because you know your system better than I do and you know you will hit a ceiling soon), I would probably not move to ETS, but instead consider using a pool library like Poolboy, which can manage thousands of workers with ease.

You would still hit a bottleneck (if you have too many requests for workers), but by that time you would need to have tens of thousands of requests per second (like we do). In that scenario, using an ETS table is the only solution, either that or going distributed.

rvirding · March 21, 2019, 12:08am

There is another issue with ETS tables: they have no support for transactions, or at least very limited support. So even if you use an ETS table to store the data, if you want to implement safe transactions you need to access the table through a process.

Fl4m3Ph03n1x · March 21, 2019, 7:56am

This is news to me. You have several operations in the ets module that are atomic and isolated, like insert/2.

Isn’t this pretty much the same as a transaction?

NobbZ · March 21, 2019, 9:33am

A transaction is a group of distinct database operations that looks atomic from the outside and can also be rolled back as a whole.

With ETS you usually do not get this. Especially not the rollback or commit all or nothing.

Fl4m3Ph03n1x · March 21, 2019, 9:42am

Ahh, I get it. In this context, if I call insert/2 twice, each operation is atomic but the set of two operations as a whole is not. And there is definitely no rollback.

If you want transactions though, couldn’t you just go for Mnesia?
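A small sketch of the difference, using a throwaway table (the table name and keys are made up for illustration). The ets documentation states that a single `insert/2` call is atomic and isolated even when given a list of objects, so batching into one call is the closest ETS gets to "all or nothing" for writes. There is still no rollback across separate calls.

```elixir
table = :ets.new(:example, [:set, :public])

# Two separate calls: a concurrent reader may observe :a without :b.
:ets.insert(table, {:a, 1})
:ets.insert(table, {:b, 2})

# One call with a list: atomic and isolated, readers see both or neither.
:ets.insert(table, [{:c, 3}, {:d, 4}])
```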
I have never used it so I don’t know how it performs, but I hear some old school erlangers have some love for it.

al2o3cr · March 21, 2019, 1:08pm

The list of rules can be large with many points so I would not want to load it from DB for every computation.

Since the rules are already in the database, what about moving the computation there? Could the “does this rule match?” computation be expressed in a reasonable number of SELECTs? Circles + polygons might require a DB with spatial support (PostGIS, for instance).

rvirding · March 22, 2019, 2:04pm

It depends on how far you want to go. A simple way would be to put your own process in front of the ETS table(s) which handles basic transactions one at a time. This may be enough for you. Mnesia provides much more and so is much more complex. It depends on what you need/want.
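A minimal sketch of such a front process, assuming a hypothetical `TableGuard` module that owns the table and runs each multi-step update as one function call. Updates are serialized because they all run inside the same process. This is not a real transaction: there is no rollback, and readers that bypass the process can still see intermediate states.

```elixir
defmodule TableGuard do
  # Hypothetical wrapper: every multi-step update runs inside this one
  # process, so updates never interleave with each other.
  use GenServer

  def start_link(table_name) do
    GenServer.start_link(__MODULE__, table_name, name: __MODULE__)
  end

  # `fun` receives the table and may do several reads and writes.
  def transaction(fun) when is_function(fun, 1) do
    GenServer.call(__MODULE__, {:transaction, fun})
  end

  @impl true
  def init(table_name) do
    # :protected lets other processes read directly but not write.
    table = :ets.new(table_name, [:named_table, :set, :protected])
    {:ok, table}
  end

  @impl true
  def handle_call({:transaction, fun}, _from, table) do
    {:reply, fun.(table), table}
  end
end
```

If `fun` raises, the owning process crashes and the table goes with it, so a real implementation would need to decide how to handle failures.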

blatyo · March 22, 2019, 2:49pm

An alternative design that doesn’t rely on ETS might be to create a pool of rule evaluators that each have their own copy of the rules, plus a rule refresher that periodically queries for updated rules and sends them to the evaluators. The drawback here is the multiple copies of the rules.
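One way this design could be sketched is shown below. The names `Evaluator` and `EvaluatorPool`, the random worker selection, and the circle-only rule shape are all assumptions for illustration. Supervision and the periodic DB query are left out. A real refresher would call `refresh_all/2` whenever it finds newer rules.

```elixir
defmodule Evaluator do
  # Hypothetical worker: keeps its own copy of the rules in its state.
  use GenServer

  def start_link(rules), do: GenServer.start_link(__MODULE__, rules)

  def delivers?(pid, point), do: GenServer.call(pid, {:delivers?, point})

  # A cast, so the refresher never blocks on a busy worker.
  def refresh(pid, rules), do: GenServer.cast(pid, {:refresh, rules})

  @impl true
  def init(rules), do: {:ok, rules}

  @impl true
  def handle_call({:delivers?, point}, _from, rules) do
    {:reply, Enum.any?(rules, &inside?(&1, point)), rules}
  end

  @impl true
  def handle_cast({:refresh, rules}, _old), do: {:noreply, rules}

  # Only circles are handled in this sketch.
  defp inside?({:circle, {cx, cy}, r}, {x, y}) do
    (x - cx) * (x - cx) + (y - cy) * (y - cy) <= r * r
  end
end

defmodule EvaluatorPool do
  # Hypothetical pool: a plain list of pids, one picked at random per call.
  def start(rules, size) do
    for _ <- 1..size do
      {:ok, pid} = Evaluator.start_link(rules)
      pid
    end
  end

  def delivers?(pool, point), do: Evaluator.delivers?(Enum.random(pool), point)

  def refresh_all(pool, rules), do: Enum.each(pool, &Evaluator.refresh(&1, rules))
end
```

Each worker holds a full copy of the rules, which is the memory drawback mentioned above, but evaluations on different workers run in parallel.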

silviurosu · March 28, 2019, 7:24am

This also sounds like a good solution: a single refresher and multiple workers that evaluate the address based on the rules.
I thought more about this and, as @Fl4m3Ph03n1x said, I’ll run a set of benchmarks to see how many evaluations per second it can support, and optimize only if it cannot handle the load.