Making a library with Supervisor + GenServers: can't decide what's more convenient to use

dimitarvp · September 17, 2022, 4:16am

Hey everyone,
Can you help a guy out with a case of analysis paralysis?

Context

I developed a small library for a VM-wide pool of persistent workers – i.e. they don’t get started on demand; they get started together with your application and stay there until it shuts down. The idea is to have a global limiting of a resource in the app a la an Ecto.Repo with a connection pool.

It’s a relatively thin wrapper around Task.Supervisor.async_stream_nolink and its main value proposition is offering a function named each that accepts a list and a function. It then multiplexes execution of the function on each item on all available workers e.g. sending a list of 7 items to a pool of 3 workers will result in only 3 parallel executions of the functions inside the worker processes, at a time. This each function blocks until all items are processed.

I made it work and it works pretty well and I like it. The part I got analysis paralysis about is which usage pattern to utilize. I made two and I just couldn’t decide between both of them (although I do have a preference).

Option 1: dedicated module + `use MyLibrary, params: ...`

With this option I can do the following:

defmodule YourApp.YourWorkerPool do
  # The following injects code in the current module via `defmacro __using__(opts)`
  use MyLibrary, workers: 3, call_timeout: :infinity, shutdown_timeout: :infinity
end

then you put it in your app supervision tree:

  def start(_type, _args) do
    children = [
      YourApp.YourWorkerPool,
      # ...any other supervised processes...
    ]

    opts = [strategy: :one_for_one, name: BusyBee.Supervisor]
    Supervisor.start_link(children, opts)
  end

You then use it like so:

YourApp.YourWorkerPool.each(items, function)

This works fine. The part I dislike is the need to have a dedicated module for it.

Option 2: a tuple inside the app’s supervision tree without the `use` construct

E.g.:

  def start(_type, _args) do
    children = [
      {MyLibrary, name: YourApp.YourWorkerPool, workers: 3, shutdown_timeout: 30_000},
      # ...any other supervised processes...
    ]

    opts = [strategy: :one_for_one, name: BusyBee.Supervisor]
    Supervisor.start_link(children, opts)
  end

Which is then used like so:

MyLibrary.each(YourApp.YourWorkerPool, items, function)

This works fine as well.

Option 3: have both

I… really don’t want to. To me that seems like a classic case of bloat.

Questions

If you were writing a library, which usage pattern would you go for? I admit I am leaning towards Option 1 for the following reasons:

Making a minimal placeholder module to serve as an injection target of code + have it be a neatly separated place responsible for this functionality feels right. And having one more mini module in the app doesn’t feel like a big sacrifice.
It’s more intuitive – I think, I am not sure – to do MyModule.function than ExternalLibrary.function(MyModule).
And pollutes the application’s supervised children with less visual noise.

Still, if you have any arguments in either direction, I am very curious to hear them. I personally knew former colleagues who would cringe hard at having to do use MyLibrary, ... inject code that is basically copy-paste and they would insist they only want it in one place (yes, even if they only did the use pattern only 2 times in their app). Are they focusing on the wrong thing and am I trying too hard to cater to such people?

Thank you for your time.

mindok · September 17, 2022, 5:52am

Hi @dimitarvp - you would be in good company with either option - I’ve just looked through the app supervision trees of 3 or 4 apps and it appears reasonably evenly split - e.g. Ecto.Repo uses your option 1, whereas Phoenix.PubSub uses your option 2.

I prefer the clarity in the supervision tree of Option 1, but…

One advantage of Option 2 is less indirection - it’s easy to take a look into the entry points of your library and see what it is up to. Macro-based code can take a bit more effort to grok if there are issues, depending on what macro-fu moves you pull.

Also, Cachex is probably a good comparison given what it does. It uses Option 2. If your :name option is any atom (rather than a module), that could make your examples a bit cleaner, e.g.:

MyLibrary.each(:pool1, items, function)

instead of

MyLibrary.each(YourApp.YourWorkerPool, items, function)

Option 2 makes it a little easier to play with in iex:

  {:ok, pid} = MyLibrary.start_link(name: :test_pool, workers: 15)
  MyLibrary.each(:test_pool, 1..200, &{IO.inspect({self(), &1})})
  GenServer.stop(pid)

Another point… libraries that use use do typically add behaviour (e.g. Broadway) or complex configuration that has to “stay alive” (e.g. Ecto.Repo) in MyModule, whereas in your case, the vast majority of the application-specific complex behaviour is externalised into items and function and only introduced into your scope for the purpose of multiplexing the calls, so IMO there is little value in having MyModule

It’s a good question, and no easy answer!

Nicd · September 17, 2022, 6:14am

Option 2, 100%. The less macros there are in the Elixir ecosystem, the better. use makes it difficult to know what is going on with the module and makes compilation slower due to more compile dependencies. New users of Elixir already have it difficult enough with all the various libraries that have their macro tricks.

The latter is the “standard” way how things should work and it’s more readily understandable what is going on: clearly we are using lib MyLibrary to run something, instead of this magic module. If the user so wants, they can easily write

defmodule MyPool do
  def each(items, function) do
    MyLibrary.each(MyWorkerPool, items, function)
  end
end

which again stays clear as to what is going on vs. a use statement (who knows what code is generated? how can I ensure I don’t accidentally make a function that conflicts with a generated one?).

If you want clarity in the supervision tree, you can in a similar vein write a function and use it in the list: children = [..., MyPool.child_spec(), ...].

The point is, option 2 leaves these tools available for the dev but doesn’t force them. As a dev I appreciate when I can make my own choices. The documentation can then provide useful copy-paste examples for novices.

One additional thing, which may not appear so useful here but might be more useful in other libraries: option 2 allows for runtime selection of the pool to use, whereas option 1 doesn’t. If this was needed the dev would need to use the option 2 form anyway, which would be non-obvious to discover and would mean they have two different calling conventions used.

dimitarvp · September 17, 2022, 6:22am

Hm, excellent point about reducing app supervision tree visual bloat by using child_spec. I completely forgot about it and I’ll add it, thanks!

Plenty of good thoughts. The “don’t use macros unless you absolutely have to” resonates strongly with me.

mattbaker · September 17, 2022, 8:14am

I came here to say something similar! I like the idea that option 2 allows me to create an abstraction if I want to de-clutter or hide details but I don’t have to, whereas option 1 doesn’t really give me the choice.

I think I often favor libraries that present collections of composable things because I can choose the level of abstraction I want.

What a fun question to ponder, I liked that you asked this @dimitarvp, I recently found myself in a similar bit of analysis paralysis myself

sasajuric · September 17, 2022, 7:55pm

I’m striving to figure out the purpose. Are you using this to limit the amount of concurrent operations in your system?

Either way, my vote definitely goes for option 2, because it’s the simpler version that perfectly serves your purposes. The user can (but doesn’t have to!) encapsulate this into a separate module.

I find option one overly complex, and in general I cringe when the abstraction needlessly has to be used. The use form is very implicit and IMO can and should be avoided in most cases. A good example where use is fine would be a behaviour with default callback impls, which is not the case you have here

dimitarvp · September 18, 2022, 2:58am

Yes. I didn’t mention it will have a non-blocking version as well. And yep, it’s meant to prevent bursty memory usage since occasionally there are many new tasks to run (on a schedule). And the code is deployed on fairly small instances – think 2 vCPU and 256MB RAM. And there’s a lot of binary wrangling.

…Thinking of it, I probably should have just went with Oban and limit the load there, and call it a day.

Still, it was a very fun exercise that helped me cement various OTP knowledge pieces and I’ll open-source this micro library at one point. I know at least two companies that will be happy to use it.

sasajuric · September 18, 2022, 6:46am

For sync operations you could use jobs. Here’s an example of a rate limited queue. You basically need to replace the {:standard_rate, 1} option with something like {:counter, {:limit, n}}, and it becomes a concurrent queue (it allows at most n operations at the time). See docs for details.

With jobs, the enqueued operation is running in your process, not inside some worker process, which eliminates the need to copy the data to another process in sync scenarios. It also means that multiple unrelated jobs won’t run in the same process, which I think is a better approach to having a pool of long running workers.

For async, however, you’ll need to spawn a separate process (probably a task) and invoke :jobs.run from there.

josevalim · September 18, 2022, 6:58am

You may also be able to use GenStage in the front with the ConsumerSupervisor at the back.

Although I am wondering if we shouldn’t add this to DynamicSupervisor. We already support max_children. Making it buffer would be relatively straightforward.

dimitarvp · September 18, 2022, 7:06am

Guess I was turned off because it was Erlang and I am able to read it as quickly as Elixir. Thanks for the pointer, will study it!

Not sure I follow. It’s the entire idea of a job queue having N child processes chugging away at jobs, no? If we’re going to work in the same process then we might as well just use Enum.map. I am likely misunderstanding, apologies.

Unless there are suspicions of leftover big binaries that can’t be GC-ed then IMO persistent workers are OK. But then again, we always get caught by surprise by those non-GC-able binaries so I see your point.

There’s also opq btw. It uses throwaway workers.

Cont’d in a reply to Jose below.

dimitarvp · September 18, 2022, 7:12am

Ha, could you elaborate on how would this look, please? Very interested that it didn’t even occur to me!

Alternative scenario to consider (/cc @sasajuric). I had one workflow where I needed two layers of workers, not one:

Callers. They do GenServer.call addressing the pooled workers. This blocks so it’s best done in a separate process.
Actual workers. They basically receive a GenServer message containing function and an arg and run the function with the arg. But since we can have e.g. 10 items and 3 workers then the callers seemed like a useful construct since in that case we’d spawn 10 callers that will all wait on GenServer.call, each addressing 1 out of 3 possible PIDs.

Now have in mind that was no less than 5.5 years ago and I didn’t understood OTP that well back then so it was likely over-engineered. But my idea at the time was to write something of my own so I can fully grasp how OTP works – it served that purpose but it might be useless otherwise.

(And I’ll openly admit I am always intimidated by GenStage… That’s very likely a mistake.)

sasajuric · September 18, 2022, 7:39am

We have a process that wants to start the work, and the process that does the work. With jobs these two are actually the same. The queue process is a coordinator which accept requests and issues grants per some rule (e.g. max n at the same time, or max n per some unit of time). The client process (e.g. plug request handler) asks the queue for permission. After the permission is granted, the client proceeds, and after it’s done the client informs the queue. No other extra worker processes need to be started.

As a result, there might be more processes started in the system (such as plug/phoenix request handlers), but most of them will wait until the queue process gives them permission to proceed.

In addition, the latency of one job might be affected by the latency of previous unrelated jobs that were running in the same worker.

Sounds good to me. However, currently if max_children is exceeded, start_child returns an error. So not sure how the API would look like for buffering.

Either way, I did once or twice with Parent.GenServer which is basically supervisor as a GenServer. The basic idea was to start a child when requested, unless max children are running, in which case we buffer the request. Once a child stops, we take the next req from the buffer and start the new child.

As long as the worker processes don’t change the GenServer state on each request, this sounds like a scenario for jobs to me.

I find it’s a very complex behaviour, and it definitely wouldn’t be my first choice in this scenario.

dimitarvp · September 18, 2022, 7:47am

It doesn’t but I really can’t see the value of jobs as a parallel executor in this case. The requirement isn’t “spawn minimal amount of processes” (since we all know they are very cheap), it’s more of a “make sure we don’t hit this shared resource more than necessary” or “try to limit system resources footprint while doing something that does not hit a shared resource”.

If I read you correctly then it sounds like jobs is indeed only a mini-orchestrator and permission granter: “OK, you can fetch and execute your next task now”. But doing it serially, in the same process? Sorry that I keep not getting it.

My usage scenario is: I want truly parallel but not too parallel.

josevalim · September 18, 2022, 11:40am

We can have max_children and max_buffer. Buffer by default is zero. So you still get an error if you go over the buffer value.

sasajuric · September 18, 2022, 12:11pm

Yes, this is what jobs can give you. Let’s see a few examples.

Suppose that in phoenix action handler you need to communicate with external service, and you want at most 5 such outbound requests to run at the same time. To do that, you need to create your queue, e.g. in app start:

:jobs.add_queue(:my_queue, regulators: [counter: [limit: 5]])

Now in your phoenix action handler you can do:

:jobs.run(
  :my_queue,
  fn ->
    # communicate with remote service
  end
)

This will ensure that no matter how many requests are currently handled, at most 5 of them will communicate with the remote service, while the others are waiting until one of the current running :jobs.run invocation finishes (or crashes). The call to :jobs.run is blocking, and the provided lambda is running in the caller process. The result of :jobs.run will be the result of the provided lambda.

Now, let’s see another scenario. Suppose that we have a single process which fetches messages from some queue, on every message it needs to run some function, and we want to make sure that at most five of such functions are running. Here’s how we could do that:

Task.Supervisor.start_child(
  :some_supervisor,
  fn ->
    :jobs.run(
      :my_queue,
      fn ->
        # at most 5 of these things will run at the same time
      end
    )
  end
)

So in this case, we had to create a separate task per each computation, and we run the queued operation there. This is roughly similar to what you probably implemented with your abstraction, with some notable differences:

Each queued operation is running in a separate one-off process
The user of jobs is in charge of choosing the process structure. For example, they could use GenServer instead of Task if the operation is stateful.

Both of these are imo advantages of jobs over a fixed pool of long-living workers. Unless the worker has to keep some state (e.g. db connection), you don’t need such process structure. In fact, as the first example showed, you might not need to start any additional processes (other than the queue process itself, started with add_queue).

In summary, the approach taken by :jobs is probably the simplest one that fits the problem of running at most N things at the time. It can reduce the amount of data copying in some cases, and it is quite flexible, allowing the end user to choose their own process structure for queued operations. On top of that, jobs offer other kind of regulation strategies, such as rate limiting, or even limiting depending on the current cpu/memory usage.

That said, for the second example (a long-running process spawns job processes with bound concurrency), I’d probably try to use Parent if possible, because that would simplify the process structure, and reduce the number of processes in the system, because I wouldn’t need to start the task process until the slot in the queue is available.

sasajuric · September 18, 2022, 12:12pm

Ah, that’s nice, I like it

dimitarvp · September 18, 2022, 1:47pm

Oh. I finally get it. You meant that there might be N>5 invocations of :jobs.run in different processes but 5 will be unblocked at a time. I see! Not sure why I couldn’t default to that before you spelled it out for me but now I get it. Sorry for being slow here.

Yes, in spirit (not implementation) this is more or less what I hand-rolled these years ago: a user of the library invokes MyLibrary.each and it then spawns X caller processes that then GenServer.call other Y processes (where X >= Y), and those Y processes (workers) are persistent, whereas X processes (callers) are ephemeral and only spawned on demand.

I don’t disagree that it’s best to give the user the building blocks and they assemble their own solution. I know a lot of former colleagues who love those libraries exclusively and ignore / look down on all others.

But a balanced point of view is always needed: I also consulted for and contracted for companies that needed basically the same things so my job was basically copy-paste my own mini-libraries and tweak 1-2% of the code or its config.

This made me wish for some pre-baked usage patterns, you know?

With your generous clarifications, I am now convinced I’ll use :jobs any chance I get because its promises to auto-adjust for system load sound amazing. Thank you!