Need some advice for Mozart performance testing

I have just started performance testing for Mozart (a BPM platform) and I am seeing something I don’t understand, so I am hoping to get some advice.

I have a GenServer module named ProcessEngine. Instances of this module are spawned via a DynamicSupervisor.

Each ProcessEngine instance spawned is initialized with a data structure representing a defined business process. To clarify, a “business process” is not an Elixir process.

Each ProcessEngine instance runs until the “business process” has run out of work to do, that is, until it has finished its intended function.

So, here is the issue that I am trying to understand.

If I spawn 1,000 GenServers, they finish execution in about 300,000 microseconds:

iex [09:24 :: 6] > :timer.tc(fn -> run_process_n_times(%{}, :process_with_single_service_task, 1000) end)
{287086, :ok}

If I spawn 10 times that number, i.e. 10,000, they finish execution in 26,602,559 microseconds, or about 100 times longer than the 1,000 instances took.

iex [09:24 :: 8] > :timer.tc(fn -> run_process_n_times(%{}, :process_with_single_service_task, 10000) end)
{26602559, :ok}

So, executing 10 times as many GenServer instances takes about 100 times as long to complete. I had assumed that execution time would increase linearly with the number of GenServer instances.
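
For reference, a sweep over a few sizes with the same helper makes the growth curve easier to see (a sketch; whether clear_and_load() is needed between runs depends on the demo’s setup):

# Time a few sizes to see how execution time grows with instance count.
for n <- [1_000, 2_000, 4_000, 8_000] do
  {usec, :ok} = :timer.tc(fn -> run_process_n_times(%{}, :process_with_single_service_task, n) end)
  IO.puts("#{n} instances: #{usec} microseconds")
end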

If I run the observer, I do see scheduler utilization go to 100% for 2 out of 12 schedulers. It’s always schedulers 1 and 2 that go to 100%. A couple of questions:

Why don’t I see more schedulers become active?
Is the 100% for two schedulers indicative of a problem?

Finally, is there any advice on how to analyze this?

Are your GenServers CPU-bound?

Good question, but I’m sorry, I don’t know. How do I determine this? I don’t see this info available in the observer app.
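
For reference, one rough check outside Observer (a sketch using only standard Erlang APIs; it assumes run_process_n_times/3 is available in the iex session, as in the timings above): compare each scheduler’s busy time to elapsed time around a run. A high busy fraction on only one or two schedulers suggests the work is being serialized rather than the CPU being the limit.

:erlang.system_flag(:scheduler_wall_time, true)
before = Enum.sort(:erlang.statistics(:scheduler_wall_time))
run_process_n_times(%{}, :process_with_single_service_task, 1000)
after_run = Enum.sort(:erlang.statistics(:scheduler_wall_time))

# Busy fraction per scheduler: active time divided by total elapsed time.
for {{id, a0, t0}, {_id, a1, t1}} <- Enum.zip(before, after_run) do
  IO.puts("scheduler #{id}: #{Float.round((a1 - a0) / (t1 - t0) * 100, 1)}% busy")
end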

Can you show your “run n times” function?

Sure, this file:

https://github.com/CharlesIrvineKC/mozart/blob/main/lib/mozart/performance/demo.ex

It’s the last function in the file.

If anyone cares to run these tests themselves, it wouldn’t require too much work:

Paste to iex:

:observer.start()
import Mozart.Performance.Demo
clear_and_load()
spawn(fn -> :timer.tc(fn -> run_process_n_times(%{}, :process_with_single_service_task, 10000) end) end)

Thanks!

One thing that doesn’t look right in the observer System panel is the “Run Queue”. It stays consistently at 2. I would think it would go higher.
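
For a quick check outside Observer (a sketch), the per-scheduler run queue lengths can also be sampled directly while a run is in flight:

# Sample the normal schedulers’ run queue lengths once per second for 10 seconds.
for _ <- 1..10 do
  IO.inspect(:erlang.statistics(:run_queue_lengths), label: "run queue lengths")
  Process.sleep(1_000)
end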

I discovered that each GenServer was getting shut down after about half a millisecond. I increased the time each GenServer stays alive to several seconds. When I did this, all 12 of the schedulers became utilized. I don’t understand this completely, but it kind of makes sense. So, that was one issue.

The second issue is that processing time doesn’t increase linearly with the number of processes serviced. I am still trying to figure that out.

One thing that’s not helping concurrency: doing work in init means that start_link takes longer to return. Consider moving the code from ProcessEngine.init to a handle_continue callback.
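
A minimal sketch of that pattern (placeholder callback and function names, not Mozart’s actual code):

def init(process_model) do
  # Return immediately so DynamicSupervisor.start_child/2 is not blocked,
  # then do the expensive setup in handle_continue/2.
  {:ok, %{model: process_model}, {:continue, :load_model}}
end

def handle_continue(:load_model, state) do
  # do_expensive_setup/1 stands in for the work formerly done in init/1.
  {:noreply, do_expensive_setup(state)}
end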

Another thing that isn’t helping concurrency: init and execute both need to make calls to singleton processes (ProcessModelService and ProcessService)

With a nearly 100x increase based on 10x more input, I’d start by carefully looking at where data’s being collected in the code; all it takes is one List.append that’s called per-ProcessEngine to make things quadratic.
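
For illustration, the quadratic pattern usually looks like the first function below (a generic sketch, not taken from the Mozart code):

defmodule AppendSketch do
  # Appending to the tail copies the whole list on every call, so collecting
  # n results this way costs O(n²) in total.
  def record_slow(completed, id), do: completed ++ [id]

  # Prepending is O(1); reverse once at the end if order matters.
  def record_fast(completed, id), do: [id | completed]
end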

Some other random thoughts:

  • terminate doesn’t do any formatting on the reason argument, so if you call Process.exit(some_pid, :shutdown) (e.g.) you’ll get :shutdown as an argument and the ProcessEngine target will crash! (See the sketch after this list.)

  • harping on the same point: terminate is not guaranteed; there are lots of (admittedly uncommon) scenarios where it will not run before the process disappears. Consider checkpointing the state during intermediate steps if “resuming” is important.

  • consider extracting type-specific code like this to a per-step (or per-type) “callback module”
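
A minimal sketch of the two terminate points above (assumed names; not Mozart’s actual callbacks):

defmodule TerminateSketch do
  use GenServer
  require Logger

  # terminate/2 only runs reliably when the process traps exits, and even
  # then not on a brutal :kill or an abrupt VM stop, so opt in here.
  def init(args) do
    Process.flag(:trap_exit, true)
    {:ok, args}
  end

  # reason can be any term (:shutdown, :normal, {:shutdown, term}, an
  # exception, ...), so format it with inspect/1 instead of assuming a shape.
  def terminate(reason, _state) do
    Logger.info("process engine stopping: #{inspect(reason)}")
    :ok
  end
end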


@al2o3cr

Oh wow, that is very helpful. I appreciate your effort. I’ll get back to you. Thanks!

One thing that’s not helping concurrency: doing work in init means that start_link takes longer to return. Consider moving the code from ProcessEngine.init to a handle_continue callback.

I did that.

Another thing that isn’t helping concurrency: init and execute both need to make calls to singleton processes (ProcessModelService and ProcessService)

What you say is true, but why is making calls to those servers an issue? Perhaps you are alluding to it below, but I’m not sure.

With a nearly 100x increase based on 10x more input, I’d start by carefully looking at where data’s being collected in the code; all it takes is one List.append that’s called per-ProcessEngine to make things quadratic.

Some other random thoughts:

I had read that a terminate call isn’t guaranteed. Pity, I thought. Wonder why that is? I’ll need to research what might cause the call to be skipped.

If I did checkpointing, it would probably happen frequently and might be a big hit to performance. Maybe tolerating an occasional missed terminate call would be worthwhile.

  • consider extracting type-specific code like this to a per-step (or per-type) “callback module”

That sounds like a sensible thing to do and a good exercise as well.

Really appreciate your feedback.

Because every OTP process handles its messages one at a time. If you have 100+ processes each wanting something from that “central” process, then that’s an obvious and very major bottleneck.

Yeah, good point. Actually, 1,000 processes completed very quickly, and 5,000 wasn’t bad, but performance degraded rapidly from there. I’ll be looking at doing some refactoring. I wonder how I can determine whether performance is degrading due to a bottleneck like that?

Thanks
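
One quick way to check for that kind of bottleneck (a sketch; it assumes the singletons are registered under their module names, so adjust to the names Mozart actually uses) is to sample their message queue lengths while a run is in flight. A queue that keeps growing points to a serialization bottleneck:

for name <- [ProcessModelService, ProcessService] do
  case Process.whereis(name) do
    nil ->
      IO.puts("#{inspect(name)} is not registered under that name")

    pid ->
      {:message_queue_len, len} = Process.info(pid, :message_queue_len)
      IO.puts("#{inspect(name)}: #{len} messages waiting")
  end
end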

Now that I read this more closely, I realize that I don’t know what you are suggesting. Can you elaborate? Thanks!

  • consider extracting type-specific code like this to a per-step (or per-type) “callback module”

I think he means creating ServiceTask, JoinTask, etc. modules that have functions like create_new and complete_task.

Consider using case instead of a series of if statements or a cond (see execute_process) that checks the same key a dozen times.
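
For example (a sketch with assumed task fields and helper names), the dispatch then reads the type once:

defp execute_task(task, state) do
  case task.type do
    :service  -> execute_service_task(task, state)
    :decision -> execute_decision_task(task, state)
    _other    -> state
  end
end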

You’re discarding the state on line 282 there, aren’t you?

Instead of checking type everywhere, you’d define a single map:

@callback_modules %{
  decision: DecisionCallbacks,
  service: ServiceCallbacks,
  send: SendCallbacks,
  ...
}

A function like complete_able then simplifies to:

defp complete_able(t) do
  @callback_modules[t.type].complete_able(t)
end

DecisionCallbacks then defines:

defmodule DecisionCallbacks do
  def complete_able(_task), do: true

  ...more functions used elsewhere in ProcessEngine...
end

This would also be a valuable place to utilize a behaviour to ensure that the callback modules implement a consistent interface.
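
For instance (a sketch; the behaviour name here is made up):

defmodule TaskCallbacks do
  # The contract every per-type callback module must implement.
  @callback complete_able(task :: map()) :: boolean()
end

defmodule DecisionCallbacks do
  @behaviour TaskCallbacks

  @impl TaskCallbacks
  def complete_able(_task), do: true
end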


A further, even more dynamic approach would be to have a field on Task that contains the callback module name. That would allow tasks with the same type to have different callbacks. Whether that’s a bug or a feature will depend on your specific needs :stuck_out_tongue:


One general note about both variants: they make the code easier to read, but also obscure it from parts of Elixir’s static analysis.

For instance, you would get a compile-time warning if you explicitly wrote DecisionCallbacks.complete_able() (calling with the wrong arity), but writing @callback_modules[t.type].complete_able() will only crash at runtime.
