Help understanding PartitionSupervisor in Elixir 1.14

shahryarjb · June 4, 2022, 6:19pm

Hi, I have read the PartitionSupervisor documentation, and I think I need help understanding it.

For example:

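defmodule MishkaInstaller.Application do
  use Application

  @impl true
  def start(_type, _args) do
    plugin_runner_config = [
      strategy: :one_for_one,
      name: PluginStateOtpRunner
    ]

    children = [
      # Assumed: the Registry used by via/2 and get_plugin_pid/1 is started here
      # (it may be started elsewhere in the real app).
      {Registry, keys: :unique, name: PluginStateRegistry},
      {DynamicSupervisor, plugin_runner_config}
    ]

    opts = [strategy: :one_for_one, name: MishkaInstaller.Supervisor]
    Supervisor.start_link(children, opts)
  end
end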

Then I start the children from a file named plugin_state_dynamic_supervisor.ex, like this:

def start_job(args) do
  # Starts one MishkaInstaller.PluginState process under the single DynamicSupervisor
  DynamicSupervisor.start_child(PluginStateOtpRunner, {MishkaInstaller.PluginState, args})
end

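And finally, in my `GenServer` module:

  def start_link(args) do
    # Assumed: args is a keyword list carrying the :id and :type to register under.
    id = Keyword.fetch!(args, :id)
    type = Keyword.fetch!(args, :type)
    GenServer.start_link(__MODULE__, args, name: via(id, type))
  end

  # Registers the process in PluginStateRegistry under `id`, storing `value` alongside it
  defp via(id, value) do
    {:via, Registry, {PluginStateRegistry, id, value}}
  end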

So every time I want to find a GenServer, I look it up in the Registry like this:

  def get_plugin_pid(module_name) do
    case Registry.lookup(PluginStateRegistry, module_name) do
      [] -> {:error, :get_plugin_pid}
      [{pid, _type}] -> {:ok, :get_plugin_pid, pid}
    end
  end

Now, where can I use PartitionSupervisor to improve this? My second question: with this approach, if 1,000 users call get_plugin_pid and load data from a specific state at the same time, will I have a bottleneck?

Thank you in advance


msimonborg · June 5, 2022, 3:20am

In this case you would use PartitionSupervisor to partition your DynamicSupervisor and spread the workload from one supervisor across many, with the default number of partitions equal to the number of online schedulers, i.e. System.schedulers_online/0 (usually one per core). This would reduce bottlenecks in the work your DynamicSupervisor does, such as starting, stopping, and restarting your plugin processes. If you start and stop these processes with a lot of concurrency, this may help you.

PartitionSupervisor would not prevent your get_plugin_pid function from bottlenecking, since that call goes through a Registry, not your DynamicSupervisor. To maximize concurrency and reduce bottlenecks in your Registry, you can increase its number of partitions (the default is 1) with the :partitions option, e.g. {Registry, keys: :unique, name: PluginStateRegistry, partitions: System.schedulers_online()}.
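A minimal sketch of that migration, assuming Elixir 1.14+ (where PartitionSupervisor ships in the standard library). `Demo.PluginState`, `Demo.PluginRegistry`, `Demo.Runners` and `Demo.Jobs.start_job/1` are hypothetical stand-ins for `MishkaInstaller.PluginState`, `PluginStateRegistry`, `PluginStateOtpRunner` and the original `start_job/1`; the child is an Agent only to keep the example short.

```elixir
defmodule Demo.PluginState do
  # Hypothetical stand-in for MishkaInstaller.PluginState.
  use Agent

  def start_link(id) do
    # Register under `id` in the (partitioned) Registry, as via/2 does in the question.
    Agent.start_link(fn -> %{id: id} end, name: {:via, Registry, {Demo.PluginRegistry, id}})
  end
end

defmodule Demo.Jobs do
  # Replaces DynamicSupervisor.start_child(PluginStateOtpRunner, ...).
  # The {:via, PartitionSupervisor, {name, key}} tuple routes the call to one
  # DynamicSupervisor partition, chosen from `key`.
  def start_job(id) do
    DynamicSupervisor.start_child(
      {:via, PartitionSupervisor, {Demo.Runners, id}},
      {Demo.PluginState, id}
    )
  end
end

children = [
  # Registry partitions default to 1; more partitions spread ETS contention.
  {Registry, keys: :unique, name: Demo.PluginRegistry, partitions: System.schedulers_online()},
  # Starts one DynamicSupervisor per partition (default: System.schedulers_online()).
  {PartitionSupervisor, child_spec: DynamicSupervisor, name: Demo.Runners}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
```

Using the plugin `id` as the routing key keeps a given plugin on the same partition; the PartitionSupervisor documentation uses `self()` instead, which spreads the load by caller. Lookups are unchanged: `Registry.lookup/2` works the same on a partitioned Registry.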

In my app there are many processes that may be started concurrently by many users. Each process does some clean up in the terminate/2 callback at shutdown. During deployments and scale-downs I need nodes to shutdown gracefully, and these processes must store their state to perform an eventual handoff to a new process that might start on another node and continue its work. I noticed in load testing that when the number of these processes under a single DynamicSupervisor reached a certain level, shutdowns took a really long time. When I tested out replacing the dynamic supervisor with PartitionSupervisor on my 8-core machine the shutdown time decreased dramatically, far more than 8x. This was so helpful for my use case that I copied the PartitionSupervisor module directly into my app so I can start using it without depending on pre-release 1.14.
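The cleanup described above depends on `terminate/2` actually running when the supervisor shuts a child down, which for a GenServer only happens if it traps exits. A minimal sketch, assuming a hypothetical `Demo.Handoff` server that merely sends its state to a `:notify` pid where a real app would persist it for the handoff:

```elixir
defmodule Demo.Handoff do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Without trapping exits, the supervisor's :shutdown signal kills the
    # process outright and terminate/2 is never called.
    Process.flag(:trap_exit, true)
    {:ok, %{notify: Keyword.fetch!(opts, :notify), work: Keyword.get(opts, :work, [])}}
  end

  @impl true
  def terminate(_reason, state) do
    # Hypothetical handoff: a real app would store `state.work` somewhere durable.
    send(state.notify, {:handed_off, state.work})
    :ok
  end
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, _child} = DynamicSupervisor.start_child(sup, {Demo.Handoff, notify: self(), work: [:job_1]})
```

Each child gets up to its `:shutdown` timeout (5 seconds by default for workers) to finish `terminate/2`, so slow cleanup across many children adds up during deployments.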

If you have any particular questions about migrating to PartitionSupervisor I’d be happy to try and help.

shahryarjb · June 5, 2022, 5:08am

Thank you, @msimonborg! Do you have any thoughts on my second question?

  def get(module: module_name) do
    case PSupervisor.get_plugin_pid(module_name) do
      {:ok, :get_plugin_pid, pid} -> GenServer.call(pid, {:pop, :module})
      {:error, :get_plugin_pid} -> {:error, :get, :not_found}
    end
  end

For example, if 10,000 users want to load data from a specific state at the same moment, what will happen? Will I have trouble answering them with the approach I'm using now?

msimonborg · June 5, 2022, 5:16am

This was my try at answering your second question. If your get_plugin_pid function is using Registry then no part of the call is hitting your DynamicSupervisor, so optimizing with a PartitionSupervisor will not help. Optimizing Registry with more partitions might be helpful. Or, are you more concerned with this piece of code GenServer.call(pid, {:pop, :module})? Sorry if I’m misunderstanding your question.

shahryarjb · June 5, 2022, 5:51am

Sorry, that's my fault; I tried to ask several questions at once.
Imagine we start 20 GenServers under one supervisor. As I understand it, each GenServer and the Supervisor is a single process, and I want to figure out what will happen if 20k users read the state of one specific GenServer.

Can the Registry handle 20k users' requests to look up the PID?
If 20k users have the PID of a specific GenServer, can they all read its state at the same time (within 1 millisecond, e.g. with GenServer.call(pid, {:pop, :module}))?
If 20k users who have the PID of a specific GenServer want to edit its state, what will happen?

For example:

I have some state in RAM that is read every time a user sends a request to my server, so I need it to scale to many concurrent users. Users only read this state; they never edit it (more than 20k users at the same time).
In another part of my project, I have a Supervisor that lets users create a state for themselves when they log in to my site. For example, when you log in for the first time from one platform, such as mobile, you are actually creating a state that stores your token in my RAM. This state can be used by up to 5 platforms at the same time: mobile, desktop, or a web browser. Hence, all 5 of your platforms can edit some information in the state, even simultaneously.

So I want this to stay accessible to my users and scale to many concurrent users across these different use cases. After solving the problems above for each case, where can PartitionSupervisor help me do better?

Thank you in advance

msimonborg · June 5, 2022, 4:08pm

No problem!

If you have many users concurrently starting dynamic processes then the PartitionSupervisor may help reduce bottlenecks in that process. At least it won’t hurt : )

Registry is the right tool for this: it uses ETS tables, which provide good read concurrency when accessed by many processes. Consider partitioning your registry for better concurrency.

You may experience bottlenecks here, since a single GenServer handles its messages one at a time; you can run some basic load tests on your own to find out. If you have many processes concurrently accessing the same state, this could be a good use case for an ETS table with the read_concurrency: true option.
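A rough load test in that spirit, assuming a hypothetical `Demo.StateServer` that holds a map and an anonymous public ETS table holding the same data. It runs 20,000 reads (at most 1,000 in flight) through `GenServer.call/2` and through `:ets.lookup/2` and prints both timings; absolute numbers depend entirely on the machine, so none are claimed here.

```elixir
defmodule Demo.StateServer do
  use GenServer

  def start_link(state), do: GenServer.start_link(__MODULE__, state, name: __MODULE__)

  @impl true
  def init(state), do: {:ok, state}

  # Every read is a message handled one at a time by this single process.
  @impl true
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
end

defmodule Demo.LoadTest do
  @readers 20_000

  def run do
    data = %{module: :plugin_a}
    {:ok, _} = Demo.StateServer.start_link(data)

    # read_concurrency: true optimizes the table for many concurrent readers.
    table = :ets.new(:demo_state, [:set, :public, read_concurrency: true])
    true = :ets.insert(table, Map.to_list(data))

    {call_us, call_results} = timed(fn -> GenServer.call(Demo.StateServer, {:get, :module}) end)

    {ets_us, ets_results} =
      timed(fn ->
        [{:module, value}] = :ets.lookup(table, :module)
        value
      end)

    IO.puts("GenServer.call: #{call_us} µs, ETS lookup: #{ets_us} µs (#{@readers} reads)")
    {call_results, ets_results}
  end

  # Runs `fun` @readers times across concurrent tasks; returns {microseconds, results}.
  defp timed(fun) do
    :timer.tc(fn ->
      1..@readers
      |> Task.async_stream(fn _ -> fun.() end, max_concurrency: 1_000)
      |> Enum.map(fn {:ok, value} -> value end)
    end)
  end
end

{call_results, ets_results} = Demo.LoadTest.run()
```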

This is also a good use case for keeping the state in an ETS table. I personally find it helpful to think of GenServer state as something the GenServer uses to do its work, which maybe a few other processes will access. If you plan to have many concurrent processes accessing shared state, it’s better to store it in ETS rather than in a single process.
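One common way to apply this is to keep a GenServer as the owner and single writer of a named ETS table, while readers call `:ets.lookup/2` directly so reads never queue in the GenServer's mailbox. A minimal sketch; `Demo.SharedState` and the `:demo_shared_state` table name are hypothetical:

```elixir
defmodule Demo.SharedState do
  use GenServer

  @table :demo_shared_state

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Reads run in the caller's process and hit ETS directly: no GenServer message.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> {:error, :not_found}
    end
  end

  # Writes are serialized through the owner process.
  def put(key, value), do: GenServer.call(__MODULE__, {:put, key, value})

  @impl true
  def init(:ok) do
    # :protected lets any process read but only the owner write.
    table = :ets.new(@table, [:set, :protected, :named_table, read_concurrency: true])
    {:ok, table}
  end

  @impl true
  def handle_call({:put, key, value}, _from, table) do
    true = :ets.insert(table, {key, value})
    {:reply, :ok, table}
  end
end

{:ok, _pid} = Demo.SharedState.start_link([])
:ok = Demo.SharedState.put(:module, :plugin_a)
```

The table is deleted when its owner exits, so a real setup would need to reload the data when the owner is restarted.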

If the state is only read and never updated, then it could be a good use case for persistent_term, which is optimized for reads at the expense of writes. Just be sure to read the performance tradeoffs in the module documentation very carefully.
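A minimal sketch of the read-only case with `:persistent_term`, assuming the data is loaded once at startup; the `Demo.Config` module, its `{Demo.Config, :plugins}` key, and the sample map are hypothetical. Reads are fast because the term is not copied onto the reader's heap, while updates and deletions are expensive (see the module documentation), which is why it only fits rarely written data.

```elixir
defmodule Demo.Config do
  @key {__MODULE__, :plugins}

  # Call once at boot (or very rarely): every put/erase is expensive.
  def load(plugins) when is_map(plugins), do: :persistent_term.put(@key, plugins)

  # Reading does not copy the stored map into the calling process.
  def plugin(name), do: Map.fetch(:persistent_term.get(@key, %{}), name)
end

:ok = Demo.Config.load(%{plugin_a: %{enabled: true}})
```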

It doesn’t seem like you’d have many issues with this if the state is only being read by 5 processes that all belong to the same user.
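For the per-user session state, a single process per user already serializes concurrent edits, so updates arriving from several devices cannot overwrite one another. A minimal sketch with an `Agent`, assuming a hypothetical `Demo.UserSession` holding a map of tokens per platform, with five tasks standing in for five platforms (the platform names are illustrative):

```elixir
defmodule Demo.UserSession do
  use Agent

  def start_link(user_id) do
    Agent.start_link(fn -> %{user_id: user_id, tokens: %{}} end)
  end

  # The update function runs inside the agent process, so concurrent calls
  # are applied one at a time and none of them is lost.
  def put_token(session, platform, token) do
    Agent.update(session, fn state -> put_in(state, [:tokens, platform], token) end)
  end

  def tokens(session), do: Agent.get(session, & &1.tokens)
end

{:ok, session} = Demo.UserSession.start_link("user-1")

[:mobile, :desktop, :web, :tablet, :tv]
|> Enum.map(fn platform ->
  Task.async(fn -> Demo.UserSession.put_token(session, platform, "token-#{platform}") end)
end)
|> Task.await_many()
```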

Also consider whether all or any of this state should be persisted into a Postgres database instead of in memory : ) Some of the complexity may disappear

Ultimately the PartitionSupervisor will probably only help you by reducing bottlenecks at your DynamicSupervisor, which seems like it could be a good fit for your app and it’s a very simple migration so it can’t hurt. I don’t think it will help with your other needs.

shahryarjb · June 5, 2022, 4:14pm

I am very grateful, and thank you for your time. I'll try switching my code to PartitionSupervisor once Elixir 1.14 is officially released.
