Should I be concerned about Erlang's native global registry performance?

toast · February 23, 2023, 1:01pm

Hello everyone,

This is my first post here, so please excuse me if I have accidentally missed any guidelines.

In my current company, we have developed and released an Elixir application that collects user events and conditionally responds to them in real time at scale. These events are piped through Phoenix sockets/channels and/or HTTP APIs. We have 2 node replicas that usually auto-scale to 6 during peak traffic. Each node can hold up to 30k active socket connections, which belong to ~10k unique users.

Given that user events are piped from multiple processes across the cluster, and node stickiness is not guaranteed, I’m currently working on centralizing the event pipeline for each user by establishing one global process per user across all nodes, as long as the user has at least one active socket connection.

# On new socket connection
user_ref = {:global, "manager_#{user_id}"}

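# start/3 returns {:error, {:already_started, pid}} when the name is already taken,
# so both of the first two outcomes route the call to the same global process.
case GenServer.start(__MODULE__, state, name: user_ref, spawn_opt: [fullsweep_after: 10]) do
  {:ok, _pid} ->
    GenServer.call(user_ref, {:connection, socket, params}, @connection_timeout)

  {:error, {:already_started, _pid}} ->
    GenServer.call(user_ref, {:connection, socket, params}, @connection_timeout)

  {:error, reason} ->
    # reason can be any term, so use inspect/1 instead of plain interpolation
    {:error, "Could not start user process with error: #{inspect(reason)}"}
end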

# On user event
case GenServer.whereis(user_ref) do
  nil ->
    {:error, "Received an event for a user that has no global process registered"}

  _pid ->
    GenServer.call(user_ref, {:event, context}, @event_timeout)
end
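
One thing to watch in the lookup above: the process can exit between `GenServer.whereis/1` and `GenServer.call/3`, in which case the call exits instead of returning an error tuple. A minimal sketch of skipping the separate lookup and catching the `:noproc` exit instead, assuming a hypothetical `DemoManager` stand-in for the real per-user process and a made-up timeout value:

```elixir
defmodule DemoManager do
  # Hypothetical stand-in for the per-user manager process.
  use GenServer

  def start(user_id) do
    GenServer.start(__MODULE__, user_id, name: {:global, "manager_#{user_id}"})
  end

  @impl true
  def init(user_id), do: {:ok, user_id}

  @impl true
  def handle_call({:event, context}, _from, user_id) do
    {:reply, {:handled, user_id, context}, user_id}
  end
end

defmodule EventClient do
  # Assumed value; the original post does not state its timeouts.
  @event_timeout 5_000

  # Calls the user's global process directly. GenServer.call/3 exits with
  # {:noproc, _} when no process is registered under the name, so catching
  # that exit covers both "never started" and "died before the call".
  def send_event(user_id, context) do
    user_ref = {:global, "manager_#{user_id}"}
    {:ok, GenServer.call(user_ref, {:event, context}, @event_timeout)}
  catch
    :exit, {:noproc, _} -> {:error, :no_global_process}
  end
end
```

Other exits, such as a timeout or a crash while the call is in flight, still propagate to the caller in this sketch.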

Additionally, each user may have background tasks running, which are also globally named after their session IDs and unregister when their work ends.

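# :global.register_name/2 takes the bare name. Passing the {:global, name} tuple
# would register the tuple itself, which GenServer-style lookups would not find.
task_name = "manager_#{user_id}_#{session_id}"

Task.Supervisor.start_child(
  :user_tasks_supervisor,
  fn ->
    # Returns :yes on success, :no if the name is already taken
    :global.register_name(task_name, self())
    # do work
    :global.unregister_name(task_name)
  end,
  shutdown: @shut_down_interval,
  restart: :transient
)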

For each unique user, only one global registration succeeds. Subsequent attempts from the user’s additional connections resolve to the existing process via whereis/1, which should be fast and reliable as long as registration itself is not a bottleneck.

I’m worried that global registrations and unregistrations might become a bottleneck, especially since we do not yet have a proper connection-draining mechanism for new releases (code changes). All connections on terminating pods are dropped at once and the new pods receive the reconnections in bulk, which is why we avoid releasing during peak traffic.

I have tried to find out whether someone has already benchmarked the global module, and have found conflicting results so far:

This post shows that Erlang’s global module seems to suffer in terms of registrations per second, as soon as you scale beyond 1 node. Numbers don’t look good at all.
This study [Figure.5][Figure.6] does conclude that throughput and latency are affected by global commands, but not significantly below 10 nodes.

I ran our end-to-end stress testing framework against my change and did not see any significant throughput bottlenecks compared to our production code on 2 nodes.

I was wondering if anyone has experience with Erlang’s global module for registering and looking up a large number of processes across a cluster, or could point me in a direction to verify this further.
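
For reference, this is the kind of micro-benchmark I have in mind for verifying it: a rough sketch that registers idle processes under unique global names and times the whole batch with `:timer.tc/1`. The module name `GlobalBench` and the name shape `{:bench, run_id, i}` are made up for illustration.

```elixir
defmodule GlobalBench do
  # Registers `count` idle processes under unique :global names and reports
  # how long the whole batch took, plus the derived registrations per second.
  def run(count) when count > 0 do
    # A fresh reference per run keeps names from colliding across runs.
    run_id = make_ref()
    pids = for _ <- 1..count, do: spawn(fn -> Process.sleep(:infinity) end)

    {micros, results} =
      :timer.tc(fn ->
        pids
        |> Enum.with_index()
        |> Enum.map(fn {pid, i} -> :global.register_name({:bench, run_id, i}, pid) end)
      end)

    # :global drops a name once its process dies, so killing is enough cleanup.
    Enum.each(pids, &Process.exit(&1, :kill))

    %{
      registered: Enum.count(results, &(&1 == :yes)),
      micros: micros,
      per_second: count * 1_000_000 / max(micros, 1)
    }
  end
end
```

Running `GlobalBench.run(1_000)` from one node while it is connected to 1, 2, and then 6 nodes should show how registration cost changes with cluster size. It only measures sequential registrations from a single caller, so concurrent registrations from several nodes (the reconnection burst case) would need a separate test.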

Many thanks!

DianaOlympos · February 23, 2023, 1:56pm

I would advise posting at https://erlangforums.com/ and maybe ping Maxim Fedorov. Long story short, yes, there are big bottlenecks using the :global registry, due to the locking needed.

I will also not tell you that this is bad, but point out The dangers of the Single Global Process and recommend keeping it in mind while doing your architecture. It may be that this is a great option for you and the right choice. But there are heavy costs.

toast · February 23, 2023, 6:09pm

> I would advise posting at https://erlangforums.com/ and maybe ping Maxim Fedorov.

Thanks for the tip, will surely do so.

> I will also not tell you that this is bad, but point out The dangers of the Single Global Process and recommend keeping it in mind while doing your architecture.

We are trying to process incoming events sequentially for each user, hence the idea of using a single global process. As of OTP 25, global prevents overlapping partitions, so any nodes that get disconnected will have to rejoin the cluster once they reconnect.

It seems to me that global favors consistency over availability because of its registration mechanism (locking and atomicity), which makes it a great solution for a smaller number of long-running processes. It might not be a bottleneck at our current scale, but I’ll spend more time experimenting with the other available options to find that sweet spot, with dynamic node membership in mind. For this use case, swarm, gproc, and syn all seem to be as concerning as global, if not more so.

@ostinelli would appreciate your feedback here.

ostinelli · February 26, 2023, 4:26pm

I don’t know why your original intent is to pipe everything through a single process in the first place.

If you can handle eventual consistency, you can use syn. If you need consistency, you might consider a Raft implementation (such as ra) instead of global for better performance; however, this is only true for some definition of dynamic node membership.