cadebward

Implementing a distributed users counter

Hey everyone!

I’m struggling to get Phoenix.Presence to scale up sufficiently for our needs. I feel like Presence might be overkill in this situation, but I’m not 100% sure what the alternatives would be.

Our Issue

Presence.track starts timing out at around 80-100k connections. Our load tests join and leave many times per second, which generates a lot of presence-diff traffic. Eventually the whole cluster stops tracking until it has had time to catch up.

Some Things We Have Tried

  • Reimplemented Phoenix.Tracker and took out the broadcast on join/leave
  • Implemented a dirty count that reaches into Phoenix.Tracker.Shard.dirty_list and then does not group them, but just counts them, to reduce cycles during the count

We’re basically only using Presence to get a valid count of the connected users. What are some alternatives to Phoenix Presence when you only need to track a count?

Most Liked Responses

josevalim

Creator of Elixir

If all you want is the number of connected users, Presence is definitely overkill, because you are sharing each connected user and their metadata across nodes when all you need is a counter.

A simpler solution is to share only the counter. Here is a high-level outline of one possible approach, broken into two steps. I will be glad to clarify any points and answer questions.

Step 1: basic setup

Every time a user connects, you send a message containing your PID to a process. We will call the receiving process the “LocalCounter”. The LocalCounter bumps its internal counter when it receives that message and monitors the PID. Once it receives the DOWN message, it decrements the counter.
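
A minimal sketch of such a LocalCounter as a GenServer follows. The `{:count, pid}` message shape matches the dispatch code later in this post; the `count_me/1` and `get/1` helpers are assumptions made for illustration.

```elixir
defmodule LocalCounter do
  # Sketch only: the public function names are hypothetical.
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, 0, opts)
  end

  # Called by the connecting process, for example when a socket joins.
  def count_me(server), do: GenServer.cast(server, {:count, self()})

  # Returns the current number of monitored processes.
  def get(server), do: GenServer.call(server, :get)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_cast({:count, pid}, count) do
    # Monitor the caller so we are told when it exits.
    Process.monitor(pid)
    {:noreply, count + 1}
  end

  @impl true
  def handle_call(:get, _from, count), do: {:reply, count, count}

  @impl true
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, count) do
    {:noreply, count - 1}
  end
end
```

Note that this sketch counts a PID twice if it registers twice; deduplication would need a set of monitored PIDs in the state.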

You will also have a separate process, which is the GlobalCounter. The GlobalCounter will receive updates from other processes in the cluster. You will:

  1. Every X seconds, you will query the local counter and broadcast a message on a “global_counter” topic with your node name and your local counter

  2. Other nodes will receive your message and they should store: the time they received the message, the node name, and the counter

  3. The total counter is the sum of the local counter and the counters stored for all other nodes

  4. After you broadcast, you should prune any dead nodes. You can consider a node dead if you haven’t received a broadcast from it within N*X seconds. Alternatively, you can use :net_kernel.monitor_nodes(true) to be notified when nodes go up and down, so you can remove those entries immediately

The choice of X is important. X is how frequently you broadcast: a small X means a lot of traffic, but data that is always up to date. For example, if X is 5 seconds, you will be at most 5 seconds behind the other nodes. X is also the maximum time it takes for a new node to receive all updates when it comes up.
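
The bookkeeping in steps 2 to 4 can be sketched as a few pure functions that the GlobalCounter process would call. The module name, function names, and the use of integer timestamps in seconds are assumptions; the periodic broadcast itself (for example over Phoenix.PubSub) is left out.

```elixir
defmodule GlobalCounterState do
  # Hypothetical helper module. `state` is a map of
  # node name => {counter, time the broadcast was received}.

  # Step 2: remember the latest counter a remote node reported, and when.
  def put(state, node, count, now), do: Map.put(state, node, {count, now})

  # Step 3: the total is the local counter plus every remote counter we hold.
  def total(state, local_count) do
    Enum.reduce(state, local_count, fn {_node, {count, _seen_at}}, acc ->
      acc + count
    end)
  end

  # Step 4: drop nodes that have been silent for more than max_age seconds
  # (max_age would be N * X in the terms used above).
  def prune(state, now, max_age) do
    state
    |> Enum.filter(fn {_node, {_count, seen_at}} -> now - seen_at <= max_age end)
    |> Map.new()
  end
end

state =
  %{}
  |> GlobalCounterState.put(:"b@host", 10, 100)
  |> GlobalCounterState.put(:"c@host", 5, 103)

GlobalCounterState.total(state, 7)
# => 22

state |> GlobalCounterState.prune(111, 10) |> GlobalCounterState.total(7)
# => 12, because b@host was last seen 11 seconds ago and is pruned
```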

Step 2: optimizing

The implementation above has one issue: the LocalCounter will likely become a bottleneck. We can address this by using the :counters module in Erlang and changing it to be a pool of processes. Here is what we will do:

  1. Instead of a single local counter, we will start N local counters. We will also use the :counters API to create a counter array of N entries. Each local counter will have an index inside the counter array and update that index.

  2. Now, when you need to track a given PID, you should do :erlang.phash2(pid, N) to select one of the existing local counters. You can use a Registry to track the local counters.

  3. Change the global counter so that, instead of asking the local counter for its current count, it traverses all indexes in the :counters reference and adds them up. That sum is what you will broadcast now.
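
As a rough illustration of step 3, summing every slot of a :counters array could look like this. `CounterSum` is a hypothetical helper name; `:counters.info/1` and `:counters.get/2` are part of Erlang/OTP.

```elixir
defmodule CounterSum do
  # Hypothetical helper: add up every slot of a :counters array.
  def total(counter_ref) do
    # :counters.info/1 reports the number of slots in the array.
    %{size: size} = :counters.info(counter_ref)

    # Slots are indexed from 1 to size.
    Enum.reduce(1..size, 0, fn index, acc ->
      acc + :counters.get(counter_ref, index)
    end)
  end
end

ref = :counters.new(4, [:write_concurrency])
:counters.add(ref, 1, 3)
:counters.add(ref, 4, 2)
:counters.sub(ref, 1, 1)

CounterSum.total(ref)
# => 4
```

With :write_concurrency, a read taken while other processes are writing may be slightly out of date, which should be acceptable for a count that is only broadcast every X seconds.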

In pseudo-code, your CounterSupervisor’s init will look like this:

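@count 8

def init(_) do
  # One slot per local counter; :write_concurrency favors frequent writes.
  counter = :counters.new(@count, [:write_concurrency])

  children = [
    {Registry, name: CounterRegistry, keys: :unique},
    {GlobalCounter, counter}
  ] ++
    Enum.map(1..@count, fn index ->
      Supervisor.child_spec({LocalCounter, counter: counter, index: index}, id: index)
    end)

  # The strategy is not specified in the post; :one_for_one is an assumption.
  Supervisor.init(children, strategy: :one_for_one)
end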

The LocalCounter should register itself under CounterRegistry with the index.
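
One way such a pooled LocalCounter could look is sketched below. It assumes a Registry named CounterRegistry is already running, and that the process is started with `counter:` and `index:` options; resetting the slot in `init/1` is a choice made for this sketch, not something the post prescribes.

```elixir
defmodule LocalCounter do
  # Sketch of a pooled local counter that owns one slot of a :counters array.
  use GenServer

  def start_link(opts) do
    counter = Keyword.fetch!(opts, :counter)
    index = Keyword.fetch!(opts, :index)
    # Register under CounterRegistry with the index as the key.
    name = {:via, Registry, {CounterRegistry, index}}
    GenServer.start_link(__MODULE__, {counter, index}, name: name)
  end

  @impl true
  def init({counter, index} = state) do
    # Assumption: after a restart all monitors are gone, so start from zero.
    :counters.put(counter, index, 0)
    {:ok, state}
  end

  @impl true
  def handle_cast({:count, pid}, {counter, index} = state) do
    Process.monitor(pid)
    :counters.add(counter, index, 1)
    {:noreply, state}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, {counter, index} = state) do
    :counters.sub(counter, index, 1)
    {:noreply, state}
  end
end
```

Because each process only writes to its own slot, nothing ever has to ask a local counter for its value; readers go straight to the :counters reference.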

When dispatching to a local counter, you will roughly do this:

def count_me do
  # phash2 returns a value in 0..@count-1, while the indexes start at 1
  index = :erlang.phash2(self(), @count) + 1
  name = {:via, Registry, {CounterRegistry, index}}
  GenServer.cast(name, {:count, self()})
end

Post #2
cadebward

We were able to implement our own incantation of this and run it in production with over 120k live users. CPU peaked at around 15% when large batches of users were joining, and leveled out at around 7%.

I’ve prepared a gist of the main parts of the code for the benefit of future readers.

I still think there are improvements to be made here. One downside is that if the node running the global gen_server crashes for some reason, we won’t start up a new one.

If anyone ends up using this code and making improvements, please report back!

Post #7
josevalim

It is worth adding that the topics approach has another issue: if you have a topic that no user is joining or leaving, i.e. its count is constant, you will still broadcast it every X seconds even though the data is the same. So if you have a long tail of topics, your payload may be really large, whereas the tracker version would be more optimized.

Therefore, there is another alternative here, which is to continue using Phoenix.Tracker, but you will track the local counter instead of each individual process. In a nutshell:

  1. Create N local counter processes. Each has two keys in its state: the overall state and the diff

  2. Every time a local counter process receives a pid-topic pair to track or a DOWN message, you put that in the diff. If topic “foobar” receives two joins and one leave, the diff will be %{"foobar" => 1}

  3. Every X seconds, you will merge the diff into the overall state. If a new topic was added, you start tracking it from that local counter process. If a topic is removed, you start untracking. If the topic counter was updated, you update its tracking. The current counter will be the metadata.
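
The merge in step 3 can be sketched as a pure function that returns the new overall state together with the tracker operations to perform. `TopicDiff` and the action tuples are hypothetical; in a real process they would presumably map to Phoenix.Tracker.track/5, Phoenix.Tracker.update/5 and Phoenix.Tracker.untrack/4, with the counter as metadata.

```elixir
defmodule TopicDiff do
  # Hypothetical helper. `state` and `diff` are maps of topic => integer.
  # Assumes a topic's count never drops below zero.
  def merge(state, diff) do
    Enum.reduce(diff, {state, []}, fn {topic, delta}, {acc, actions} ->
      old = Map.get(acc, topic, 0)
      new = old + delta

      cond do
        # Joins and leaves cancelled out: nothing to tell the tracker.
        delta == 0 -> {acc, actions}
        # New topic: start tracking it with the counter as metadata.
        old == 0 -> {Map.put(acc, topic, new), [{:track, topic, new} | actions]}
        # Topic emptied: stop tracking it.
        new == 0 -> {Map.delete(acc, topic), [{:untrack, topic} | actions]}
        # Counter changed: update the tracked metadata.
        true -> {Map.put(acc, topic, new), [{:update, topic, new} | actions]}
      end
    end)
  end
end

{state, actions} =
  TopicDiff.merge(
    %{"foobar" => 3, "old" => 1},
    %{"foobar" => 1, "old" => -1, "new" => 2}
  )
```

Here `state` becomes `%{"foobar" => 4, "new" => 2}`, and `actions` holds one update for "foobar", one untrack for "old" and one track for "new", in no particular order. After applying them, the process would reset its diff to an empty map.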

This means Phoenix.Tracker becomes your “GlobalCounter”, which is quite efficient and optimized. In pseudo-code your counter supervisor would look like:

@count 8

def init(_) do
  children = [
    {Registry, name: CounterRegistry, keys: :unique},
    {Phoenix.Tracker, ...}
  ] ++
    Enum.map(1..@count, fn index ->
      Supervisor.child_spec({LocalCounter, index: index}, id: index)
    end)

  # The strategy is not specified in the post; :one_for_one is an assumption.
  Supervisor.init(children, strategy: :one_for_one)
end

The last thing to consider is how you are going to hash the topic-pid pairs across local counters. You can:

  1. hash by pid: this means that if you have N local counter processes, they can all end up tracking the same topic. This may increase the payload, as the counters are split across local processes

  2. hash by topic: this decreases payload but may increase contention if there is a topic with tens of thousands of users

Finally, to get the number of users in a topic, you need to query the given topic and sum the count metadata of all its members. If you are hashing by topic, the number of entries will be the number of nodes. If you are hashing by pid, it will be at most the number of nodes * N. In both cases, it should be fast enough to compute on demand. If it isn’t, you can use handle_diff to store the overall results directly in ETS.
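
Summing the metadata could be sketched as follows. The input has the shape returned by Phoenix.Tracker.list/2, a list of `{key, meta}` tuples; storing the counter under a `:count` key in the metadata, and the keys shown in the example, are assumptions.

```elixir
defmodule TopicCount do
  # Hypothetical helper: sum the :count metadata of every entry in a topic.
  def total(entries) do
    Enum.reduce(entries, 0, fn {_key, meta}, acc ->
      acc + Map.get(meta, :count, 0)
    end)
  end
end

# Made-up entries, as one might get back for a topic when hashing by pid.
entries = [
  {"a@host:1", %{count: 120}},
  {"a@host:2", %{count: 30}},
  {"b@host:1", %{count: 80}}
]

TopicCount.total(entries)
# => 230
```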
