blquinn

Phoenix.Pubsub subscriptions performance

Hello! This is my first post, so apologies if I unknowingly break any community guidelines. This may be better suited as a discussion topic.

This is a post / general-question about pubsub and people’s experiences with it.

I’ve been evaluating the Phoenix.PubSub module and trying to understand how it (and, mostly, :pg2) works, in particular its performance characteristics. From reading some of the source code, it looks like every time you “subscribe” (add a process to the process group), it runs a global transaction around the name of the group. I believe this has to contact all nodes and acquire some kind of distributed lock. That seems hard to scale, especially if your nodes are far apart. I’m also not 100% sure what happens with this module in the event of net splits. Perhaps it’s documented somewhere, but I haven’t been able to find it.

Imagine a scenario in which a client comes online and subscribes to, say, 10 topics. Does this contact all nodes 10 times (times however many messages are actually sent to perform the lock)? That feels like it would be a huge bottleneck.

I have done no load testing or anything of that nature, so I have no evidence to back this up; for that I apologize. I did find a test of pg2 at http://www.ostinelli.net/an-evaluation-of-erlang-global-process-registries-meet-syn/ which supports my concerns.

It seems to me that an eventually consistent solution such as https://github.com/bitwalker/swarm (I’ve never used that either) would have massively better performance characteristics. Obviously, there’s a small chance that a message would not reach all subscribers right after a subscription is created, if the subscription event hasn’t yet been distributed to all nodes. This also doesn’t fix the net split problem on its own, because you could create a subscription on one machine and publish to only a subset of the real subscriptions during the split.

In summary:

  1. Have people load tested pubsub on subscription creations/deletions and proven it production-worthy? If so, where can I see that test? Or are there examples of metrics from a production system somewhere?
  2. What happens generally to :pg2 and pubsub during a net split?
  3. What is the main reasoning behind using :pg2 in the first place? (stdlib?)
  4. Does anyone have any experience with alternatives?

Sorry that’s a super long question, but I’m very excited to see if anyone else has had the same questions/concerns that I did! Thanks for any input!

chrismccord

Creator of Phoenix

We don’t do this; you are right that it would be a huge bottleneck :)

Each node only has 1…N PubSub shards, where N is no more than a handful, and we default to 1. Only these shards join the global pg2 group; they exist to broker broadcasts across the cluster and relay those messages to their local node subscribers. So the only pg2 activity happens when these processes start up and join the group, or shut down and leave, which is infrequent. The real hot code paths are at the broadcast level, but this is all message sending. When local process A broadcasts a message on Node A, the following happens:

  • the broadcast is sent via send to the local-node subscribers on that topic, which are pulled from ETS
  • a single message is sent to the pg2 group telling the remote nodes to replay this broadcast to their local subscribers
  • the message is picked up by the remote nodes, and send is then called for every local subscriber, again pulled from ETS
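
The three steps above can be modeled in a few lines. This is a toy Python sketch, not Phoenix code: the `Node` class, its `local_subs` dict (standing in for the ETS table), and the shared `cluster` list (standing in for the pg2 group) are hypothetical names chosen for illustration. It assumes one shard per node and counts one cross-node message per remote node; how pg2 actually fans the message out is not modeled.

```python
class Node:
    """Toy stand-in for one node in the cluster (illustration only)."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster   # shared list, plays the role of the pg2 group
        self.local_subs = {}     # topic -> mailboxes, plays the role of the ETS table
        cluster.append(self)     # the node's shard joins the group once, at startup

    def subscribe(self, topic):
        """Register a local subscriber and return its mailbox (a plain list)."""
        mailbox = []
        self.local_subs.setdefault(topic, []).append(mailbox)
        return mailbox

    def deliver_local(self, topic, message):
        """Send the message to every local subscriber of the topic."""
        for mailbox in self.local_subs.get(topic, []):
            mailbox.append(message)

    def broadcast(self, topic, message):
        """Broadcast from this node; returns how many cross-node messages were sent."""
        self.deliver_local(topic, message)        # step 1: local subscribers
        remote_nodes = [n for n in self.cluster if n is not self]
        for node in remote_nodes:                 # step 2: one message per remote shard
            node.deliver_local(topic, message)    # step 3: the remote node replays locally
        return len(remote_nodes)


cluster = []
node_a, node_b = Node("a", cluster), Node("b", cluster)
sub_on_a = node_a.subscribe("room:1")
subs_on_b = [node_b.subscribe("room:1") for _ in range(3)]

hops = node_a.broadcast("room:1", "hello")
print(hops)                                # 1
print(sub_on_a, subs_on_b)                 # every mailbox holds ["hello"]
```

Note that `subscribe` never touches `cluster`: in this model the number of cross-node messages depends on the number of nodes, not on the number of subscribers.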

Yes, we did this with our 2M channel client load tests. The arrival rate was 10k connections/second, which means we achieved 10k subscriptions/second on the box, which included all the HTTP code in the mix, in addition to the pubsub code paths.

Messages broadcast during the split won’t be delivered to the remote nodes, but otherwise everything remains available. Phoenix.PubSub has no durable pubsub adapter today.
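
To make the split behaviour concrete, here is a toy Python model (not Phoenix code). The `Node` class and the `connect` and `split` helpers are hypothetical, and the model assumes that broadcasts flow normally again once the nodes reconnect; the thread itself only states what happens during the split.

```python
class Node:
    """Toy node with local-only subscriptions (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.subs = {}           # topic -> list of mailboxes, local to this node
        self.reachable = set()   # remote nodes this node can currently see

    def subscribe(self, topic):
        mailbox = []
        self.subs.setdefault(topic, []).append(mailbox)
        return mailbox

    def broadcast(self, topic, message):
        # Deliver locally, then to every node that is reachable right now.
        # Nothing is stored for unreachable nodes (no durable adapter).
        for node in [self, *self.reachable]:
            for mailbox in node.subs.get(topic, []):
                mailbox.append(message)


def connect(x, y):
    x.reachable.add(y)
    y.reachable.add(x)


def split(x, y):
    x.reachable.discard(y)
    y.reachable.discard(x)


node_a, node_b = Node("a"), Node("b")
connect(node_a, node_b)
on_a = node_a.subscribe("news")
on_b = node_b.subscribe("news")

split(node_a, node_b)
node_a.broadcast("news", "during split")   # only node a's subscribers get this
connect(node_a, node_b)                    # assumed healing step
node_a.broadcast("news", "after heal")

print(on_a)   # ['during split', 'after heal']
print(on_b)   # ['after heal']
```

Both nodes keep accepting subscriptions and broadcasts throughout; the only loss is the message sent while the other side was unreachable, and it is never replayed.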

You guessed it :) It’s built in, and critically, the small load we place on pg2 makes any high-load performance a non-issue. Again, only a single process (or a few) joins the pg2 group per node.

Hope that helps!

chrismccord

Correct: unless you use direct_broadcast to send to a single node, all nodes receive it.

The bottleneck on the broadcast side would be message sending, via send, which is highly optimized. I don’t have specific numbers on large hardware, but my MacBook can crank out 500,000 messages/s through the pubsub system.

To be clear, subscriptions in Phoenix.PubSub themselves are always local. So any load test of subscriptions will always be relevant for a single node only. The broadcast side is maybe what you meant, but I want to be clear there is no concept of a “distributed subscription” as far as Phoenix pubsub is concerned.

At a glance, the link you referenced uses pg2 for all subscriptions, making each one a “distributed subscription”, but as we’ve seen, that’s not what you want in a pubsub system. A distributed load test of Phoenix PubSub would be neat, to see what kind of cluster throughput we can achieve on the broadcast side, but the subscribe side will remain the same for a 1-node test or a 100-node test.
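
A back-of-the-envelope comparison shows why the subscribe side does not depend on cluster size. This Python sketch is purely illustrative: both function names are hypothetical, and the "distributed subscription" cost is assumed to be at least one message to every other node per subscribe, which is a lower bound and not a measurement of pg2.

```python
def subscribe_ops_distributed(topics, nodes):
    """Assumed lower bound: each subscribe contacts every other node once."""
    return topics * (nodes - 1)


def subscribe_ops_local(topics, nodes):
    """Local subscriptions: a subscribe only touches the local table."""
    return 0


# One client subscribing to 10 topics, as in the original question.
for nodes in (1, 10, 100):
    print(nodes,
          subscribe_ops_distributed(10, nodes),
          subscribe_ops_local(10, nodes))
# 1 0 0
# 10 90 0
# 100 990 0
```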
