blquinn

Phoenix.PubSub subscription performance

Hello! This is my first post, so apologies if I unknowingly break any community guidelines. This may be more of a discussion topic.

This is a general question about PubSub and people’s experiences with it.

I’ve been evaluating the Phoenix.PubSub module and trying to understand how it (and mostly :pg2) works. Mostly, I want to understand its performance characteristics. Reading a bit of the source code shows me that every time you “subscribe” (add a process to the process group), it performs a global transaction around the name of the group. I believe this has to contact all nodes and acquire some kind of distributed lock. That seems like it would be tough to scale, especially if your nodes are far apart. I’m also not 100% sure what happens with this module in the event of a net split. Perhaps it’s documented somewhere, but I haven’t been able to find it.

Imagine a scenario in which a client comes online and subscribes to, say, 10 topics. Does this contact all nodes 10 times (multiplied by however many messages are needed to acquire the lock)? That feels like it would be a huge bottleneck.

I have done no load testing of any kind, so I have no evidence for this, and I apologize for that. I did find a test of pg2 here http://www.ostinelli.net/an-evaluation-of-erlang-global-process-registries-meet-syn/ which seems to support my concerns.

It seems to me that an eventually consistent solution such as https://github.com/bitwalker/swarm (which I’ve never used either) would have massively better performance characteristics. Obviously there’s a small chance that a message would not reach all subscribers right after a subscription is created, if the subscription event hasn’t yet been distributed to all nodes. This also doesn’t fix the net split problem on its own, because you could create a subscription on one machine and publish to only a subset of the real subscriptions during the split.

In summary:

  1. Have people load tested PubSub on subscription creation/deletion and shown it to be production-worthy? If so, where can I see that test? Or are there metrics from a production system somewhere?
  2. What generally happens to :pg2 and PubSub during a net split?
  3. What is the main reasoning behind using :pg2 in the first place? (Because it is in the standard library?)
  4. Does anyone have any experience with alternatives?

Sorry that’s a super long question, but I’m very excited to see if anyone else has had the same questions/concerns that I did! Thanks for any input!

chrismccord

Creator of Phoenix

We don’t contact all nodes on every subscribe, which you are right would be a huge bottleneck :)

Each node has only 1…N PubSub shards, where N is no more than a handful, and we default to 1. Only these shards join the global pg2 group; they exist to broker broadcasts across the cluster and relay those messages to subscribers on their local node. So the only pg2 activity happens when these processes start up and join the group, or shut down and leave, which is infrequent. The real hot code paths are at the broadcast level, but this is all message sending. When local process A broadcasts a message on Node A, the following happens:

  • the broadcast is sent via send to the local-node subscribers on that topic, which are pulled from ETS
  • a single message is sent to the pg2 group telling the remote nodes to replay this broadcast to their local subscribers
  • the message is picked up by the remote nodes, and send is then called for every local subscriber, again pulled from ETS
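
The three steps above can be sketched in a single VM using only the Elixir standard library. This is an illustration, not Phoenix's implementation: PubSubSketch, start_node, and the group list are hypothetical names, a Registry stands in for the ETS subscriber table, and a plain list of relay pids stands in for the pg2 group.

```elixir
defmodule PubSubSketch do
  # Hypothetical model, not Phoenix code. Each simulated "node" is a Registry
  # (the local subscriber table) plus one relay process (the shard that would
  # join the pg2 group).

  def start_node(name) do
    {:ok, _} = Registry.start_link(keys: :duplicate, name: name)
    relay = spawn_link(fn -> relay_loop(name) end)
    {name, relay}
  end

  # Subscribing touches only the local table; nothing leaves the node.
  def subscribe({name, _relay}, topic) do
    {:ok, _} = Registry.register(name, topic, nil)
    :ok
  end

  def broadcast({name, relay}, group, topic, msg) do
    # Step 1: send to subscribers on the broadcasting node.
    local_fanout(name, topic, msg)
    # Step 2: one message per remote relay, regardless of subscriber count.
    for {_name, pid} <- group, pid != relay, do: send(pid, {:forward, topic, msg})
    :ok
  end

  # Step 3: each relay replays the broadcast to its own local subscribers.
  defp relay_loop(name) do
    receive do
      {:forward, topic, msg} -> local_fanout(name, topic, msg)
    end

    relay_loop(name)
  end

  defp local_fanout(name, topic, msg) do
    Registry.dispatch(name, topic, fn entries ->
      for {pid, _value} <- entries, do: send(pid, msg)
    end)
  end
end

node_a = PubSubSketch.start_node(:node_a)
node_b = PubSubSketch.start_node(:node_b)
group = [node_a, node_b]
parent = self()

# The script process subscribes on simulated node A.
:ok = PubSubSketch.subscribe(node_a, "room:1")

# A second process subscribes on simulated node B and reports what it gets.
spawn_link(fn ->
  :ok = PubSubSketch.subscribe(node_b, "room:1")
  send(parent, :ready)

  receive do
    msg -> send(parent, {:remote_got, msg})
  end
end)

receive do
  :ready -> :ok
end

:ok = PubSubSketch.broadcast(node_a, group, "room:1", {:new_msg, "hi"})

local =
  receive do
    {:new_msg, _} = msg -> msg
  after
    1_000 -> :timeout
  end

remote =
  receive do
    {:remote_got, msg} -> msg
  after
    1_000 -> :timeout
  end

IO.inspect({local, remote})
```

Both local and remote should come back as the broadcast message. Note that subscribe never messages another node, and broadcast sends exactly one message per remote relay no matter how many subscribers that node holds.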

On load testing: yes, we did this with our 2M channel client load tests. The arrival rate was 10k connections/second, which means we achieved 10k subscriptions/second on the box, with all the HTTP code in the mix in addition to the PubSub code paths.

On net splits: messages broadcast during the split won’t be delivered to the remote nodes, but otherwise everything remains available. Phoenix.PubSub has no durable PubSub adapter today.

On why pg2: you guessed it :) It’s built in and, critically, the small load we place on pg2 makes its high-load performance a non-issue. Again, only a single process (or a few) joins the pg2 group per node.

Hope that helps!

chrismccord

Correct: unless you use direct_broadcast to send to a single node, all nodes receive the broadcast.

The bottleneck on the broadcast side would be message sending, via send, which is highly optimized. I don’t have specific numbers on large hardware, but my MacBook can crank out 500,000 messages/s through the PubSub system.

To be clear, subscriptions in Phoenix.PubSub are always local, so any load test of subscriptions is only ever relevant to a single node. The broadcast side is maybe what you meant, but I want to be clear that there is no concept of a “distributed subscription” as far as Phoenix.PubSub is concerned.

At a glance, the link you referenced uses pg2 for all subscriptions, making each one a “distributed subscription”, but as we have seen, that’s not what you want in a PubSub system. A distributed load test of Phoenix.PubSub would be a neat way to see what kind of cluster throughput we can achieve on the broadcast side, but the subscribe side will behave the same in a 1-node test or a 100-node test.
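
For readers who want to see the local-subscription and direct_broadcast behaviour from the caller's side, here is a hedged sketch. It assumes Elixir 1.12 or later (for Mix.install), network access to fetch the package, and the phoenix_pubsub 2.x API, which is newer than the version discussed in this thread. Demo.PubSub is a hypothetical name.

```elixir
# Assumption: phoenix_pubsub 2.x; argument order may differ in older releases.
Mix.install([{:phoenix_pubsub, "~> 2.1"}])

# Demo.PubSub is a hypothetical name chosen for this example.
{:ok, _sup} =
  Supervisor.start_link([{Phoenix.PubSub, name: Demo.PubSub}], strategy: :one_for_one)

# Subscribing registers the calling process on this node only.
:ok = Phoenix.PubSub.subscribe(Demo.PubSub, "room:1")

# Delivered to local subscribers and relayed to every connected node.
:ok = Phoenix.PubSub.broadcast(Demo.PubSub, "room:1", :to_all_nodes)

# Delivered only to subscribers on the named node (here, the current one).
:ok = Phoenix.PubSub.direct_broadcast(node(), Demo.PubSub, "room:1", :to_one_node)

# Take the first message that arrived in this process's mailbox.
first =
  receive do
    msg -> msg
  after
    1_000 -> :timeout
  end

IO.inspect(first, label: "first message")
```

Nothing in the subscribe call names a node, which matches the point above: a subscription is a local registration, and only the broadcast functions decide which nodes are involved.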
