Hello! This is my first post, so sorry if I break some community guidelines; if I do, it's unknowingly. This could be more of a discussion topic.
This is a post / general question about pubsub and people's experiences with it.
I've been evaluating the Phoenix.PubSub module and trying to understand how it (and mostly :pg2) works. Mostly, I want to understand its performance characteristics. Reading a bit of the source shows me that every time you "subscribe" (add a process to the process group), it does a global transaction around the name of the group. I believe this has to contact all nodes and acquire some type of distributed lock. That seems like it would be tough to scale, especially if your nodes are quite far apart. I'm also not 100% sure what happens with this module in the event of net splits. Perhaps it's documented somewhere, but I haven't been able to find it.
Imagine a scenario in which a client comes online and subscribes to say 10 topics. Does this contact all nodes 10 times (times however many messages are actually sent to perform the lock)? That feels like it would be a huge bottleneck.
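For concreteness, the pattern I'm worried about looks something like this (using the public Phoenix.PubSub API; `MyApp.PubSub` and the topic names are placeholders, and the subscribe arity may differ between versions):

```elixir
# Hypothetical: a client process comes online and subscribes to 10 topics.
# If each subscribe had to take a cluster-wide :pg2 lock, a wave of
# connecting clients would hammer every node in the cluster.
for room_id <- 1..10 do
  Phoenix.PubSub.subscribe(MyApp.PubSub, "room:#{room_id}")
end
```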
It seems to me that an eventually consistent solution such as https://github.com/bitwalker/swarm (I've never used that either) would have massively better performance characteristics. Obviously there's a small chance that you would not end up sending the message to all subscribers right after a subscription is created, in the event that the subscription event doesn't get distributed to all nodes immediately. This also doesn't fix the net split problem on its own, because you could create a subscription on one machine and publish to only a subset of the real subscribers during the split.
Have people load tested pubsub on subscription creations/deletions and proven it is production worthy? If so, where can I see that test? Or are there examples of metrics from a production system somewhere?
What happens generally to :pg2 and pubsub during a net split?
What is the main reasoning behind using :pg2 in the first place? (stdlib?)
Does anyone have any experience with alternatives?
Sorry that’s a super long question, but I’m very excited to see if anyone else has had the same questions/concerns that I did! Thanks for any input!
We don't do this; you're right that it would be a huge bottleneck.
Each node only has 1…N PubSub shards, where N is no more than a handful, but we default to 1. Only these shards join the global pg2 group, and they exist to broker broadcasts across the cluster and relay those messages to their local node subscribers. So the only pg2 action happens when these processes start up and join the group, or shut down and leave, which is infrequent. The real hot code paths happen at the broadcast level, but this is all message sending. When local process A broadcasts a message on Node A, the following happens:
the broadcast is sent via send to local-node subscribers on that topic, which are pulled from ETS
a single message is sent to the pg2 group telling the remote nodes to replay this broadcast to local subscribers
the message is picked up by the remote nodes, and then send is called for every local subscriber, again pulled from ETS
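The local fan-out steps above can be sketched roughly like this. To be clear, the table name and tuple shape here are my own assumptions for illustration, not the actual Phoenix.PubSub internals:

```elixir
# Illustrative sketch only: deliver a broadcast to local subscribers
# by reading pids out of an ETS table keyed by topic.
# :my_pubsub_table and the {topic, pid} shape are assumptions.
defmodule LocalFanout do
  def broadcast_local(topic, message) do
    # With a duplicate_bag table holding {topic, subscriber_pid} rows,
    # lookup/2 returns every subscriber for the topic.
    for {_topic, pid} <- :ets.lookup(:my_pubsub_table, topic) do
      send(pid, message)
    end

    :ok
  end
end
```

The key point is that the hot path is an ETS read plus plain message sends, with no pg2 involvement at all.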
Yes, we did this with our 2M channel client load tests. The arrival rate was 10k connections/second, which means we achieved 10k subscriptions/second on the box, which included all the HTTP code in the mix, in addition to the pubsub code paths.
Messages broadcast during the split won’t be delivered to the remote nodes, but otherwise everything remains available. Phoenix.PubSub has no durable pubsub adapter today.
You guessed it! It's built in, and critically, the small load we place on pg2 makes any high-load performance concern a non-issue. Again, only a single process (or a few) joins the pg2 group per node.
Thanks for the quick thorough reply, and from the creator of Phoenix no less!
Sounds like I made a few assumptions about Phoenix.PubSub, and you know what they say about that.
Correct me if I'm wrong: subs are stored locally only, and messages broadcast via pubsub are always sent to all nodes, correct?
If this is the case, it seems like you'd move the bottleneck (or at least more load) to the broadcast side, no?
The bench is cool and quite promising. However, it takes place all on one node, so it doesn't test the distribution of subscriptions. It would be super cool to see a bench that tests distributed subs. (Perhaps I can help.) The link I supplied shows a very steep drop-off in subs/sec the more nodes you add, for most of the tests he did.
Sorry if what I'm saying sounds overly negative; I'm super impressed with what's been accomplished here and think that this is great for most use cases. I just like the concept of totally maxing out the scalability, and I have a feeling many people in this community feel the same way.
I wonder if there could be a pluggable, eventual-consistency-based pubsub backend. I wouldn't be opposed to working on that myself if people think it could be a good idea.
Correct, unless you use direct_broadcast to send to a single node, all nodes receive it.
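For reference, the single-node variant mentioned above looks roughly like this (the node name, server name, topic, and payload are all placeholders; check the docs for the exact signature in your version):

```elixir
# Broadcast to subscribers on one specific node only,
# instead of fanning out to the whole cluster.
# :"app@host1", MyApp.PubSub, and the topic are placeholders.
Phoenix.PubSub.direct_broadcast(
  :"app@host1",
  MyApp.PubSub,
  "room:42",
  {:new_msg, "hi"}
)
```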
The bottleneck on the broadcast side would be message sending, via send, which is highly optimized. I don't have specific numbers on large hardware, but my MacBook can crank out 500,000 messages/s through the pubsub system.
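That kind of laptop number is easy to sanity-check with a toy send/receive loop. Note this measures raw process-to-process message sending only, not Phoenix.PubSub itself:

```elixir
defmodule SendBench do
  # Toy benchmark: raw message-sending throughput between two processes.
  # Not a Phoenix.PubSub benchmark; just the send/receive floor.
  def run(n \\ 500_000) do
    parent = self()

    receiver =
      spawn(fn ->
        for _ <- 1..n do
          receive do
            :ping -> :ok
          end
        end

        send(parent, :done)
      end)

    {micros, :ok} =
      :timer.tc(fn ->
        for _ <- 1..n, do: send(receiver, :ping)

        receive do
          :done -> :ok
        end
      end)

    # messages per second
    Float.round(n / (micros / 1_000_000), 1)
  end
end
```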
To be clear, subscriptions in Phoenix.PubSub themselves are always local. So any load test of subscriptions will always be relevant for a single node only. The broadcast side is maybe what you meant, but I want to be clear there is no concept of a “distributed subscription” as far as Phoenix pubsub is concerned.
At a glance, the link you referenced uses pg2 for all subscriptions, making them a “distributed subscription”, but as we see, that’s not what you want in a pubsub system. A distributed load-test of Phoenix PubSub would be neat to see what kind of cluster throughput we can achieve on the broadcast side, but the subscribe side will remain the same for 1 node test or 100 node test.
True, your points about the load tests make sense.
I do think that, while any tests are nice to see, they need to be a little more real world to be truly valuable. In order to prove the actual throughput of the system, it should run clustered, then each client should create a bunch of subscriptions and also send/receive a few messages. That would be closer to a real world use case of the framework. (Well the streaming parts at least)
It's hard to know what 500,000 msgs/sec on a laptop will actually translate to on, say, cloud infrastructure. It's hard to compare sending messages on a loopback device to really sending them over the network; you can stream hundreds of GB/s on loopback. My point is that broadcasting to all nodes in the cluster could pose a much larger problem when some of your nodes are far away. Furthermore, if you broadcast to all nodes in the cluster on every message send, you effectively divide that number by the number of nodes you have in terms of throughput of the pubsub system.
Other pubsub systems that I've seen broadcast the subscriptions (potentially with gossip) to all nodes, then use CRDTs to ensure they're synced at intervals. This way you always have all of the subscriptions stored locally, at least within the sync interval. The benefit is that you can route messages to only the interested nodes.
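A toy sketch of the routing-table idea I have in mind, assuming each node gossips which topics have local subscribers so peers can target their broadcasts. A real implementation would need a proper CRDT (e.g. an observed-remove set) to handle concurrent subscribes/unsubscribes; this grow-only union just shows the routing benefit:

```elixir
defmodule TopicRoutes do
  # Toy routing table: topic => MapSet of nodes with local subscribers.
  # Everything here is illustrative, not an existing library.

  def add(routes, topic, node) do
    Map.update(routes, topic, MapSet.new([node]), &MapSet.put(&1, node))
  end

  # Merge tables learned from peers. A grow-only union is the simplest
  # conflict-free merge; removals would need an observed-remove set.
  def merge(a, b) do
    Map.merge(a, b, fn _topic, s1, s2 -> MapSet.union(s1, s2) end)
  end

  # Only nodes known to have subscribers receive the publish.
  def nodes_for(routes, topic), do: Map.get(routes, topic, MapSet.new())
end
```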
It's also nice for the node that receives the message to know immediately who needs to receive it and who doesn't. This would be great, for instance, for sending push notifications to users who aren't currently connected, though perhaps you could use Presence for that.
The current architecture does seem a bit easier to understand and perhaps more reliable, but I guess I’m just wondering if you ever considered the eventual consistency route.
I'm also highly interested in this. I'm just getting started with Elixir development, but I have a dream of developing something like Apache Pulsar or one of the big paid PubSub platforms in Elixir.