PubSub Broadcast Performance and Best Practices

Hi everyone!

I am currently building a system that gathers data from subsystems and sends each “value” over a WebSocket to an Elixir Phoenix application.
The application then broadcasts each of these values to my two other applications in the cluster, and this is currently done using PubSub.

The reason I am asking is that I anticipate a lot of traffic here and would really like your input on whether this is a good or bad way of doing it.

The current payload size for each broadcast message is approx 1020 bytes, using value |> :erlang.term_to_binary() |> :erlang.byte_size() to find the size.
I currently have a way to get this down to about 520 bytes by using a smaller struct.

I send 10,000,000 messages per day at the moment, but this will increase.
I send every message on one “common” topic called “values” and also on a node-specific topic called “values:<node_id>”.
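For readers unfamiliar with the setup, the double broadcast presumably looks something like the sketch below. The names MyApp.PubSub, the `:new_value` message shape, and the `:node_id` field are my assumptions, not taken from the post:

```elixir
# Hypothetical names: MyApp.PubSub and value.node_id are assumptions.
def broadcast_value(value) do
  # Subscribers interested in "all values" listen on the common topic...
  Phoenix.PubSub.broadcast(MyApp.PubSub, "values", {:new_value, value})

  # ...while node-specific listeners subscribe to "values:<node_id>".
  Phoenix.PubSub.broadcast(MyApp.PubSub, "values:#{value.node_id}", {:new_value, value})
end
```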

This means I have 20,000,000 messages sent from this node, and they are all received on the two other nodes.
And if I understand correctly, they are only sent once from node X to node Y, and then node Y delivers them to the subscribers on that node?

My other nodes subscribe to either the “common” topic or the node-specific ones, and the values are used for visualization in LiveViews.

The “common” topic is also used to keep a “last value” Cachex store that the views use for the initial render.
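A minimal sketch of how such a “last value” store could be fed from the common topic, assuming a Cachex cache named `:last_values` is started elsewhere in the supervision tree (all module, cache, topic, and message names here are illustrative, not from the post):

```elixir
defmodule MyApp.LastValueCache do
  @moduledoc "Keeps the most recent value per node_id; names are illustrative."
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Read side used by LiveViews for the initial render.
  def last_value(node_id), do: Cachex.get(:last_values, node_id)

  @impl true
  def init(:ok) do
    # Subscribe to the common topic so every value passes through here.
    Phoenix.PubSub.subscribe(MyApp.PubSub, "values")
    {:ok, %{}}
  end

  @impl true
  def handle_info({:new_value, value}, state) do
    # Overwrite whatever was stored before; only the latest value matters.
    Cachex.put(:last_values, value.node_id, value)
    {:noreply, state}
  end
end
```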

I don't see any performance problems with my current load at the moment, but the number of values will grow a lot, at least 5–10 times.

My questions boil down to:

  1. Is PubSub a viable technology for my use case and something to grow with?
  2. If yes, is there any better way to broadcast the messages than using both a “common” topic and node-specific ones, so that consumers can either get “all” values or filter down to the node-specific values they care about?
  3. If no, what other technology could be viable? I was thinking Kafka, RabbitMQ, or something similar, but that would require new applications/servers just to host them.
  4. Would it be better to store the “last value” only on the node that receives the WebSocket messages, and have the other nodes call it to fetch values, instead of storing them on each node?

Thank you for the great community! :slight_smile:


Hi, @mortenlund!
I can’t give you a definitive answer, but here are some things that come to mind to consider:

Phoenix.PubSub uses adapters to deliver messages; by default it’s :pg2. However, it could be Redis, Postgres, or even RabbitMQ, or anything else that can act as a message bus…
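For concreteness, the adapter is chosen where the PubSub server is started in the application’s supervision tree. A sketch only: MyApp.PubSub and the Redis options are illustrative, and the exact option names should be checked against the phoenix_pubsub_redis docs:

```elixir
# In the application supervision tree (names are illustrative).
children = [
  # Default: the PG2 adapter, which rides on distributed Erlang.
  {Phoenix.PubSub, name: MyApp.PubSub}

  # Alternatively, swap the transport for Redis (requires :phoenix_pubsub_redis):
  # {Phoenix.PubSub,
  #  name: MyApp.PubSub,
  #  adapter: Phoenix.PubSub.Redis,
  #  host: "redis.example.com",
  #  node_name: System.get_env("NODE")}
]
```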

Now, it’s worth mentioning that the default :pg2 adapter uses distributed Erlang… In general, most of the tools available out of the box in distributed Erlang provide eventual consistency, and they don’t provide any “at-least-once delivery” guarantee.
If you need such guarantees, you would most likely need to opt in to a 3rd-party broker (RabbitMQ is a very strong candidate).


Thank you very much for your input @RudManusachi :slight_smile:
I do not need consistency in this case, as I have mechanisms to make sure I only keep the latest of the values in my cache, and the visualization would also not suffer if something was a bit delayed or if the same message was received more than once.

My main concern is actually that I see I need to send a lot of unneeded data: since the subscriptions happen on the local node, the broadcast will always go out from the main node even if there are no subscribers on the other nodes… And with my current payload size and load, I easily get 3.5 GB of transferred data from the main node in just 24 hours.

Edit: make that two days, not 24 hours.

I will have this reduced when I move over to my new format, but it will still be a lot of CPU and bandwidth on the nodes :slight_smile:

Yes, the fan-out to clients is each node’s responsibility when using PubSub.

So if 1 million clients are subscribed to a topic on node A, and another 2 million clients are subscribed to the same topic on node B, then a broadcast on that topic results in just 1 message exchanged between node A and node B, while collectively 3 million and 1 messages are delivered.

Another option to consider for reliability and scale is Oban, which uses the database for reliability and delivery guarantees. It provides an amazing set of features and performance for robust job processing, including job scheduling with duplicate/conflict detection and replacement of a pending job with newer arguments, which may save you from distributing out-of-date data.
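A rough, hedged sketch of what that could look like; the worker name, the `MyApp.Values.handle/2` helper, and the args shape are hypothetical, and the exact uniqueness/replace options should be verified against the Oban docs:

```elixir
defmodule MyApp.ValueWorker do
  # Jobs are unique per node_id for 60 seconds, so a burst of values for the
  # same node collapses into one pending job instead of many.
  use Oban.Worker,
    queue: :values,
    unique: [period: 60, keys: [:node_id]]

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"node_id" => node_id, "value" => value}}) do
    # Fan the value out, write it to the cache/DB, etc. (hypothetical helper).
    MyApp.Values.handle(node_id, value)
    :ok
  end
end

# Enqueueing (args map is illustrative):
# %{node_id: node_id, value: value} |> MyApp.ValueWorker.new() |> Oban.insert()
```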

Phoenix PubSub won’t give you guaranteed delivery; it depends on whether you need that or not.

Can you add the origin node in the message, publish only to the common topic, and filter locally?

This will divide your bandwidth by something like 1.8 at least.
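A minimal sketch of that idea, assuming a LiveView subscriber and a `{:new_value, node_id, value}` message shape (the names are illustrative):

```elixir
# Publisher: one broadcast, with the origin node carried in the payload.
Phoenix.PubSub.broadcast(MyApp.PubSub, "values", {:new_value, node_id, value})

# Subscriber (e.g. a LiveView that only cares about one node):
def handle_info({:new_value, node_id, value}, socket) do
  if node_id == socket.assigns.node_id do
    {:noreply, assign(socket, :value, value)}
  else
    # Not our node: drop it locally instead of relying on a second topic.
    {:noreply, socket}
  end
end
```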

You can also use the compression options of :erlang.term_to_binary/2.
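For example, comparing the plain and compressed encodings of one value (the actual savings depend entirely on how repetitive the term is):

```elixir
plain      = :erlang.term_to_binary(value)
compressed = :erlang.term_to_binary(value, compressed: 6)

byte_size(plain)       # ~1020 bytes in the example above
byte_size(compressed)  # smaller; how much depends on the data

# Note: to benefit across the wire you would broadcast the compressed binary
# itself and decode it on the receiving side with :erlang.binary_to_term/1.
```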

You could write the latest values to a database table and use Postgres pub/sub (LISTEN/NOTIFY) to listen for changes, removing the need for Phoenix PubSub (and maybe Cachex). Or at least just publish a small notification on Phoenix PubSub when the database has changed.
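A hedged sketch of the listening side using Postgrex.Notifications; the connection options and channel name are assumptions, and the trigger/NOTIFY side is only hinted at in a comment:

```elixir
# Listener process (connection options and channel name are illustrative).
{:ok, pid} =
  Postgrex.Notifications.start_link(
    hostname: "localhost",
    database: "my_db",
    username: "postgres"
  )

{:ok, _ref} = Postgrex.Notifications.listen(pid, "last_values_changed")

# Notifications arrive as messages to the listening process:
#   {:notification, connection_pid, ref, "last_values_changed", payload}
#
# On the database side, a trigger (or the writer itself) would run something like:
#   NOTIFY last_values_changed, '<node_id>';
```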

Also, 10 million messages is a lot for a view: that is roughly 116 per second on average (10,000,000 / 86,400 s ≈ 115.7). Could you just send the last value every second?
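One way to do that, as a sketch: buffer only the newest value and push it to the view at most once per second. The module is assumed to be a LiveView, and the topic and message names are illustrative:

```elixir
# In the LiveView: keep only the latest value and render it once per second.
def mount(_params, _session, socket) do
  if connected?(socket), do: Phoenix.PubSub.subscribe(MyApp.PubSub, "values")
  Process.send_after(self(), :flush, 1_000)
  {:ok, assign(socket, pending: nil, value: nil)}
end

def handle_info({:new_value, value}, socket) do
  # Don't re-render on every message; just remember the newest one.
  {:noreply, assign(socket, :pending, value)}
end

def handle_info(:flush, socket) do
  Process.send_after(self(), :flush, 1_000)

  case socket.assigns.pending do
    nil -> {:noreply, socket}
    value -> {:noreply, assign(socket, value: value, pending: nil)}
  end
end
```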


would also not suffer if something was a bit delayed or if the same message was received more than once.

The concern is not a “delay” or “duplicates”, but the actual loss of messages.

with my current payload size and load I easily get 3.5GB of transferred data from the main node in just 24 hours.

Why do you care about that number?

  1. GC takes care of it
  2. large binaries are optimized to not be copied between processes on the same node (I believe binaries over 64 bytes are stored off-heap and shared by reference)

Now, since you are dealing with LiveView, @lud gave a very valuable point about cutting down the number of messages sent to the client.
That part can actually cause your server to get OOM-killed!

Watch this part of Marlus’s talk “Optimizing LiveView for Realtime Applications”: in the last section he demonstrates how the memory of a Phoenix app can grow unbounded if clients can’t receive messages at the same rate they are published! (And overall, the whole talk is a gem.)


I’ll address just question 1, because the answer is “yes”.

I built a Pusher clone that was pushing payloads between 100 bytes and (too many) MB in size. We easily pushed more than 300,000,000 messages per day. I don’t know the aggregated size of the messages. We had ~5 servers, so each message was broadcast to all of the other servers.

I never saw performance problems from pushing too much data here. It was almost entirely hands-off, except for unrelated problems over time.

It’s important to note we pushed using the built-in node-to-node connections. We did not use the PubSub Redis adapter.