Data compression in PubSub

I’m working on a project that heavily relies on updates coming through the PubSub module of Phoenix. The data size, sent without compression, is about 2MB, whereas compressed it does not exceed 200kB.

That’s a huge difference, and we already use gzip compression for our REST API, but I couldn’t find any pointers to gzip compression in PubSub.

It would be great to know whether this exists right now and, if not, what would be needed to get this feature done.

I don’t believe there is anything stopping you from compressing manually before sending and decompressing after receiving. The easiest way would be to use compressed_data = :erlang.term_to_binary(yourdata, compressed: 9) to compress and :erlang.binary_to_term(compressed_data) to decompress.
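To make the suggestion above concrete, here is a minimal sketch of the round trip. The payload is a made-up stand-in for the real update data, and the variable names are only illustrative.

```elixir
# Hypothetical payload standing in for the ~2MB update.
payload = %{rows: Enum.map(1..10_000, fn i -> %{id: i, status: "active"} end)}

# Serialize and compress in one step (levels 0..9, 9 = smallest output).
compressed = :erlang.term_to_binary(payload, compressed: 9)
uncompressed = :erlang.term_to_binary(payload)

IO.puts("uncompressed: #{byte_size(uncompressed)} bytes")
IO.puts("compressed:   #{byte_size(compressed)} bytes")

# On the receiving side, turn the binary back into the original term.
# If the binary can come from an untrusted source, consider passing [:safe].
restored = :erlang.binary_to_term(compressed)
IO.inspect(restored == payload, label: "round trip ok")
```

Note that the result is in Erlang's external term format, so this only helps between BEAM processes and nodes; a browser cannot decode it natively.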

Otherwise, there is also the :zip module, but I believe it works with files rather than inline data, which would require more boilerplate for creating and manipulating temporary files.

There is also the zlib [0] module, which works with inline data.

[0] http://erlang.org/doc/man/zlib.html
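As a rough sketch of what that looks like with inline data, the snippet below gzips a binary in memory and inflates it again. The JSON-like string is a made-up example payload.

```elixir
# Hypothetical JSON-like payload; :zlib accepts any binary or iodata.
json = String.duplicate(~s({"id":1,"status":"active"},), 5_000)

# Compress in memory using the gzip container format.
gzipped = :zlib.gzip(json)
IO.puts("#{byte_size(json)} bytes -> #{byte_size(gzipped)} bytes")

# Decompress and check that the round trip gives back the original binary.
inflated = :zlib.gunzip(gzipped)
IO.inspect(inflated == json, label: "round trip ok")
```

If the gzipped binary has to travel inside a JSON-encoded message, it would also need something like Base.encode64/1 on top, and the client would have to decode and inflate it itself.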

Well I guess I can do that, but I’d like to have a way to compress and send the right headers to the browser, so it does the decompression automatically (same way it does with regular HTTP + the right header).

Apparently if you are using a PubSub adapter you can just turn on compression at the endpoint to let cowboy handle everything:

config :my_app, MyApp.Endpoint,
  pubsub: [adapter: Phoenix.PubSub.PG2,
           pool_size: 1,
           name: MyApp.PubSub],
  http: [compress: true]

References: Phoenix.PubSub; Plug.Adapters.Cowboy; “How do I enable compression on channels in Phoenix?”

However, architecturally speaking, it might make more sense to keep the PubSub messages light and simply send notifications that new data is available (possibly including the URL), leaving the heavy lifting to the standard request/response infrastructure (where HTTP caching could be employed if necessary).

I don’t see how just sending a notification and then fetching the data is better than directly sending all the information that the frontend needs. In terms of design it’s simple, it does not need to make another whole roundtrip, it does not need to hit the DB again, there is no need for additional caching, etc.

So far this design has been a blast for our application, and even sending 2MB of data very frequently does not affect performance too much. In fact, sending the same data over HTTP uncompressed takes a lot longer, I guess because of the TCP handshakes and roundtrips, which you don’t have with websockets once the connection is established.

I think “easy” would be the more accurate term.

So if you choose to compress the message payload, here is how cowboy does it: gzip_response.

Websockets use TCP - so I guess you are talking about the “overhead” of the HTTP protocol.

No, it’s simple. Because it does not require calling additional endpoints, DBs, etc. It is orthogonal to the easiness.

Correct.

Thank you!

A nice thing about compressing all the data of a large pubsub message to a binary is that binaries can be shared across processes unlike everything else but atoms, so you can get a substantial speed boost as well if broadcasting to a LOT of processes. ^.^
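A small sketch of that idea, using plain processes as hypothetical stand-ins for subscribers: the payload is compressed once, and the same binary is sent to every process. Binaries larger than 64 bytes live off-heap and are reference counted, so each send passes a reference rather than copying the data.

```elixir
# Hypothetical payload standing in for a large update.
payload = %{rows: Enum.map(1..10_000, fn i -> %{id: i, status: "active"} end)}

# Compress once, before broadcasting.
compressed = :erlang.term_to_binary(payload, compressed: 9)

parent = self()

# Hypothetical subscribers: each waits for one update, decodes it,
# and reports the decoded term back to the parent.
subscribers =
  for _ <- 1..100 do
    spawn(fn ->
      receive do
        {:update, bin} ->
          send(parent, {:done, self(), :erlang.binary_to_term(bin)})
      end
    end)
  end

# "Broadcast": the same binary is sent to every subscriber.
Enum.each(subscribers, &send(&1, {:update, compressed}))

# Collect one reply per subscriber, giving up after 5 seconds each.
results =
  for pid <- subscribers do
    receive do
      {:done, ^pid, term} -> term
    after
      5_000 -> :timeout
    end
  end

IO.inspect(Enum.all?(results, &(&1 == payload)), label: "all subscribers decoded")
```

The trade-off is that every subscriber pays the decompression cost itself, so this is mostly worthwhile when there are many receivers or the term is large.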

Cowboy does seem to support permessage-deflate on sockets, but as Phoenix channels are meant to be transport agnostic, selective message compression probably hasn’t been a high enough value feature so far to justify the effort.

It boggles my mind though that the browser web api doesn’t expose the built-in compression functionality.

Binary data over Phoenix sockets uses MessagePack and selectively compresses messages.

So here are the problems:

More or less, there are a lot of things preventing compression from just happening in Phoenix channels.