Data compression in PubSub

matiso · May 19, 2017, 9:02am

I’m working on a project that heavily relies on updates coming through the PubSub module of Phoenix. The data size, sent without compression, is about 2MB, whereas compressed it does not exceed 200kB.

That’s a huge difference, and we already use gzip compression for our REST API, but I couldn’t find any pointers to gzip compression in PubSub.

It would be great to know whether this exists right now, or otherwise what would be needed to get this feature done.

Qqwy · May 19, 2017, 10:26am

I don’t believe there is anything stopping you from compressing manually before sending, and decompressing after receiving. The easiest way would be to use compressed_data = :erlang.term_to_binary(yourdata, compressed: 9) to compress and :erlang.binary_to_term(compressed_data) to decompress.
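To make the suggestion above concrete, here is a minimal sketch of the round trip. The payload is made-up stand-in data, not the poster's real update, and the sizes printed will differ for real data.

```elixir
# Hypothetical payload standing in for the real (roughly 2MB) update.
payload = %{rows: Enum.map(1..10_000, fn i -> %{id: i, status: "active"} end)}

# Serialize to the external term format, without and with compression.
# Level 9 is the highest (slowest) zlib compression level.
plain = :erlang.term_to_binary(payload)
compressed = :erlang.term_to_binary(payload, compressed: 9)

IO.puts("plain: #{byte_size(plain)} bytes, compressed: #{byte_size(compressed)} bytes")

# On the receiving side. binary_to_term/1 detects the compression itself.
# Pass [:safe] as a second argument if the binary may come from an untrusted source.
restored = :erlang.binary_to_term(compressed)
IO.inspect(restored == payload, label: "round trip ok")
```

Note that the result is in Erlang's external term format, so this fits best when both sender and receiver are Erlang/Elixir processes. A browser client would need its own decoder.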

Otherwise, there also is the :zip module, but I believe it works with files rather than inline data, which will require more boilerplate involving creating and manipulating temporary files.

idi527 · May 19, 2017, 10:30am

There is also the zlib [0] module, which works with inline data.

[0] http://erlang.org/doc/man/zlib.html
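As a rough sketch of what using zlib on inline data could look like: the JSON string below is invented sample data, and :zlib.gzip/1 and :zlib.gunzip/1 are just one of several pairs the module offers (there are also :zlib.compress/1 and :zlib.zip/1 variants).

```elixir
# Made-up, highly repetitive JSON-like payload (an assumption for illustration).
json = String.duplicate(~s({"id":1,"status":"active"},), 5_000)

# Compress in memory, producing gzip-framed data (header and checksum included).
gzipped = :zlib.gzip(json)

IO.puts("original: #{byte_size(json)} bytes, gzipped: #{byte_size(gzipped)} bytes")

# Decompress on the receiving side.
unzipped = :zlib.gunzip(gzipped)
IO.inspect(unzipped == json, label: "round trip ok")
```

Unlike the term_to_binary approach, the output here is standard gzip data, so any client with a gzip implementation can read it.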

matiso · May 19, 2017, 11:08am

Well I guess I can do that, but I’d like to have a way to compress and send the right headers to the browser, so it does the decompression automatically (same way it does with regular HTTP + the right header).

peerreynders · May 19, 2017, 11:43am

Apparently if you are using a PubSub adapter you can just turn on compression at the endpoint to let cowboy handle everything:

config :my_app, MyApp.Endpoint,
  pubsub: [adapter: Phoenix.PubSub.PG2,
           pool_size: 1,
           name: MyApp.PubSub],
  # :compress is passed through to the Cowboy adapter
  http: [compress: true]

Phoenix.PubSub
Plug.Adapters.Cowboy
How do I enable compression on channels in Phoenix?

However architecturally speaking it might make more sense to keep the PubSub messages light and simply send notifications that new data is available (possibly including the URL) and leave the heavy lifting to the standard request/response infrastructure (where HTTP caching could be employed if necessary).

matiso · May 19, 2017, 12:03pm

I don’t see how just sending a notification and then fetching the data is better than directly sending all the information that the frontend needs. In terms of design it’s simple, it does not need to make another whole roundtrip, it does not need to hit the DB again, there is no need for additional caching, etc.

Actually, so far this design has been a blast for our application, and even sending 2MB of data very frequently is not affecting performance too much. Sending the same data over HTTP uncompressed actually takes a lot longer, I guess because of TCP handshakes and roundtrips, which you don’t have with websockets once the connection is established.

peerreynders · May 19, 2017, 2:07pm

I think “easy” would be the more accurate term.

So if you choose to compress the message payload here is how cowboy does it: gzip_response.

Websockets use TCP - so I guess you are talking about the “overhead” of the HTTP protocol.

matiso · May 19, 2017, 2:31pm

No, it’s simple. Because it does not require calling additional endpoints, DBs, etc. It is orthogonal to the easiness.

Correct.

Thank you!

OvermindDL1 · May 19, 2017, 3:20pm

A nice thing about compressing all the data of a large pubsub message to a binary is that binaries can be shared across processes unlike everything else but atoms, so you can get a substantial speed boost as well if broadcasting to a LOT of processes. ^.^
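A small sketch of the idea, using plain processes instead of Phoenix.PubSub so it runs on its own. The subscriber processes and message shapes are hypothetical. The point is that binaries larger than 64 bytes are reference counted, so sending one to many processes on the same node passes a reference rather than copying the data each time.

```elixir
# Compress once, up front. The list is stand-in data for a large update.
compressed = :erlang.term_to_binary(Enum.to_list(1..100_000), compressed: 9)
parent = self()

# Hypothetical subscribers: each waits for one update and reports its size.
subscribers =
  for _ <- 1..100 do
    spawn(fn ->
      receive do
        {:update, bin} -> send(parent, {:got, byte_size(bin)})
      end
    end)
  end

# "Broadcast": every process receives a reference to the same binary.
Enum.each(subscribers, fn pid -> send(pid, {:update, compressed}) end)

# Collect the replies, giving up on any subscriber after one second.
sizes =
  for _ <- subscribers do
    receive do
      {:got, size} -> size
    after
      1_000 -> :timeout
    end
  end

IO.inspect(Enum.all?(sizes, &(&1 == byte_size(compressed))), label: "all received")
```

Sending the uncompressed list instead would copy it into the heap of each of the 100 processes.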

peerreynders · May 19, 2017, 3:52pm

Cowboy does seem to support permessage-deflate on sockets, but as Phoenix channels are meant to be transport agnostic, selective message compression probably hasn’t been a high enough value feature so far to justify the effort.

It boggles my mind though that the browser web api doesn’t expose the built-in compression functionality.

Binary data over Phoenix sockets uses MessagePack and selectively compresses messages.

Azolo · May 22, 2017, 6:45pm

So here are the problems:

WebSocket compression is enabled as an extension.
WebSocket extensions are negotiated by the client.
Plug currently uses version 1.1 of cowboy.
Presently version 1.1 of cowboy only handles the x-webkit-deflate-frame extension.
That compression only happens at the last step before sending.
Valid UTF-8 has to be sent in Text WebSocket messages.
Phoenix serializes the payload to the Text type.

More or less, there are a lot of things preventing compression from just working in Phoenix channels.
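The last two points in the list above can be illustrated with a short sketch: gzip output is not valid UTF-8, so it cannot go into a text payload as-is. One possible workaround, which is an assumption here and not something Phoenix does for you, is to Base64-encode the compressed data before pushing it. The payload is invented sample data.

```elixir
# Made-up payload standing in for a channel message body.
payload = String.duplicate(~s({"id":1,"status":"active"},), 5_000)
compressed = :zlib.gzip(payload)

# Gzip data starts with the bytes 0x1F 0x8B, which is not valid UTF-8.
IO.inspect(String.valid?(compressed), label: "compressed is valid UTF-8")

# Base64 turns the bytes into plain ASCII, at the cost of about a third more size.
encoded = Base.encode64(compressed)
IO.inspect(String.valid?(encoded), label: "encoded is valid UTF-8")

IO.puts("raw: #{byte_size(payload)}, gzip: #{byte_size(compressed)}, base64: #{byte_size(encoded)}")

# The receiver reverses both steps.
{:ok, decoded} = Base.decode64(encoded)
restored = :zlib.gunzip(decoded)
IO.inspect(restored == payload, label: "round trip ok")
```

With this approach the browser does not decompress anything automatically. The client code would have to Base64-decode and inflate the payload itself.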