I am using Phoenix for WebSocket and HTTP communication with clients.
Users can submit JSON messages, which Phoenix converts to Elixir maps for me.
How can I easily and quickly check for and reject requests that are extremely large or complex, so that as few server CPU cycles and as little bandwidth as possible are wasted?
My frontend HTTP and WebSocket servers will need to forward the requests to backend processes on different nodes, so if a user floods us with large requests, the bandwidth usage can cause congestion.
If you have a proxy/balancer between the outside world and your backend and you want to save CPU/bandwidth on the backend (that is the better way), you need to check the request size on the proxy, not on the backend. If you don’t care about the proxy, I’m not sure whether it can be accomplished at earlier phases, but you can implement your own custom WebSocket transport and check the input message size before deserialization.
You could do some validation on the client to check message size and whatever else. This would eliminate a lot of failures. However, because of the nature of JS, you would also want to implement your validations on the server to catch anyone trying to be sneaky. Something like the following should work.
def handle_in(msg, params, socket) do
  case validate_params(params) do
    :ok -> handle_message(msg, params, socket)
    {:error, reason} -> {:reply, {:error, %{reason: reason}}, socket}
  end
end

defp handle_message("message 1", params, socket) do
  do_message_1(params)
  {:noreply, socket}
end

...
This does let the message come in and get deserialized first, though. I’m not sure if that is acceptable in your use case.
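To make the post above self-contained: `validate_params/1` is left undefined there, so here is a minimal sketch of what it could look like. The module name, the `"body"` field, and every limit are my own assumptions, not anything from the thread; adapt the checks to your actual payload shape.

```elixir
defmodule ParamGuard do
  # A hypothetical validate_params/1 for the handle_in above. Assumes
  # (my assumption) the payload is a JSON object carrying a "body" string.
  def validate_params(params) when is_map(params) do
    body = params["body"]

    cond do
      map_size(params) > 50 -> {:error, "too many fields"}
      not is_binary(body) -> {:error, "body must be a string"}
      byte_size(body) > 10_000 -> {:error, "body too large"}
      true -> :ok
    end
  end

  # Anything that is not a JSON object is rejected outright.
  def validate_params(_other), do: {:error, "expected a JSON object"}
end
```

Because the checks run after deserialization, this only guards `do_message_1` and friends, not the JSON decode itself.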
Actually, the deserialization (that is, converting JSON to a map) happens at the transport layer (as I mentioned above), and if we receive a very big JSON packet we spend unnecessary CPU converting it into a map, so the check should happen at a very early stage, ideally at the balancer.
UPD Of course, you should check user input too (validate_params is necessary here), but the question was about message size, as I understand it.
As you said, you’ll have to check at your load balancer or at your single front-facing HTTP server. There is no way to check in the app layer, where the deserialisation has already happened.
The actual deserialization (from a stream of bytes into an Elixir map) happens here. I agree that checks should be placed in front of the backend, but if you have no control over the balancer, a check here at least saves the CPU cost of JSON-deserializing unnecessary packets.
In any case, the size check should happen before deserialization and the user-input check after.
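To make the “size check before deserialization” part concrete: if you do end up in a custom transport or serializer, the payload is still a plain binary at that point, so a `byte_size/1` guard is essentially free. A minimal sketch (the module name and limit are mine, not a Phoenix API):

```elixir
defmodule SizeGuard do
  # Placeholder limit - tune to your expected message sizes.
  @max_bytes 64 * 1024

  # Reject a raw frame/body before any JSON work is done.
  # byte_size/1 on a binary is O(1), so this costs almost nothing.
  def check(payload, max_bytes \\ @max_bytes) when is_binary(payload) do
    if byte_size(payload) <= max_bytes, do: :ok, else: {:error, :too_large}
  end
end
```

Only after `check/1` returns `:ok` would you hand the binary to the JSON decoder.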
That’s what I said. In my method, deserialization has already happened (the JSON is already an Elixir term). Yes, this would waste cycles on larger payloads. However, with the validation you wouldn’t waste cycles on invalid data (whether it’s too large, the wrong shape, etc.). I don’t know whether it’s the actual do_message_1 part that is resource intensive, or whether they literally do not want to process large JSON inputs.
The handle_in callback is called after the WebSocket transport decoding. And yes, the first argument for handle_in (the msg) is an Elixir term, but it is the result of decoding a byte stream via Transport.decode!/2, so the size of the packet of bytes should be checked there.
Or, better, as @jmitchell mentioned - at the cowboy level.
I am agreeing with everything you have said. My option does not check size, as that is useless after deserialization. Like I said, I don’t know if they wanted to literally reject a payload based on some size, or if they wanted to do some validations on the payload to prevent a resource-intensive job from being started for known failures or something.
Very interesting answers, thank you everyone! So the load balancer/proxy level is the best bet, but I’m not sure whether the one I will be using (e.g. AWS or DigitalOcean) can do that … unless I run another nginx in front of each node.
I guess doing both would be best.
I think maybe it would be good for Phoenix to expose the ability to set a max request size for both WebSocket and HTTP.
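For what it’s worth, both knobs do exist at the app layer in later Plug/Phoenix releases (worth double-checking against the versions you are actually on): Plug.Parsers accepts a `:length` cap for HTTP bodies, and the WebSocket transport options accept `:max_frame_size`, which maps onto the cowboy setting. The module names and limits below are placeholders:

```elixir
# In your endpoint - cap HTTP request body size via Plug.Parsers:
plug Plug.Parsers,
  parsers: [:urlencoded, :multipart, :json],
  json_decoder: Jason,
  length: 1_000_000                       # placeholder limit: ~1 MB

# Also in the endpoint - cap WebSocket frame size on the socket mount:
socket "/socket", MyAppWeb.UserSocket,
  websocket: [max_frame_size: 64_000]     # placeholder limit
```

With these set, oversized requests are rejected before your channel or controller code runs.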
As for the “complexity” of a JSON object, which might not correlate exactly with request size … I guess there isn’t really a way other than validating after deserialization.
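One cheap post-decode option for the “complexity” case: walk the decoded term with a node budget and a depth cap, and bail out as soon as either is exceeded, so you never pay for traversing the whole thing. A sketch with made-up limits:

```elixir
defmodule ComplexityGuard do
  # Placeholder limits - tune to your payloads.
  @max_depth 20
  @max_nodes 1_000

  # Walk an already-decoded term, spending a node budget and tracking
  # depth; abort as soon as either limit is exceeded.
  def check(term) do
    case walk(term, 1, @max_nodes) do
      {:ok, _budget_left} -> :ok
      {:error, _} = error -> error
    end
  end

  defp walk(_term, depth, _budget) when depth > @max_depth, do: {:error, :too_deep}
  defp walk(_term, _depth, budget) when budget <= 0, do: {:error, :too_complex}

  defp walk(map, depth, budget) when is_map(map) do
    reduce_children(Map.values(map), depth, budget - 1)
  end

  defp walk(list, depth, budget) when is_list(list) do
    reduce_children(list, depth, budget - 1)
  end

  defp walk(_scalar, _depth, budget), do: {:ok, budget - 1}

  # Visit children left to right, halting on the first error.
  defp reduce_children(children, depth, budget) do
    Enum.reduce_while(children, {:ok, budget}, fn child, {:ok, b} ->
      case walk(child, depth + 1, b) do
        {:ok, _} = ok -> {:cont, ok}
        error -> {:halt, error}
      end
    end)
  end
end
```

This still requires the JSON to have been decoded, so it complements, rather than replaces, the byte-size checks discussed above.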
Does WebSocket frame size mean the same thing as request size? I remember something about large messages being able to be split into multiple frames, but I might be misunderstanding.
Maximum frame size allowed by this Websocket handler. Cowboy will close the connection when a client attempts to send a frame that goes over this limit. For fragmented frames this applies to the size of the reconstituted frame.
Best to test for the expected behavior before relying on it in production. Cheers.
EDIT: Is frame fragmentation different from multiple frames per message? I’m not sure.
The WebSocket message does not necessarily correspond to a particular network layer framing, as a fragmented message may be coalesced or split by an intermediary.
Implementations that have implementation- and/or platform-specific limitations regarding the frame size or total message size after reassembly from multiple frames MUST protect themselves against exceeding those limits. (For example, a malicious endpoint can try to exhaust its peer’s memory or mount a denial-of-service attack by sending either a single big frame (e.g., of size 2**60) or by sending a long stream of small frames that are a part of a fragmented message.) Such an implementation SHOULD impose a limit on frame sizes and the total message size after reassembly from multiple frames.
Section 5.2 shows that frames have a FIN bit which:
Indicates that this is the final fragment in a message. The first fragment MAY also be the final fragment.
Fields about payload length only pertain to one frame/fragment. To determine the length of the message, the receiver must consume frames until reaching one where FIN == 1 and then sum the payload lengths of all those fragments. Given the concern raised in 10.4 about malicious endpoints, an implementation shouldn’t actually concatenate all the fragment payloads into a message if doing so would exceed the implementation-defined limit; otherwise, it would have to temporarily store an arbitrarily long sequence of potentially large payloads.
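The “sum fragment sizes as you go, and abort before buffering an over-limit message” idea can be sketched like this (the module name, API shape, and limit are mine; real WebSocket servers such as cowboy do this internally):

```elixir
defmodule FragmentLimiter do
  # Placeholder cap on the total reassembled message size.
  @max_message_bytes 1_000_000

  # Start an empty reassembly buffer.
  def new(max \\ @max_message_bytes), do: %{parts: [], size: 0, max: max}

  # Reject before buffering: the running total is checked on every
  # fragment, so an attacker cannot make us store unbounded data.
  def push(%{size: size, max: max}, payload, _fin?)
      when size + byte_size(payload) > max do
    {:error, :message_too_big}
  end

  # Not the final fragment (FIN bit clear): keep accumulating.
  def push(state, payload, false) do
    {:more, %{state | parts: [payload | state.parts], size: state.size + byte_size(payload)}}
  end

  # Final fragment (FIN bit set): reassemble the full message.
  def push(state, payload, true) do
    {:ok, IO.iodata_to_binary(Enum.reverse([payload | state.parts]))}
  end
end
```

The key point mirrors the RFC’s advice: the size check runs per fragment against the running total, not once on the reassembled message.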