Making an SSL socket smart about the terminating character in messages

I have an SSL socket which receives messages in JSON format. Right now I'm using Erlang's ssl to set up the connection. In my implementation, handle_info is called whenever a fixed-size chunk of data is available. However, the JSON messages can be large, so handle_info is called multiple times, each time with only part of the message. I then do:
String.ends_with?(new_msg, "\n")
to put together whole messages.

This feels very slow. What I would like is to move this logic down a layer and only have handle_info called when a complete message is available on the socket, that is, when a message terminated with \n has arrived.

Is there any support for this in Erlang/Elixir?

:wave:

You can buffer the message in the process controlling the socket and, only after it’s received in full, send it to the process where handle_info is defined. Or you can add a message length header to your protocol (instead of \n) and receive only what’s needed with :ssl.recv(socket, message_length, timeout); that would be the fastest approach (you’d need to open the socket with the active: false and packet: :raw options).
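A minimal sketch of the length-header variant, assuming a hypothetical 4-byte big-endian length prefix (the thread does not specify a header format). The module name `LengthPrefixed` and the injected `recv_fun` are made up for illustration; with a real passive, binary-mode socket, `recv_fun` would be something like `fn n -> :ssl.recv(socket, n, timeout) end`.

```elixir
defmodule LengthPrefixed do
  # recv_fun: (byte_count -> {:ok, binary} | {:error, reason})
  # Assumption: every message is preceded by a 4-byte big-endian length.
  def read_message(recv_fun) do
    with {:ok, <<len::unsigned-big-integer-size(32)>>} <- recv_fun.(4) do
      read_body(recv_fun, len)
    end
  end

  # A length of 0 passed to :ssl.recv/3 means "whatever is available",
  # so an empty message is handled without calling recv at all.
  defp read_body(_recv_fun, 0), do: {:ok, ""}
  defp read_body(recv_fun, len), do: recv_fun.(len)
end
```

If the header really is a plain 4-byte big-endian length, the packet: 4 socket option should let Erlang do this framing for you, so it is worth checking before hand-rolling it.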

Thanks for the tips. Right now I'm buffering, and it's not my protocol, so I can't change that. The issue I have is that checking ends_with? is O(n) in the length of the message; I was hoping there was a better way.

For stream-based sockets, including ssl, Erlang can do the buffering for you for a number of packet formats, including line protocols. The mode is controlled through the packet option: http://erlang.org/doc/man/inet.html#setopts-2.

So, to get messages from a TLS socket delivered at line break boundaries:
:ssl.connect('example.net', 443, packet: :line)
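For the handle_info side, here is a rough sketch of a GenServer that receives one line per message. `LineClient`, `socket_opts/0`, and the shape of the state are hypothetical names for illustration; the TLS options you need (certificates, verification) are passed in by the caller and are not shown. It assumes the :ssl application is already started and that the host is given as a charlist.

```elixir
defmodule LineClient do
  use GenServer

  # Framing options, appended to whatever TLS options the caller supplies.
  def socket_opts, do: [mode: :binary, packet: :line, active: true]

  def start_link({host, port, tls_opts}) do
    GenServer.start_link(__MODULE__, {host, port, tls_opts})
  end

  @impl true
  def init({host, port, tls_opts}) do
    case :ssl.connect(host, port, tls_opts ++ socket_opts()) do
      {:ok, socket} -> {:ok, %{socket: socket, lines: []}}
      {:error, reason} -> {:stop, reason}
    end
  end

  # With packet: :line, each {:ssl, socket, data} message carries at most one line.
  @impl true
  def handle_info({:ssl, _socket, line}, state) do
    message = String.trim_trailing(line, "\n")
    {:noreply, %{state | lines: [message | state.lines]}}
  end

  def handle_info({:ssl_closed, _socket}, state), do: {:stop, :normal, state}
  def handle_info({:ssl_error, _socket, reason}, state), do: {:stop, reason, state}
end
```

Here the lines are only collected in the state; in a real client this is where the JSON decoding would go.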


The docs say the following: "Line mode, a packet is a line-terminated with newline, lines longer than the receive buffer are truncated." Does this not mean that I still need to buffer in my process, since the case I'm interested in is exactly the one where a line is longer than the receive buffer?

Right, there is an upper bound, but if you know the upper limit of the messages you expect to get you can customise it using :inet.setopts(s, buffer: @max_size).

If you want to be able to receive messages of arbitrary size, you’d still have to implement buffering in your application. But even in that case, using packet: :line makes your job easier, because it guarantees that you won’t get multiple lines (or fragments of multiple lines) in a single packet. In :raw mode that can happen, and you’d have to scan not just for a newline at the end, but also for newlines inside a message: you’d have to do the entire reassembly into lines, both joining and splitting.
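To illustrate the "joining and splitting" needed in :raw mode, here is a small sketch. `LineBuffer.feed/2` is a hypothetical helper, not part of any library. It scans only the newly arrived chunk for newlines and keeps the unfinished fragment as iodata, so earlier data is not rescanned on every call.

```elixir
defmodule LineBuffer do
  # pending: iodata holding the unfinished line so far (start with "").
  # Returns {complete_lines, new_pending}.
  def feed(pending, chunk) when is_binary(chunk) do
    case :binary.split(chunk, "\n", [:global]) do
      # No newline in this chunk: just remember the fragment.
      [fragment] ->
        {[], [pending, fragment]}

      # At least one newline: the first piece completes the pending line,
      # the last piece (possibly empty) starts the next one.
      [first | rest] ->
        {middle, [tail]} = Enum.split(rest, -1)
        {[IO.iodata_to_binary([pending, first]) | middle], tail}
    end
  end
end
```

A chunk such as "1}\n{\"b\":2}\n{\"c\"" then yields two complete lines plus a leftover fragment that is carried into the next call.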

As for scanning for the newline at the end, if you’re using charlists it is indeed O(n), but if you put the socket in binary mode and use String.ends_with? it should be O(1).

Nope, it is still O(n), since Elixir also checks whether the full string is valid UTF-8.

If one does not care about the UTF-8 validity of the string when checking for the final newline (remember, a chunk could be split in the middle of a multibyte codepoint anyway), then one can use :binary.last(input) == ?\n.
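A tiny sketch of that check wrapped in a function. The module name `Newline` is made up; the extra clause is there because :binary.last/1 raises on an empty binary.

```elixir
defmodule Newline do
  # An empty chunk cannot end in a newline, and :binary.last/1 would raise on it.
  def complete?(<<>>), do: false

  # Looks only at the final byte; no UTF-8 validation of the rest.
  def complete?(chunk) when is_binary(chunk), do: :binary.last(chunk) == ?\n
end
```

Since only the last byte is inspected, it also works on chunks that are not valid UTF-8 on their own.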

But does it? It is implemented as a combination of byte_size/1 and binary_part/3, and both are not UTF-8 aware…

Oh, then it has been optimized, or it always was. But I remember that @josevalim once said that all functions in String were checking for validity.

And in fact, in my opinion they should, and we should have another module working with pure binaries that does not check for string specifics, just as Erlang distinguishes between the string and lists modules…

Oh, I agree, and I was going to suggest a real binary alternative when I decided to just check the source real quick anyway :slight_smile:

I don't know the exact max size, but would there be any negative impact from just reserving a very large buffer, like 1 MB? I know nothing will be larger than that, and most messages will be around 500 characters.

As far as I know the buffer is not actually allocated, the value is just a threshold: when the socket is gathering data for delivery to the application, it has to give up at some point and notify the application. The memory is only used while data is queueing up, and released when the application is done with it.

I do think a large buffer size could be a DoS risk: if a server uses a 10M buffer for all incoming sockets, then a malicious client could send 9M of data with no newlines on a bunch of sockets and possibly eat up the server’s memory long before the socket limit (ports, file descriptors) is reached.
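The same concern applies if the application does its own buffering. One way to bound it, sketched under the assumption that you track the pending byte count yourself (`BoundedBuffer`, the `:line_too_long` error, and the 1 MB default are all hypothetical):

```elixir
defmodule BoundedBuffer do
  # Hypothetical cap, taken from the 1 MB figure mentioned above.
  @max_bytes 1_048_576

  # pending: iodata of the unfinished line; pending_size: its size in bytes.
  # Returns {:ok, new_pending, new_size} or {:error, :line_too_long}.
  def append(pending, pending_size, chunk, max \\ @max_bytes) when is_binary(chunk) do
    new_size = pending_size + byte_size(chunk)

    if new_size > max do
      {:error, :line_too_long}
    else
      {:ok, [pending, chunk], new_size}
    end
  end
end
```

On {:error, :line_too_long} the owning process could close the socket rather than keep accumulating data from a peer that never sends a newline.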

No, we don’t check. I believe the @moduledoc even says so. :slight_smile: