Making an SSL socket smart about the terminating character in messages

favetelinguis · September 8, 2018, 9:21am

I have an SSL socket which receives messages in JSON format. Right now I'm using Erlang ssl to set up the connection. In my implementation, handle_info is called whenever a fixed-size chunk of data is available. However, the JSON messages can be large, so handle_info is called multiple times, each time with only part of a message. I then do:
String.ends_with?(new_msg, "\n")
to put together whole messages.

This feels very slow. What I would like is to move this logic down a level, so that handle_info is only called when a complete message is available on the socket, that is, when a message terminated with \n has arrived.

Is there any support for this in Erlang/Elixir?

idi527 · September 8, 2018, 11:23am

You can buffer the message in the process controlling the socket and, only after it’s received in full, send it to the process where handle_info is defined. Or you can add a message length header to your protocol (instead of \n) and receive only what’s needed with :ssl.recv(socket, message_length, timeout) – that would be the fastest approach (you’d need to open the socket with the active: false and packet: :raw options).
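
A minimal sketch of the length-header approach. It assumes a hypothetical framing in which every message is preceded by a 4-byte big-endian length (the thread does not specify a header format) and a socket opened in binary mode with active: false. The module and function names are illustrative only.

```elixir
defmodule LengthPrefixed do
  # Hypothetical framing: 4-byte big-endian length, then the payload.
  # Assumes the socket was opened with [:binary, packet: :raw, active: false],
  # so :ssl.recv/3 returns binaries.
  def recv_message(socket, timeout \\ 5_000) do
    case :ssl.recv(socket, 4, timeout) do
      # A length of 0 must not be passed to recv, where 0 means "whatever is available".
      {:ok, <<0::32>>} -> {:ok, ""}
      # Read exactly `len` bytes of payload.
      {:ok, <<len::unsigned-big-32>>} -> :ssl.recv(socket, len, timeout)
      {:error, _reason} = error -> error
    end
  end
end
```

For this particular header layout, the packet: 4 socket option would likely let the runtime do the same framing without any application code.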

favetelinguis · September 8, 2018, 11:32am

Thanks for the tips. Right now I'm buffering, and it's not my protocol, so I can't change that. The issue I have is that checking ends_with? is O(n) in the length of the message; I was hoping there was some better way.

voltone · September 9, 2018, 5:18am

For stream-based sockets, including ssl, Erlang can do the buffering for you for a number of packet formats, including line protocols. The mode is controlled through the packet option: http://erlang.org/doc/man/inet.html#setopts-2.

So, to get messages from a TLS socket delivered at line break boundaries:
:ssl.connect('example.net', 443, packet: :line)
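
As a rough sketch of how this could look inside a GenServer: the module name, the state shape and the tls_opts argument below are assumptions for illustration, and the TLS verification options you need depend on your OTP version and server.

```elixir
defmodule LineClient do
  use GenServer

  # Hypothetical client: host (a charlist), port and extra TLS options come from the caller.
  def start_link({host, port, tls_opts}) do
    GenServer.start_link(__MODULE__, {host, port, tls_opts})
  end

  @impl true
  def init({host, port, tls_opts}) do
    {:ok, _apps} = Application.ensure_all_started(:ssl)
    # :binary delivers data as binaries; packet: :line frames it on newlines.
    opts = [:binary, packet: :line, active: true] ++ tls_opts

    case :ssl.connect(host, port, opts) do
      {:ok, socket} -> {:ok, %{socket: socket, lines: []}}
      {:error, reason} -> {:stop, reason}
    end
  end

  # In line mode each {:ssl, socket, data} message carries at most one line.
  @impl true
  def handle_info({:ssl, _socket, line}, state) do
    {:noreply, %{state | lines: [String.trim_trailing(line, "\n") | state.lines]}}
  end

  def handle_info({:ssl_closed, _socket}, state), do: {:stop, :normal, state}
  def handle_info({:ssl_error, _socket, reason}, state), do: {:stop, reason, state}
end
```

Here the lines are simply collected in the state (newest first); a real client would decode the JSON or forward each line elsewhere.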

favetelinguis · September 9, 2018, 9:39am

The docs say the following: "Line mode, a packet is a line-terminated with newline, lines longer than the receive buffer are truncated." Doesn't this mean that I still need to buffer in my process, since the case I'm interested in is exactly the one where a line is longer than the receive buffer?

voltone · September 9, 2018, 11:39am

Right, there is an upper bound, but if you know the upper limit of the messages you expect to get you can customise it using :inet.setopts(s, buffer: @max_size).

If you want to be able to receive messages of arbitrary size, you’d still have to implement buffering in your application. But even in that case using packet: :line makes your job easier, because it guarantees that you won’t get multiple lines (or fragments of multiple lines) in a single packet. Using :raw mode that can happen, and you’d have to scan not just for a newline at the end, but also for newlines inside a message: you’d have to do the entire reassembly into lines, both joining and splitting.
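
To illustrate what "both joining and splitting" means in :raw mode, here is a minimal sketch of such a reassembly step. LineBuffer and push/2 are hypothetical names, and the sketch assumes binary mode and "\n" as the only terminator.

```elixir
defmodule LineBuffer do
  # Appends a raw chunk to the leftover buffer and returns
  # {complete_lines, new_leftover}. Lines are returned without the "\n".
  def push(buffer, chunk) when is_binary(buffer) and is_binary(chunk) do
    parts = :binary.split(buffer <> chunk, "\n", [:global])
    # Everything before the last element was newline-terminated;
    # the last element is the (possibly empty) unfinished tail.
    {lines, [rest]} = Enum.split(parts, -1)
    {lines, rest}
  end
end
```

A chunk such as "a\nb\nc" yields the lines "a" and "b" and leaves "c" in the buffer until more data arrives. Note that this re-scans the leftover on every chunk, so for very long lines an iodata-based accumulator would be cheaper.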

As for scanning for the newline at the end, if you’re using charlists it is indeed O(n), but if you put the socket in binary mode and use String.ends_with? it should be O(1).

NobbZ · September 9, 2018, 11:55am

Nope, it is still O(n), since Elixir also checks whether the full string is valid UTF-8.

If one does not care about the UTF-8 validity of the string when checking for the final newline (remember, a chunk could be split in the middle of a multibyte codepoint anyway), then one can use :binary.last(input) == ?\n.
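
A small sketch of how that check might be combined with buffering. ChunkAccumulator and add/2 are made-up names, and the sketch assumes that a newline can only ever appear at the very end of a chunk (for example with packet: :line, or a strict request/response protocol).

```elixir
defmodule ChunkAccumulator do
  # Collects chunks as iodata (no copying per chunk) and only builds
  # the full binary once the last byte of a chunk is a newline.
  def add(acc, chunk) when is_binary(chunk) and byte_size(chunk) > 0 do
    acc = [acc, chunk]

    # :binary.last/1 inspects only the final byte of the chunk.
    if :binary.last(chunk) == ?\n do
      {:complete, IO.iodata_to_binary(acc)}
    else
      {:partial, acc}
    end
  end
end
```

The guard rejects empty chunks because :binary.last/1 raises on an empty binary.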

voltone · September 9, 2018, 1:09pm

But does it? It is implemented as a combination of byte_size/1 and binary_part/3, and both are not UTF-8 aware…

github.com/elixir-lang/elixir/blob/v1.7.3/lib/elixir/lib/string.ex#L2000-L2018

    def ends_with?(string, suffix) when is_binary(string) and is_binary(suffix) do
      ends_with_string?(string, byte_size(string), suffix)
    end

    def ends_with?(string, suffix) when is_binary(string) and is_list(suffix) do
      string_size = byte_size(string)
      Enum.any?(suffix, &ends_with_string?(string, string_size, &1))
    end

    @compile {:inline, ends_with_string?: 3}
    defp ends_with_string?(string, string_size, suffix) when is_binary(suffix) do
      suffix_size = byte_size(suffix)

      if suffix_size <= string_size do
        suffix == binary_part(string, string_size - suffix_size, suffix_size)
      else
        false
      end
    end

NobbZ · September 9, 2018, 1:15pm

Oh, then it has been optimized. Or it always was. But I remember that @josevalim once said that all functions in String were checking for validity.

And in fact, in my opinion they should, and we should have another module for working with pure binaries that does not check for string specifics, as Erlang makes a distinction between the string and lists modules as well…

voltone · September 9, 2018, 1:39pm

Oh, I agree, and I was going to suggest a real binary alternative when I decided to just check the source real quick anyway.

favetelinguis · September 9, 2018, 1:49pm

I don't know the exact max size, but would there be any negative impact from just reserving a very large buffer, like 1 MB? I know nothing will be larger than that, and most messages will be around 500 characters.

voltone · September 9, 2018, 2:14pm

As far as I know the buffer is not actually allocated, the value is just a threshold: when the socket is gathering data for delivery to the application, it has to give up at some point and notify the application. The memory is only used while data is queueing up, and released when the application is done with it.

I do think a large buffer size could be a DoS risk: if a server uses a 10M buffer for all incoming sockets, then a malicious client could send 9M of data with no newlines on a bunch of sockets and possibly eat up the server’s memory long before the socket limit (ports, file descriptors) is reached.

josevalim · September 9, 2018, 2:20pm

No, we don’t check. I believe the @moduledoc even says so.
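
A quick way to see this for yourself, using a binary that is deliberately not valid UTF-8 (the module name is just for illustration):

```elixir
defmodule EndsWithCheck do
  def run do
    # 0xFF can never appear in valid UTF-8, so this binary is not a valid string.
    invalid = <<0xFF, ?\n>>

    %{
      valid_utf8?: String.valid?(invalid),
      # Only the trailing bytes are compared, so this still succeeds.
      ends_with_newline?: String.ends_with?(invalid, "\n"),
      last_byte_newline?: :binary.last(invalid) == ?\n
    }
  end
end
```

Calling EndsWithCheck.run() should report the binary as invalid UTF-8 while both newline checks return true.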