Rate limiting Ports

This blog entry by @thdxr is an interesting read: Reading Named Pipes with Elixir

The conclusion is a little sad however:

While this does work, it is clear at this point Elixir isn’t the right tool for the job. As much as I love Elixir, I was able to rewrite my script in Go in about 30 min and had it work perfectly. Great reminder to make sure you have a wide set of options in your toolkit so you can use the right one for the job.

So naturally I wanted to see how difficult this would be to do more nicely in Elixir. It became my morning warm-up exercise, actually. I started by looking to see if there was some hidden magic somewhere in Erlang’s Ports support that could be used/abused to get the desired behavior. Despite looking through the OTP repo, I couldn’t find anything useful. It’s a pretty thick forest in there though, so perhaps I missed something.

Giving up on the silver-bullet-in-the-API hope, I went ahead and implemented a possible solution:
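As a rough sketch of the shape such a solution can take (hypothetical module and function names, not the actual gist), one can wrap a port running `cat` on the pipe in a process that drains its output at a bounded rate:

```elixir
defmodule RateLimitedPort do
  @moduledoc """
  Hypothetical sketch, not the gist from the original post: spawn
  `cat` on a named pipe via a Port and consume its output at a
  bounded rate. Caveat: the port still reads eagerly, so unconsumed
  chunks queue in this process's mailbox rather than back-pressuring
  the writer -- which is exactly the limitation under discussion.
  """

  def run(fifo_path, chunks_per_second) do
    port =
      Port.open(
        {:spawn_executable, System.find_executable("cat")},
        [:binary, :exit_status, args: [fifo_path]]
      )

    loop(port, div(1_000, chunks_per_second))
  end

  defp loop(port, delay_ms) do
    receive do
      {^port, {:data, chunk}} ->
        IO.write(chunk)         # do something with the chunk
        Process.sleep(delay_ms) # crude rate limiting between chunks
        loop(port, delay_ms)

      {^port, {:exit_status, _status}} ->
        :ok
    end
  end
end
```

Something like `RateLimitedPort.run("/tmp/my_fifo", 10)` would then process at most ten chunks per second, with the mailbox acting as the (unbounded) buffer.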

It works, but questions abound for me:

What other approaches exist?
Should something like this be easier to do in Elixir?
If so, should it be in a hex.pm library somewhere (if so … which?), or is it something that should be possible with Port itself?
Or would erts be a better target for this, putting rate limiting right at the source?

So many questions, really interested in your answers :slight_smile:


I wonder if nifsy can be used to read named pipes: https://github.com/antipax/nifsy

If it can’t, I would probably still say that the best solution is to use a NIF to provide the functionality of reading from a device on demand instead of eagerly. It is also worth pointing out that there is work happening in Erlang/OTP 21 and later to move towards NIFs for file operations, so the problem should be solved (or at least more easily solvable) by then.

In any case, I also read the article yesterday and I am glad you are looking for solutions.

It does! :slight_smile: Tried with Nifsy.stream!/2, in addition to the regular open/read/close, and that works a treat … nice way to just spin until you end up with a backlog and then you can bail out. stream!/2 is line-oriented only, which I can imagine would be problematic when … well … the data is not broken into neat and tidy lines.
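For the record, the two styles mentioned above look roughly like this (a sketch from the discussion; check nifsy’s README for the exact option names and arities):

```elixir
# Stream style: line-oriented, per Nifsy.stream!/2 as described above.
# Exact options are an assumption here.
"/tmp/my_fifo"
|> Nifsy.stream!([])
|> Enum.each(&IO.puts/1)

# open/read/close style, where the reader controls the pace --
# function arities below are approximate, not verified.
{:ok, handle} = Nifsy.open("/tmp/my_fifo")
{:ok, data} = Nifsy.read(handle, 4096)
:ok = Nifsy.close(handle)
```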

In any case, I didn’t know of this library, so thanks for the pointer. This is really interesting indeed, and gives a very simple way forward as it goes back to reader-controlled processing.

While certainly harder, now with dirty schedulers being the norm, this seems like a natural way forward. Looking forward to this!

This is not that hard to fix. I think Elixir’s own file streams can be read by bytes, so it would probably be a welcome contribution to nifsy to support byte reading there as well.
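For comparison, byte-based reading with Elixir’s built-in file streams looks like this (chunk size chosen arbitrarily):

```elixir
# File.stream!/3 streams a file in fixed-size byte chunks when you
# pass a chunk size instead of the default line-by-line mode.
"input.txt"
|> File.stream!([], 2048)   # 2048-byte chunks rather than lines
|> Enum.each(&IO.write/1)
```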

Maybe you should comment on the article about nifsy then? I am pretty sure it will help others looking at the same problem.

EDIT: it may even be worth adding a note to nifsy’s own docs/README that it can be used for named pipes.

I already pointed to this discussion here :slight_smile:

Yes, it is just the stream mode, and only because the stream function reads line by line. It would be quite trivial to introduce a byte-buffer mode … I think I know what I may be doing this weekend :wink:

Good point, I will def include this in my PR …

Sooooo … good news / bad news …

Nifsy is super fast for reading files in blocks of bytes, but the read_line implementation is really quite slow: 8x slower than the equivalent Python code on my machine … which isn’t great, as Python isn’t exactly a speed demon either. wc -l puts them both to horrific shame :wink:

The upside is that it is implemented in a way that works with any type of file you can fopen (FIFOs, e.g.), but for normal files it is anything but speedy. I need to do some measuring, but the likely culprits are the number of memcpy and realloc operations, which are forced by ErlNifBinary being an opaque structure mostly outside the NIF’s control, combined with not knowing in advance where the newlines will be.

I need to try some experiments with replacing its use of a scratch buffer (essentially double buffering the whole file read process) with mmap. I suspect that may significantly improve performance just by allowing that extra buffer to be dropped.

(My measurements were done using a 253 MB text file from Wikipedia with more than a third of a million lines, read from a relatively fast SSD, flushing caches before each run … in case anyone cares :stuck_out_tongue: )


/me wonders if it might be useful to mmap a file and just treat it like a binary blob, unsure if that is even possible, not tried yet…

Yes, that’s exactly what I was referring to … :slight_smile: There would still be one copy necessary, however, as the mmap window would need to be movable and the binaries, once passed into the BEAM, need to be GC’d … so, one copy. But that should be far fewer memory operations than what is there now.