Persistent_term for static assets using Plug?

akoutmos · September 10, 2020, 9:06pm

I am curious whether anyone has experimented with using persistent_term (persistent_term, erts v15.0) for static assets instead of reading them from the file system. Obviously you would want to do this in production only, so you are not constantly cleansing persistent_term during development, and you would want something in place to hydrate persistent_term before starting your endpoint. Mostly I am curious whether the idea is crazy or not.

I forked the Plug.Static plug and modified it to leverage persistent_term and the initial results looked good.
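The fork itself is not shown in the thread, but the hydration step could look roughly like this. It is a minimal sketch, not the actual modified plug: `StaticCache`, `hydrate/1`, and `fetch/1` are hypothetical names, and it assumes every regular file under a given root directory should be cached under its request path.

```elixir
defmodule StaticCache do
  # Hypothetical helper, not part of Plug. Loads every regular file under
  # `root` into :persistent_term, keyed by its request path ("/css/app.css").
  def hydrate(root) do
    root
    |> Path.join("**/*")
    |> Path.wildcard()
    |> Enum.filter(&File.regular?/1)
    |> Enum.each(fn file ->
      request_path = "/" <> Path.relative_to(file, root)
      :persistent_term.put({__MODULE__, request_path}, File.read!(file))
    end)
  end

  # Returns {:ok, binary} for a cached asset, or :error if nothing is stored.
  def fetch(request_path) do
    case :persistent_term.get({__MODULE__, request_path}, :missing) do
      :missing -> :error
      body -> {:ok, body}
    end
  end
end
```

One way to use it would be to call `StaticCache.hydrate/1` from the application's start callback before the endpoint is started, and only in production builds.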

pera · September 10, 2020, 10:01pm

Nice results. I don’t serve static assets with Cowboy, but I am using persistent_term to persist the results of some computationally intensive function calls (mostly parsing and graph construction). I wrote the following helper macro for this purpose:

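  # Defines a zero-arity function `name` that computes `value` on the first
  # call and serves it from :persistent_term afterwards.
  defmacro defpersistent([{name, value}]) do
    # The key is evaluated inside the generated function:
    # {module, {function_name, arity}}.
    key = quote(do: {__MODULE__, __ENV__.function})
    function = Macro.var(name, nil)
    quote do
      def unquote(function) do
        case :persistent_term.get(unquote(key), nil) do
          nil ->
            value = unquote(value)
            # Wrap in a tuple so a cached `nil` result is not mistaken for a miss.
            :persistent_term.put(unquote(key), {value})
            value
          {value} ->
            value
        end
      end
    end
  end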

I then have function declarations that look like this:

defpersistent function_name: expensive_computation()
def expensive_computation() do
  ...
end

voltone · September 11, 2020, 4:56am

Plug.Static, through Cowboy and Ranch, offloads the actual transmission of the data to the OS, using :file.sendfile/5. I imagine it has to do more work initially than your alternative, but depending on the file size and the achievable transmission rate it may actually be more efficient when the transfer takes longer.

It might be interesting to run some more tests with different file sizes and transmission rates (preferably on a real network interface). Maybe the optimal solution is a mix, serving small files from persistent_term while falling back to sendfile/5 for large files.
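The mixed approach suggested above could be sketched as a size check made once per file, for example while hydrating the cache. This is only an illustration: `AssetStrategy`, `classify/2`, and the 64 KiB cutoff are hypothetical, and a sensible threshold would have to come from the kind of benchmarks described here.

```elixir
defmodule AssetStrategy do
  # Hypothetical cutoff; the right value would have to come from benchmarks.
  @max_cached_bytes 64 * 1024

  # Decides how a file should be served:
  #   :memory   -> small enough to keep in :persistent_term
  #   :sendfile -> leave on disk and let sendfile handle it
  #   :skip     -> not a regular file, or not readable
  def classify(path, max_bytes \\ @max_cached_bytes) do
    case File.stat(path) do
      {:ok, %File.Stat{type: :regular, size: size}} when size <= max_bytes -> :memory
      {:ok, %File.Stat{type: :regular}} -> :sendfile
      _ -> :skip
    end
  end
end
```

A modified plug could then answer `:memory` assets with `Plug.Conn.send_resp/3` and the cached binary, and hand `:sendfile` assets to `Plug.Conn.send_file/3` as Plug.Static does today.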

hauleth · September 11, 2020, 8:43am

IIRC there will be no difference once the files are in memory. There would only be a difference if the files are served from the FS, as then sendfile can work fully within the kernel, without going through user space.

voltone · September 11, 2020, 8:50am

Isn’t that what we’re comparing here: Plug.Static serving up a file, versus an alternative that serves content from memory, in this case persistent_term?

akoutmos · September 11, 2020, 1:47pm

I’ll put together some more tests tonight with files larger than 1MB and see how things behave. I’ll also throw this on a separate server so it’s not on localhost. Will report back with results!

hauleth · September 11, 2020, 1:56pm

Ahh, I missed that part. In that case Plug.Static with sendfile should be faster in almost all cases (assuming that the storage is quick).

voltone · September 11, 2020, 2:12pm

Well, if the socket can accept the entire file contents at once, without blocking, then it might be faster to just dump the data into the socket from memory, rather than do the system call dance with sendfile.

But if the file is larger than what the socket can handle instantaneously, causing it to block on write, then the BEAM has to go off and do other work, and come back later when the socket is ready for more. That is overhead that you don’t get with sendfile. This is why I suggested testing with different file sizes, and why testing over the loopback interface is not realistic: it probably has a humongous TCP window size, so the socket will never block on write.

derek-zhou · September 11, 2020, 2:19pm

Interesting! For purely static assets, we can use a front-end server like nginx. However, there could be some quasi-static pages that change infrequently but are visited frequently. Maybe we can pre-render them and stash them here?
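Stashing pre-rendered pages could be sketched like this. `PageCache` and its functions are hypothetical names, and the render function stands in for whatever produces the HTML. One caveat from the persistent_term documentation: updating or erasing a term is expensive because it triggers a global garbage collection scan, so this would only suit pages that really do change rarely.

```elixir
defmodule PageCache do
  # Hypothetical module: keeps pre-rendered pages in :persistent_term.

  # Stores an already rendered page under `key`.
  def put(key, html) when is_binary(html) do
    :persistent_term.put({__MODULE__, key}, html)
  end

  # Returns the cached page, rendering and storing it on the first request.
  def fetch(key, render_fun) when is_function(render_fun, 0) do
    case :persistent_term.get({__MODULE__, key}, :missing) do
      :missing ->
        html = render_fun.()
        put(key, html)
        html

      html ->
        html
    end
  end

  # Drops a page so the next fetch re-renders it. Costly; call rarely.
  def invalidate(key), do: :persistent_term.erase({__MODULE__, key})
end
```

A page would then be fetched with something like `PageCache.fetch(:home, fn -> render_home() end)`, where `render_home/0` is a placeholder, and invalidated whenever its underlying data changes.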

al2o3cr · September 11, 2020, 3:11pm

Another potential performance pitfall with large files: getting the data from persistent_term to the socket is going to cost at least one copy of the data (from the BEAM to the kernel) where sendfile is only transmitting a “get the data from over there” message.