System.cmd significantly slower when called from handle_event vs mount

That’s not really useful in our case, but I appreciate the input.

It would be nice to enable tracing around those calls and print out all messages sent/received from/to the current process. There might be some other message being processed while System.cmd is waiting for its return value (this is just a guess at this point).
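
For example (a rough sketch; view_pid is a placeholder for the LiveView's pid, which you could grab from :observer or by logging self() in a callback), :dbg from an attached IEx shell will print every message that process sends and receives:

:dbg.tracer()
:dbg.p(view_pid, [:m])   # :m traces message sends and receives for that pid

# ...reproduce the slow handle_event, then stop tracing:
:dbg.stop()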

When you say “enable tracing around those calls”, which calls are you referring to in particular, and what kind of tracing? I’m happy to do the legwork to hunt this down and get more data, but I’m not sure how to proceed. Thanks!

The difference is in scanning the mailbox.

System.cmd uses a selective receive, while the mailbox of the LiveView process might be flooded with info messages, events, or metadata messages.
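
To see the effect in isolation, here's a tiny throwaway demo (just illustrative, run it in iex) of how a flooded mailbox slows a selective receive down:

defmodule SelectiveReceiveDemo do
  # Flood our own mailbox with unrelated messages, then time how long a
  # selective receive takes to scan past them to reach the message it wants.
  # (The noise messages stay in the mailbox afterwards; fine for a quick experiment.)
  def run(noise_count) do
    for i <- 1..noise_count//1, do: send(self(), {:noise, i})
    send(self(), :the_message_we_want)

    :timer.tc(fn ->
      receive do
        :the_message_we_want -> :ok
      end
    end)
  end
end

# SelectiveReceiveDemo.run(0)       # fast
# SelectiveReceiveDemo.run(100_000) # typically much slower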

PS. I’ve run the module you provided and ls takes the same amount of time in both mount and handle_event (4-6ms)

As I said above, if the issue were the process mailbox being flooded with “other stuff”, or any other form of “noisy neighbor” problem, then performance should be consistently poor everywhere, which does not appear to be the case.

When you say you ran the above module, you specifically mean you tried get_files/0 from multiple contexts including handle_event and observed consistent performance across those contexts? If so, please provide the data and include the OS you are using. We have observed this issue on 3 different machines now, 2 Macs and 1 Linux box running Ubuntu.

Here’s a quick snapshot from the linux box:

Compiling 1 file (.ex)
[info] GET /perf
[debug] Processing with PerftestWeb.PerfLive.Index.index/2
  Parameters: %{}
  Pipelines: [:browser]
get_files from mount took 946 microseconds
[info] Sent 200 in 2ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 19µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "OjcUIRo2BiAYNlkoTiE5C3YTLCZuABAyNvSB_fJVrNlncKkF4vvv6XQq", "_live_referer" => "undefined", "_mounts" => "0", "_track_static" => %{"0" => "http://localhost:4000/assets/app.css", "1" => "http://localhost:4000/assets/app.js"}, "vsn" => "2.0.0"}
[debug] MOUNT PerftestWeb.PerfLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "tAGcEPLvjx5F-jRMBeZPXXAC"}
get_files from mount took 1081 microseconds
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PerftestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
get_files from handle_event took 4660 microseconds
[debug] Replied in 5ms

Less extreme than what we saw on the Macs, but still 4x slower (in dev). Interestingly, the difference is much smaller in prod on that Linux box, though still (just barely) noticeable. It’s hard to tell what that means in this case, though, because the Linux box is really beefy (we use it for ML tasks).

It’s just an unknown kind of weird, and that makes me nervous. Maybe there’s nothing to worry about and the performance in production will be fine. But I can’t reliably show that to be the case, and I feel I can fairly reliably demonstrate there is SOME issue, although I’m not exactly sure what. I’m looking at adding some tracing now with rexbug to see if I can get more data. I’ll report back with what I can find.
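
(For reference, the kind of Rexbug call I have in mind is roughly this; syntax from memory against the Rexbug README, so worth double-checking before use.)

# Roughly: trace calls to System.cmd/2 and print their return values.
Rexbug.start("System.cmd/2 :: return")
# ...reproduce the slow handle_event, then:
Rexbug.stop()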

Can you just share your whole project and the browser you’re using? The issue is definitely not in the operating system.

No, that’s not what I meant. Perhaps you have 0 messages in the mailbox during mount and a ton of messages in the mailbox during handle_event.
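
A quick way to check (rough sketch) is to log the mailbox length right before the System.cmd call in both callbacks:

# In both mount and handle_event, right before shelling out:
{:message_queue_len, len} = Process.info(self(), :message_queue_len)
IO.puts("mailbox length before System.cmd: #{len}")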

sure: GitHub - treshenry/phoenix_perftest

While the issue very likely isn’t the OS itself, it may be how System.cmd is handled on different operating systems, so that could still be a factor in this case.

I’m also completely confused how there could possibly be “a ton of messages in mailbox” when running a completely empty Phoenix project with a single module on localhost in dev with 0 traffic and no load. But hey, what do I know! :man_shrugging:

Since System.cmd uses Ports internally, I suspect that the receive block in cmd’s implementation doesn’t play well with GenServers. Perhaps you could use Port directly and process the result in handle_info.
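
Something along these lines (untested sketch; the event name and assigns are just examples):

# Untested sketch: open the Port in handle_event and let the LiveView's own
# handle_info collect the output instead of blocking in a receive.
def handle_event("refresh", _params, socket) do
  port = Port.open({:spawn, "ls"}, [:binary, :exit_status])
  {:noreply, assign(socket, port: port, output: "")}
end

def handle_info({port, {:data, data}}, %{assigns: %{port: port}} = socket) do
  {:noreply, update(socket, :output, &(&1 <> data))}
end

def handle_info({port, {:exit_status, _status}}, %{assigns: %{port: port}} = socket) do
  # Command finished; the accumulated output is in socket.assigns.output.
  {:noreply, socket}
end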

Aha! Now that’s a theory that fits the facts. I’ll take a look at using Port and report back shortly. Thanks @krstfk!

I ran your project locally and I don’t see the performance differences you mention; the command executes in 2-5ms with no difference between mount and handle_event.

I am running Ubuntu 22.04, OTP 25.3.2, Elixir 1.14.4.

Can you paste the actual output?

There you go:

[info] CONNECTED TO Phoenix.LiveView.Socket in 67µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "Mxw3Jz45Fj4nOUoZNhgAL18pKyIRHWUjPDAiwIoKEQ9aijRdgkDCpdUz", "_live_referer" => "undefined", "_mounts" => "0", "_track_static" => %{"0" => "http://localhost:4000/assets/app.css", "1" => "http://localhost:4000/assets/app.js"}, "vsn" => "2.0.0"}
[debug] MOUNT PtestWeb.PerfLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "cXvNIpyubhsx_rRK8Boaay0Y"}
shell_out took 3855 microseconds
[debug] Replied in 6ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 3293 microseconds
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 3000 microseconds
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 2941 microseconds
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 3416 microseconds
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 3235 microseconds
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 3182 microseconds
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 3562 microseconds
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 3031 microseconds
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 2220 microseconds
[debug] Replied in 2ms
[info] GET /perf
[debug] Processing with PtestWeb.PerfLive.Index.index/2
  Parameters: %{}
  Pipelines: [:browser]
shell_out took 2216 microseconds
[info] Sent 200 in 41ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 76µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "ImwiDGQDK0FUGUEIOD8iPFEMNxVTLWMhA4TB-sR46q2pgMpwiNXt2TSx", "_live_referer" => "undefined", "_mounts" => "0", "_track_static" => %{"0" => "http://localhost:4000/assets/app.css", "1" => "http://localhost:4000/assets/app.js"}, "vsn" => "2.0.0"}
[debug] MOUNT PtestWeb.PerfLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "cXvNIpyubhsx_rRK8Boaay0Y"}
shell_out took 2141 microseconds
[debug] Replied in 2ms
[info] GET /perf
[debug] Processing with PtestWeb.PerfLive.Index.index/2
  Parameters: %{}
  Pipelines: [:browser]
shell_out took 2292 microseconds
[info] Sent 200 in 6ms
[info] CONNECTED TO Phoenix.LiveView.Socket in 49µs
  Transport: :websocket
  Serializer: Phoenix.Socket.V2.JSONSerializer
  Parameters: %{"_csrf_token" => "GysPdyc9NTEhEBUeEQobEWB2XSYvGn4Axsy9nMLDCxffNxIZX42GNcNY", "_live_referer" => "undefined", "_mounts" => "0", "_track_static" => %{"0" => "http://localhost:4000/assets/app.css", "1" => "http://localhost:4000/assets/app.js"}, "vsn" => "2.0.0"}
[debug] MOUNT PtestWeb.PerfLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "cXvNIpyubhsx_rRK8Boaay0Y"}
shell_out took 2457 microseconds
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 2382 microseconds
[debug] Replied in 5ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 1970 microseconds
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 2148 microseconds
[debug] Replied in 2ms

The browser and the plugins you’re using are important here. Perhaps your client generates some extra events.

Well, doing the stupid version of this doesn’t help, which kinda makes sense now that I think about it, as I’m basically doing the exact same thing as System.cmd (elixir/lib/elixir/lib/system.ex at v1.14.4 · elixir-lang/elixir · GitHub). I changed shell_out to:

  def shell_out() do
    # Spawn cat through a Port and block until the first chunk of output arrives.
    p = Port.open({:spawn, "cat mix.exs"}, [:binary])

    receive do
      {^p, {:data, data}} -> IO.puts("Received data #{data}")
    end
  end

And left the timing code from mount and handle_event exactly the same.

[debug] MOUNT PtestWeb.PerfLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "yWT637rNb1MlJQv4eoSvoJ_6"}
Received data ...
shell_out took 4498 microseconds from mount
[debug] Replied in 4ms

[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
Received data ...

shell_out took 23492 microseconds from handle_event
[debug] Replied in 28ms

But I’m totally willing to chalk this failure up to my admittedly poor understanding of processes in Elixir.

Try executing the cmd from a detached process; if that has the same performance issues, then the problem is somewhere deeper in the stack:

defmodule PtestWeb.PerfLive.Index do
  use PtestWeb, :live_view

  def mount(_params, _session, socket) do
    shell_out()

    {:ok, socket}
  end

  def render(assigns) do
    ~H"""
    <button phx-click="refresh">refresh</button>
    """
  end

  def handle_event("refresh", _params, socket) do
    shell_out()

    {:noreply, socket}
  end

  def shell_out() do
    Task.start(fn ->
      {timer, _} = :timer.tc(fn -> {_output, _} = System.cmd("less", ["mix.exs"]) end)
      IO.inspect(timer)
    end)
  end
end

Haha, well hell. :man_shrugging: I have no idea then. At this point I think I just need to move forward and we’ll see how things fare in performance testing in production. Maybe it won’t reproduce there either and this is just somehow impacting only these three machines.

Here’s what I see (just copy/pasted your code):

[debug] MOUNT PtestWeb.PerfLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "yWT637rNb1MlJQv4eoSvoJ_6"}
[debug] Replied in 72µs
2671
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
[debug] Replied in 4ms
16116

Do you absolutely need to block until the result of the Port command is completely fulfilled?
I was thinking more along the lines of spawning in the handle_event and receiving in the handle_callback. That way the rest of the LiveView would stay interactive, and you could update the assigns/UI once all the messages from the port have been received.
That’s just an idea that would maybe, at least, make the UX a bit better, especially for slow-running commands.

I meant handle_info callback.
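
For what it’s worth, a rough, untested sketch of that idea using Task.async (event name and assigns are just examples):

# Untested sketch: kick the command off in handle_event, keep the LiveView
# responsive, and pick the result up in handle_info when the Task finishes.
def handle_event("refresh", _params, socket) do
  task = Task.async(fn -> System.cmd("ls", []) end)
  {:noreply, assign(socket, task_ref: task.ref)}
end

def handle_info({ref, {output, _exit_status}}, %{assigns: %{task_ref: ref}} = socket) do
  # Demonitor with :flush so we don't also receive the :DOWN message.
  Process.demonitor(ref, [:flush])
  {:noreply, assign(socket, output: output)}
end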

This particular view has a handful of user inputs and a chart as output, rendered in SVG from the values returned by the executable. So while we don’t technically need to block on the result from the executable, the expectation is that a change in input results in a fairly rapid change in the chart. This same executable is used in a native desktop application and achieves 30fps. Now, we know that ain’t gonna happen on the web, and that has nothing to do with Elixir, Phoenix, or browser rendering/DOM patching. Our bottleneck, as with any internet application, is going to be the network. So that’s fine; we have reasonable expectations there.

The concern is whether we are arbitrarily eating cycles for unknown reasons due to our platform choice (which is sort of the worst-case scenario). That’s REALLY hard to tell from where I’m sitting at the moment. So, like I said, I think we just barrel ahead and hope for the best. There’s not enough evidence to change tactics at this point.

I would just personally feel much better if I at least knew why I was seeing this, or that it was contained to just some dev machines.