System.cmd significantly slower when called from handle_event vs mount

Well, that could still be a better UI. As you receive messages through handle_info, you can update the state and trigger a render, or stream it. I believe the issue with the receive/blocking approach is that the selective receive will have to weed out the system messages and any other messages present in the mailbox. With the handle_info approach you could update the UI as elements come in from your external program.
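
To make that concrete, here is a rough sketch of what the handle_info approach could look like in a LiveView. This is hypothetical: the module, command, and assign names are made up, ls is assumed to be on the PATH, and real code would also handle partial :noeol chunks and failures.

  defmodule MyAppWeb.CmdStreamLive do
    use Phoenix.LiveView

    def mount(_params, _session, socket) do
      {:ok, assign(socket, lines: [], port: nil)}
    end

    def handle_event("refresh", _params, socket) do
      # Don't block here; the port will send us messages as output arrives.
      port =
        Port.open({:spawn_executable, System.find_executable("ls")}, [
          :binary,
          :exit_status,
          {:line, 4096},
          args: ["-la"]
        ])

      {:noreply, assign(socket, port: port, lines: [])}
    end

    # Each complete output line arrives as its own message, so the UI can update incrementally.
    def handle_info({port, {:data, {:eol, line}}}, %{assigns: %{port: port}} = socket) do
      {:noreply, update(socket, :lines, &(&1 ++ [line]))}
    end

    def handle_info({port, {:exit_status, _status}}, %{assigns: %{port: port}} = socket) do
      {:noreply, assign(socket, port: nil)}
    end

    def render(assigns) do
      ~H"""
      <ul><%= for line <- @lines do %><li><%= line %></li><% end %></ul>
      """
    end
  end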

If the number of messages on a hardly used LV page is a problem, then many of the systems out there wouldn’t work.

I don’t think I made myself clear. I don’t believe it’s necessarily a problem, but as I understand it a receive in a GenServer will have to selectively weed out system messages, GenServer-specific messages … I don’t see how that would not affect performance, even if only slightly.

That being said, I also believe that it’s not great to block in a callback when it’s possible to avoid it. Sure, it would also be possible to wrap System.cmd in a Task and handle the task’s result (see the sketch below), but then you would spawn a new process for the task plus a Port.

I’m not claiming that performance would be badly degraded by blocking, simply that it may make more sense not to block (and possibly improve the UI).

System.cmd wraps the whole Port machinery in a nice shortcut where the receives are handled for you; in a GenServer it might be better to use the primitives that are given to you.
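
For completeness, a minimal sketch of the Task variant mentioned above (the command and assign names are illustrative): run the command in a task from handle_event and pick up the result in handle_info.

  def handle_event("refresh", _params, socket) do
    # Run the blocking call in a task so the LiveView process keeps handling events.
    # Note: Task.async links the task, so a crashing command takes the LiveView down with it;
    # Task.Supervisor.async_nolink avoids that.
    task = Task.async(fn -> System.cmd("ls", ["-la"]) end)
    {:noreply, assign(socket, task: task)}
  end

  # Task.async replies with {ref, result}; demonitor with :flush to drop the trailing :DOWN message.
  def handle_info({ref, {output, _exit_status}}, %{assigns: %{task: %Task{ref: ref}}} = socket) do
    Process.demonitor(ref, [:flush])
    {:noreply, assign(socket, output: output, task: nil)}
  end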

It only affects performance if there are in fact system messages in the queue. Why would there be any in this case? It would take a LOT of messages to make receive performance measurably degrade; it is very well optimized.

Shot in the dark: what happens if you call :erlang.garbage_collect() before and after :timer.tc/1? My theory is that the garbage collector is being triggered during System.cmd because enough work has been done before handle_event (and that’s more likely to happen there than in mount).

And, just to add to that, this isn’t my first rodeo and I’m sensitive to premature optimization. You know what app is really slow? The one that didn’t get made because we exceeded budget optimizing crap that doesn’t matter. :stuck_out_tongue:

There are tradeoffs in all things. It’s just a bit scary with an intermittent 10x performance cost for reasons I can’t explain. But like I said, there’s not enough data to actually understand what is happening, and it’s apparently not a universal problem, so I’m just going to move on and we’ll see what happens in prod. I’ll definitely update this thread with any new info if it comes up. :crossed_fingers:

DANGIT! No dice (but I super thought that was it)

[debug] MOUNT PtestWeb.PerfLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "yWT637rNb1MlJQv4eoSvoJ_6"}
shell_out took 3220 microseconds from mount
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 20130 microseconds from handle_event
[debug] Replied in 24ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 13889 microseconds from handle_event
[debug] Replied in 14ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 6235 microseconds from handle_event
[debug] Replied in 6ms

Here’s the code change, in case I screwed something up: Force garbage collection by treshenry · Pull Request #1 · treshenry/phoenix_perftest · GitHub

  def handle_event("refresh", _params, socket) do
    :erlang.garbage_collect()
    {timer, _} = :timer.tc(fn -> shell_out() end)
    :erlang.garbage_collect()

    IO.puts("shell_out took #{timer} microseconds from handle_event")

    {:noreply, socket}
  end

Can’t reproduce

[debug] MOUNT PtestWeb.PerfLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "a1x_wxsgi4gy6NV-oOcG7TDt"}
shell_out took 2221 microseconds from mount
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 2161 microseconds from handle_event
[debug] Replied in 5ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 4526 microseconds from handle_event
[debug] Replied in 4ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 1888 microseconds from handle_event
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 1604 microseconds from handle_event
[debug] Replied in 1ms
[debug] HANDLE EVENT "refresh" in PtestWeb.PerfLive.Index
  Parameters: %{"value" => ""}
shell_out took 1598 microseconds from handle_event
[debug] Replied in 1ms

So what browser are you using?

By the way, I wouldn’t judge anything performance-related from benchmarks with <100 runs and +/- 600% deviation. But it is still very interesting to find out what causes such behaviour.

Can the symptom be reproduced with System.cmd("true", [])? true is the simplest external command; it does nothing.
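
A minimal way to time that in isolation (a sketch; it assumes true is on the PATH):

  # Measure only the round trip of asking the OS to run the no-op `true` binary.
  {usecs, {_output, 0}} = :timer.tc(fn -> System.cmd("true", []) end)
  IO.puts("System.cmd(\"true\", []) took #{usecs} microseconds")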

Huh, did not know that! Sadly, it still reproduces. As shown above, the issue isn’t the time it takes the OS to execute the command, but the time it takes Elixir to ask the OS to execute the command and then get the result back.

Great news! Hopefully it’s just something weird on a couple of local machines then.

Is there anything different about those local machines, e.g. is the Elixir app running inside Docker or some other virtualization?

FWIW, I am also not able to reproduce locally. Just did a few runs, but I’m seeing that mount is on par with or a little slower than handle_event. Running Linux 6.3.1 on Arch with Firefox 112.0.2 and Vivaldi 6.0.2979.
Firefox

[debug] MOUNT PtestWeb.LsLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "qjZyNlgmfueoXuFR_d2nzSGD"}
get_files took 3954 microseconds, on mount
[debug] Replied in 4ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 3000 microseconds, on handle
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2384 microseconds, on handle
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2703 microseconds, on handle
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2685 microseconds, on handle
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 3104 microseconds, on handle
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2019 microseconds, on handle

Vivaldi

[debug] MOUNT PtestWeb.LsLive.Index
  Parameters: %{}
  Session: %{"_csrf_token" => "zl31ZF6ilVtybyAu1AvFHvK_"}
get_files took 3376 microseconds, on mount
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2796 microseconds, on handle
[debug] Replied in 5ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2705 microseconds, on handle
[debug] Replied in 3ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2452 microseconds, on handle
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2516 microseconds, on handle
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2661 microseconds, on handle
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in PtestWeb.LsLive.Index
  Parameters: %{"value" => ""}
get_files took 2106 microseconds, on handle
[debug] Replied in 2ms

Just to give an idea of the variability on my machine, another run took 1634 usec on mount and averaged ~2700 usec on handle_event, but then on refreshing the page mount took 5715 usec and handle_event averaged ~2200 usec. So maybe it’s just noise with low sample sizes?

I added :msacc.start() to the router function in lib/ptest_web.ex and then added :msacc.print() to the mount and handle_event functions, right after get_files is called in each (a rough sketch of the wiring is at the end of this post). This didn’t show much discrepancy between the two:

get_files took 6805 microseconds, on mount
Average thread real-time    : 209483891 us
Accumulated system run-time :   1632155 us
Average scheduler run-time  :    372646 us

        Thread      aux check_io emulator       gc    other     port    sleep

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
       aux( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 5)    0.00%    0.00%    0.01%    0.00%    0.04%    0.00%   99.95%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%   99.99%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
      poll( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
 scheduler( 1)    0.01%    0.00%    0.09%    0.01%    0.08%    0.00%   99.81%
 scheduler( 2)    0.01%    0.00%    0.08%    0.01%    0.05%    0.00%   99.85%
 scheduler( 3)    0.01%    0.00%    0.10%    0.00%    0.10%    0.00%   99.79%
 scheduler( 4)    0.01%    0.00%    0.08%    0.00%    0.07%    0.01%   99.84%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.99%
          poll    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
     scheduler    0.01%    0.00%    0.09%    0.00%    0.07%    0.00%   99.82%

get_files took 2206 microseconds, on handle
Average thread real-time    : 215479101 us
Accumulated system run-time :   1701034 us
Average scheduler run-time  :    387481 us

        Thread      aux check_io emulator       gc    other     port    sleep

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
       aux( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 5)    0.00%    0.00%    0.01%    0.00%    0.04%    0.00%   99.95%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%   99.98%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
      poll( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
 scheduler( 1)    0.01%    0.00%    0.10%    0.01%    0.08%    0.00%   99.81%
 scheduler( 2)    0.01%    0.00%    0.08%    0.01%    0.05%    0.00%   99.85%
 scheduler( 3)    0.01%    0.00%    0.10%    0.00%    0.10%    0.00%   99.79%
 scheduler( 4)    0.01%    0.00%    0.08%    0.00%    0.07%    0.01%   99.83%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%   99.99%
          poll    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
     scheduler    0.01%    0.00%    0.09%    0.00%    0.08%    0.00%   99.82%

Again, this is only a single run, so it might not be valid data.

At least on my machine, handle_event seems to perform much more consistently than mount, with mount varying between 1400 and 7000 usec and handle_event between 1800 and 3200 usec.
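
For reference, the :msacc wiring described at the top of this post looked roughly like this (a sketch; get_files/0 is this test app’s own helper, and :msacc ships with OTP’s runtime_tools):

  # :msacc.start() has to run once before the measurement
  # (here it was added to the router function in lib/ptest_web.ex).
  def handle_event("refresh", _params, socket) do
    {timer, _files} = :timer.tc(fn -> get_files() end)
    IO.puts("get_files took #{timer} microseconds, on handle")
    # Print the accumulated per-thread state percentages (aux, emulator, gc, port, sleep, ...).
    :msacc.print()
    {:noreply, socket}
  end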

Also, in reply to @benwilson512 above, it does look like there is something about these local machine environments that may be the cause. After these last two posts I went back to our beefy Linux box, and on second look, with a larger data set, there’s basically no difference there. Previously I didn’t grab a big sample size, because I just assumed that if it repro’d twice, it would repro thrice. Bad assumption. I think I know what the issue is, or at least what field we are playing ball in.

Can anyone try to reproduce this with an M1 Mac? The two machines we are seeing crazy slowdowns on are Apple silicon. However, I happen to have an Intel Mac we keep around for testing, and it does not reproduce there. Consistently seeing ~4ms.

:thinking:

Did you do José’s garbage collection trick on the M1 machines?

Correct, and that didn’t resolve the issue. But here’s my new theory (and again, if someone else can repro on an M1, that would be super helpful): something about mount is eating the translation performance hit up front (maybe something prior to it is shelling out too, like Tailwind compilation?), whereas handle_event is eating the hit when I’m calling System.cmd. Something along those lines. That might be wildly off, but I’m thinking the issue has something to do with System.cmd, in the handle_event path, specifically on Apple chips. Totally just a theory, but it currently fits the facts.

If someone else has an M1, and can’t reproduce the issue, then I throw up my hands officially. :stuck_out_tongue:

Same results here, it seems:

[debug] MOUNT ChannelsTestWeb.TestLive
  Parameters: %{}
  Session: %{"_csrf_token" => "v_1aco0pTl98KPuowfY6ThJz"}
System.cmd took 2675 microseconds
[debug] Replied in 2ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 6074 microseconds
[debug] Replied in 6ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 5035 microseconds
[debug] Replied in 5ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 7705 microseconds
[debug] Replied in 7ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 8343 microseconds
[debug] Replied in 8ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 6803 microseconds
[debug] Replied in 7ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 6116 microseconds
[debug] Replied in 6ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 15607 microseconds
[debug] Replied in 15ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 5405 microseconds
[debug] Replied in 5ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 7124 microseconds
[debug] Replied in 7ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 6561 microseconds
[debug] Replied in 6ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 6113 microseconds
[debug] Replied in 6ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 8810 microseconds
[debug] Replied in 9ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 7972 microseconds
[debug] Replied in 8ms
[debug] HANDLE EVENT "refresh" in ChannelsTestWeb.TestLive
  Parameters: %{"value" => ""}
System.cmd took 7700 microseconds
[debug] Replied in 7ms

WOOT! OMG, thank you @LostKobrakai. Haha. I thought I was losing my mind.

Great, now I can officially not worry about this as we will not be hosting the app on M1 in prod any time soon hah. I guess the remaining question is: should I file a bug?

Also, is there a good way to close this thread out and summarize in the original post or something? It doesn’t look like I can edit the post or the title… Forum newb here.

You can mark a post as the solution. Even your own :slight_smile: