:observer apps gui freezes when a process calls Task.async in init

Hello,
I have some GenServers that will do some loading during init function using Task.async (so they will not block init and then get the result via handle_info function).

Recently I noticed that for the period that this task is running, the :observer Applications tab, with the app tab in which the caller server is supervised, will freeze and not respond until the task finishes its job. I have other GenServers that start Tasks for other stuff and they don’t show the same behavior.

I also get some errors from appmon during that period:

12:48:14.862 [error] #PID<0.741.0> :gen_server "error_info/7" "gen_server.erl" 889 
        GenServer :appmon_info terminating
** (MatchError) no match of right hand side value: {:error, {:EXIT, {:badarg, [{:erlang, :element, [2, :undefined], []}, {:appmon_info, :check_sasl_ancestor, 2, [file: 'appmon_info.erl', line: 618]}, {:appmon_info, :maybe_add_child_sasl, 4, [file: 'appmon_info.erl', line: 590]}, {:lists, :foldr, 3, [file: 'lists.erl', line: 1276]}, {:appmon_info, :do_find_proc2, 5, [file: 'appmon_info.erl', line: 497]}, {:appmon_info, :calc_app_tree, 2, [file: 'appmon_info.erl', line: 450]}, {:appmon_info, :do_work, 2, [file: 'appmon_info.erl', line: 308]}, {:appmon_info, :handle_info, 2, [file: 'appmon_info.erl', line: 276]}]}}}
    appmon_info.erl:308: :appmon_info.do_work/2
    appmon_info.erl:276: :appmon_info.handle_info/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:do_it, {:app, :pipeline, #PID<0.729.0>}}
State: {:state, #PID<0.729.0>, [], #Reference<0.3006373769.2547122179.177556>, [#PID<0.729.0>]}
 
12:48:14.863 [error] #PID<0.741.0> :proc_lib "crash_report/4" "proc_lib.erl" 508 
        Process :appmon_info (#PID<0.741.0>) terminating
** (MatchError) no match of right hand side value: {:error, {:EXIT, {:badarg, [{:erlang, :element, [2, :undefined], []}, {:appmon_info, :check_sasl_ancestor, 2, [file: 'appmon_info.erl', line: 618]}, {:appmon_info, :maybe_add_child_sasl, 4, [file: 'appmon_info.erl', line: 590]}, {:lists, :foldr, 3, [file: 'lists.erl', line: 1276]}, {:appmon_info, :do_find_proc2, 5, [file: 'appmon_info.erl', line: 497]}, {:appmon_info, :calc_app_tree, 2, [file: 'appmon_info.erl', line: 450]}, {:appmon_info, :do_work, 2, [file: 'appmon_info.erl', line: 308]}, {:appmon_info, :handle_info, 2, [file: 'appmon_info.erl', line: 276]}]}}}
    appmon_info.erl:308: :appmon_info.do_work/2
    appmon_info.erl:276: :appmon_info.handle_info/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Initial Call: :appmon_info.init/1
Ancestors: [#PID<0.729.0>, :observer, #PID<0.711.0>, #PID<0.82.0>]
Message Queue Length: 0
Messages: []
Links: [#PID<0.729.0>]
Dictionary: []
Trapping Exits: true
Status: :running
Heap Size: 10958
Stack Size: 27
Reductions: 115365

For completeness, this is more or less what I have:

defmodule Server do
  ...
  
  @impl GenServer
  def init(args),
    do: {:ok, State.new(args), {:continue, :load_pipeline}}

  @impl GenServer
  def handle_continue(:load_pipeline, state) do
    Task.async(fn ->
      repo = Application.get_env(:database, :bla) |> Keyword.fetch!(:repo)

      repo.transaction(
        fn repo ->
          # Do some work with streams that can take some time
        end,
        timeout: :infinity
      )
    end)
    
    {:noreply, state}
  end

  ...
end

Can I do something to help fix this issue?

Thanks.

Why not remove the Task.async? Since it’s in a handle_continue, it won’t block the init.

Because I need to be able to handle external calls during the work I’m doing inside the Task.
I do other work inside the handle_continue that I didn’t show in the code above because it would not be related to the issue. I did try moving the Task to the init call and it still shows the same issue.
I never had it before, so I’m not sure if I’m doing something wrong or this is a bug with :observer.

I don’t know enough about the observer internals to know for sure, but:

  1. instead of Task.async use Task.start or Task.start_link, async returns a future which it seems like you don’t need (there’s no corresponding Task.await).

  2. instead of Task, i would spawn inside an otp supervisor tree, so Task.Supervised.start or Task.Supervised.start_link

Doing these two things will make your architecture more otp-compliant and possibly help observer out? But again, I am not an expert on observer, so maybe someone else more experienced can comment.

1 Like

As @ityonemo said, you are not doing Task.await to correspond with your earlier Task.async. You’re better off just doing one action then send yourself (i.e. the current process) a message that does the next, etc. until your initialisation is completed.

Thanks for the replies @ityonemo and @dimitarvp.

I’m not doing a Task.await because this is running inside a GenServer, and as the documentation says in https://hexdocs.pm/elixir/Task.html#await/2-compatibility-with-otp-behaviours I don’t need a Task.await (the documentation goes as far as saying it is not recommended), it will trigger my matching handle_info call.

I think you are doing it wrong. Let OTP handle itself. :wink: Just don’t use Task.async inside handle_continue at all. It will simplify your code and the app should behave properly. However, I have no background in your challenges – or how many other initialisation activities you need.

Basically I need to load somethings for that process in parallel to receiving call messages, I cannot block the server and wait it to load because of some constrains of my project.

I created a small project just to test this issue in isolation, and I cannot reproduce the issue, basically I just created an empty project (with --sup option), added the bellow GenServer and added it to the application supervisor.

defmodule Server do
  use GenServer
  
  def start_link(args) do
    GenServer.start_link(__MODULE__, args)
  end

  def init(_args) do
    Task.async(fn ->
      Process.sleep(60_000)
      :done
    end)

    {:ok, :no_state}
  end

  def handle_info({_, :done}, state) do
    IO.puts("DONE")
    {:noreply, :no_state}
  end

  def handle_info({:DOWN, _, :process, _, :normal}, state), do: {:noreply, state}
end

This didn’t freeze my :observer.

I think that actually the issue is not the Task.async, but maybe the repo.transaction I’m doing inside of it. I will try simulate something similar in my test project and see if I can reproduce the issue.

So, turns out that the issue was processing the data I’m loading, I do some computations inside an Enum.reduce and this can take some time (it takes ± 5 minutes to process 2kk rows of data). When the process is doing that the observer will freeze.

You can likely chunk and multiplex the work on all CPU cores.