How to wait for genserver start?

arpan · September 21, 2020, 3:29pm

Hi everyone, I have a GenServer that is started as part of a supervision tree. It is then called from another app in my umbrella project. My question is: how do I know when the GenServer has started and is ready to handle calls? In its init callback the GenServer has to go over a directory and load the files, which might take some time. So when calling it I get errors like…

iex(7)> [error] GenServer #PID<0.730.0> terminating
** (stop) exited in: GenServer.call(Csv2sql.Observer, :get_stats, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started

This is the code of my GenServer; I have removed the irrelevant parts.

defmodule Csv2sql.Observer do
  use GenServer

  @status_list [:pending, :infer_schema, :insert_schema, :insert_data, :finish]
  @stage_list [:waiting, :working, :validation, :finish]

  def get_stats do
    GenServer.call(__MODULE__, :get_stats)
  end

  def start_link(_) do
    GenServer.start_link(__MODULE__, :no_args, name: __MODULE__)
  end

  def init(_) do
    {files_map, files_to_process} = get_file_list()

    {:ok,
     %{
       start_time: DateTime.utc_now(),
       file_list: files_map,
       files_to_process: files_to_process,
       stage: :working,
       active_worker_count: Application.get_env(:csv2sql, Csv2sql.MainServer)[:worker_count]
     }}
  end

  def handle_call(
        :get_stats,
        _from,
        state
      ) do
    {:reply, state, state}
  end

  def get_file_list() do
    source_dir = Application.get_env(:csv2sql, Csv2sql.MainServer)[:source_csv_directory]

    source_dir
    |> File.ls!()
    |> Enum.reject(fn file ->
      extension =
        file
        |> String.slice(-4..-1)
        |> String.downcase()

      extension != ".csv"
    end)
    |> Enum.reduce({%{}, []}, fn file, {file_map, file_list} ->
      path = "#{source_dir}/#{file}"

      %{size: size} = File.stat!(path)

      file_struct = %Csv2sql.File{
        name: String.slice(file, 0..-5),
        path: path,
        raw_size: size,
        humanised_size: Sizeable.filesize(size),
        row_count: Csv2sql.ImportValidator.get_count_from_csv(path),
        status: :pending
      }

      {Map.put(file_map, path, file_struct), file_list ++ [path]}
    end)
  end
end

My gen server gets called from another app in the umbrella project like this…

        %{
          start_time: start_time,
          file_list: file_list,
          stage: stage,
          active_worker_count: active_worker_count
        } = Csv2sql.Observer.get_stats()

My supervision tree does not start automatically; I start and stop it manually like…

    # Some other code...
    {:ok, sup_pid} = Csv2sql.Application.start(:no_args, :no_args)

    wait_for_finish()
    Supervisor.stop(sup_pid)
  end

  defp wait_for_finish() do
    Csv2sql.Observer.get_stage()
    |> case do
      :finish ->
        # Finish and stop supervisors after a second
        :timer.sleep(1000)

      _ ->
        wait_for_finish()
    end
  end

My supervision tree code is like…

    children =
      []
      |> Kernel.++(repo_supervisor)
      |> Kernel.++([
        Csv2sql.Observer,
        Csv2sql.JobQueueServer,
        Csv2sql.DbWorkerSupervisor,
        Csv2sql.WorkerSupervisor,
        Csv2sql.MainServer
      ])

    opts = [strategy: :one_for_one, name: Csv2sql.Supervisor]
    Supervisor.start_link(children, opts)

I need to somehow wait for the GenServer to start before making calls to it. The only solution I have found so far is to wait for some time, e.g. Process.send_after(self(), :tick, 2000), where the :tick handler calls the GenServer 2 seconds later.
The other solution is to catch the error [error] GenServer #PID<0.730.0> terminating, retry the request, and keep doing so until the GenServer answers.

But I am unhappy with both of these solutions. Can anyone help me out?

NobbZ · September 21, 2020, 3:32pm

Make the app that depends on the GenServer also depend on the app that holds the GenServer. Then it will be started after the latter has finished booting, and the problem should vanish as a side effect.

arpan · September 21, 2020, 4:08pm

Thanks for replying. Let me explain my project better to give you a better understanding of the problem.

I have this umbrella project called Csv2sql which has two apps: one is responsible for loading CSV files into the database, while the other is a Phoenix project. The Phoenix project uses Phoenix LiveView to show a UI in the browser that tracks the progress of the CSV loading operation (which is done by the first app).

This means that when the user clicks a button in the browser, the Phoenix app starts the supervisor of the first app and then begins asking the GenServer of the first app for progress (this is where I am stuck, because the GenServer might not be ready yet). After the CSVs are loaded, the first app automatically shuts down its supervision tree with Supervisor.stop(sup_pid). The user might then click start again, and the whole process above is repeated.

That is, starting and stopping the supervisor is triggered by the user on a button click; it is not automatic, since I have removed this line from my mix.exs file…

  def application do
    [
      extra_applications: [:logger]
      # mod: {Csv2sql.Application, []}  # commented out to avoid automatic application start
    ]
  end

Any ideas? Was I able to explain the problem?

NobbZ · September 21, 2020, 4:16pm

Make the app that has the worker a dependency of the Phoenix app, something like {:worker, in_umbrella: true} in the deps/0.

arpan · September 21, 2020, 4:23pm

Yes, I think I already have that…

In my phoenix app deps file…

  defp deps do
    [
      {:phoenix, "~> 1.5.1"},
      {:phoenix_live_view, "~> 0.13.0"},
      {:floki, ">= 0.0.0", only: :test},
      {:phoenix_html, "~> 2.11"},
      {:phoenix_live_reload, "~> 1.2", only: :dev},
      {:phoenix_live_dashboard, "~> 0.2.0"},
      {:telemetry_metrics, "~> 0.4"},
      {:telemetry_poller, "~> 0.4"},
      {:gettext, "~> 0.11"},
      {:jason, "~> 1.0"},
      {:plug_cowboy, "~> 2.0"},
      {:cachex, "~> 3.3"},
      {:csv2sql, in_umbrella: true} # here
    ]
  end

LostKobrakai · September 21, 2020, 4:33pm

start_link is synchronous, so your genserver is started when this returns.
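
To illustrate the point, here is a minimal sketch. SlowInitDemo and the 200 ms sleep are made up for this example and only stand in for the slow directory scan. GenServer.start_link does not return until init/1 has returned, so a call on the next line reaches an initialised process.

```elixir
defmodule SlowInitDemo do
  use GenServer

  # Hypothetical module; mirrors the shape of Csv2sql.Observer from the question.
  def start_link(_arg \\ []) do
    GenServer.start_link(__MODULE__, :no_args, name: __MODULE__)
  end

  def get_stats, do: GenServer.call(__MODULE__, :get_stats)

  @impl true
  def init(:no_args) do
    # Stand-in for the slow file loading done in the original init/1.
    Process.sleep(200)
    {:ok, %{stage: :working}}
  end

  @impl true
  def handle_call(:get_stats, _from, state), do: {:reply, state, state}
end

# start_link blocks the caller for the whole duration of init/1
# (:timer.tc reports the elapsed time in microseconds).
{elapsed_us, {:ok, _pid}} = :timer.tc(fn -> SlowInitDemo.start_link() end)

# The process is registered and initialised by the time we get here.
stats = SlowInitDemo.get_stats()
IO.inspect({elapsed_us, stats})
```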

arpan · September 21, 2020, 4:51pm

Thanks this is a good idea, but to make it work I have to remove the genserver from the supervision tree and manually call start link myself. I will definitely try this and inform if this works…

hauleth · September 21, 2020, 6:16pm

In a supervision tree the supervisor calls start_link/1 as well, and it starts the children one at a time in list order. What you need to do is put the children in the proper order in the children list.
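
A small sketch of what the ordering buys you, assuming two hypothetical modules OrderDemo.First and OrderDemo.Second that are not part of Csv2sql. Because First is listed before Second, Second can already call First from its own init/1.

```elixir
defmodule OrderDemo.First do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :no_args, name: __MODULE__)
  def ready?, do: GenServer.call(__MODULE__, :ready?)

  @impl true
  def init(:no_args) do
    # Simulated slow initialisation.
    Process.sleep(100)
    {:ok, :ready}
  end

  @impl true
  def handle_call(:ready?, _from, state), do: {:reply, state == :ready, state}
end

defmodule OrderDemo.Second do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :no_args, name: __MODULE__)
  def first_was_ready?, do: GenServer.call(__MODULE__, :first_was_ready?)

  @impl true
  def init(:no_args) do
    # This call only works because First appears earlier in the children list.
    {:ok, OrderDemo.First.ready?()}
  end

  @impl true
  def handle_call(:first_was_ready?, _from, state), do: {:reply, state, state}
end

children = [OrderDemo.First, OrderDemo.Second]
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

first_was_ready = OrderDemo.Second.first_was_ready?()
:ok = Supervisor.stop(sup)
```

Swapping the two entries in the children list would make Second's init/1 fail with the same "no process" exit shown in the original question.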

arpan · September 21, 2020, 8:18pm

Yes, you are correct. The docs say this about the supervisor's start_link:

If the supervisor and its child processes are successfully spawned (if the start function of each child process returns {:ok, child} , {:ok, child, info} , or :ignore ) this function returns {:ok, pid} , where pid is the PID of the supervisor.

So then why does my GenServer give the "not alive" error when I send it requests after starting the supervision tree like {:ok, sup_pid} = Csv2sql.Application.start(:no_args, :no_args)?

I would expect the Application.start function to return only once the supervision tree is started; otherwise how could it return the supervisor pid?

arpan · September 22, 2020, 6:21am

I think I found the problem, it was my mistake as expected…

Thank you to everyone who replied for their help and patience.

Problem:
I was starting the supervision tree using Csv2sql.Application.start(:no_args, :no_args), but I was doing it in a separate task, like…

Task.start(fn -> Csv2sql.Application.start(:no_args, :no_args) end)

So the following lines of code were not waiting for the supervision tree to start, and requesting the GenServers before they were up led to the errors.

Application.start and Supervisor.start_link are both synchronous, that is, they return only when the supervision tree has been started.

cehlts · September 24, 2020, 8:55am

I guess I am a bit late to the party…

You can always split the long-running stuff from GenServer.init into GenServer.handle_continue. The process should be up faster and the mailbox is ready to receive messages, which will be processed after the handle_continue.

https://hexdocs.pm/elixir/GenServer.html#c:init/1
https://hexdocs.pm/elixir/GenServer.html#c:handle_continue/2
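
As a rough sketch of that idea: init/1 returns immediately with a {:continue, term} tuple and the slow work moves to handle_continue/2. ContinueDemo, the placeholder state and the fake file entry are assumptions for illustration, not the real Csv2sql.Observer.

```elixir
defmodule ContinueDemo do
  use GenServer

  def start_link(_arg \\ []) do
    GenServer.start_link(__MODULE__, :no_args, name: __MODULE__)
  end

  def get_stats, do: GenServer.call(__MODULE__, :get_stats)

  @impl true
  def init(:no_args) do
    # Return quickly with placeholder state and defer the slow work.
    {:ok, %{stage: :waiting, file_list: %{}}, {:continue, :load_files}}
  end

  @impl true
  def handle_continue(:load_files, state) do
    # Stand-in for the slow directory scan. It runs right after init/1,
    # before any call that is already queued in the mailbox is handled.
    Process.sleep(200)
    {:noreply, %{state | stage: :working, file_list: %{"a.csv" => :pending}}}
  end

  @impl true
  def handle_call(:get_stats, _from, state), do: {:reply, state, state}
end

# start_link returns almost immediately this time.
{:ok, _pid} = ContinueDemo.start_link()

# The call is queued and answered only after handle_continue/2 has finished.
stats = ContinueDemo.get_stats()
```

Note that the caller still waits for the loading to finish. If the loading takes longer than the GenServer.call timeout (5000 ms by default), the call exits with a timeout, so a longer timeout or a status field in the state may be needed.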

jakemorrison · September 26, 2020, 3:26am

I am not a big fan of umbrella projects, for just this reason. Having everything in the same supervision tree solves this problem.

You can call Application.ensure_all_started/2 in the start/2 function of your application to make sure that another application is fully started.
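
A minimal sketch of that call, using :logger only as a stand-in because it ships with Elixir. In this thread the argument would presumably be :csv2sql, and the call would sit inside the start/2 callback of the Phoenix app.

```elixir
# Starts the application and all of its dependencies, blocking until they are up.
# `started` lists the applications that were actually started by this call;
# it is empty if everything was already running.
{:ok, started} = Application.ensure_all_started(:logger)

# Application.started_applications/0 returns {app, description, version} tuples.
running =
  Application.started_applications()
  |> Enum.map(fn {app, _description, _version} -> app end)

IO.inspect(started, label: "started by this call")
IO.inspect(:logger in running, label: "logger running?")
```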

arpan · September 26, 2020, 8:28am

Thanks a lot, the excellent medium post you shared actually helped me to solve an exact problem I was facing with handle_continue.

I was having a hard time dealing with race conditions caused by the time-consuming init callback in my GenServer. Now I just set the GenServer state to some empty values in the init callback and later use handle_continue to perform the heavy initialization.

For reference, these were the 2 most important takeaways from the medium post for me:

Sending yourself a message in the init/1 callback does not guarantee that it will be the first message in the mailbox.
The handle_continue callback, which was introduced in OTP 21, guarantees that the process will not handle any other messages until the callback has finished. This means that we can still have an asynchronous start-up without having to worry about other messages being processed first.

hauleth · September 26, 2020, 9:23am

But why? You can specify in *.app that a given application depends on another, and it will be handled for you by OTP.

jakemorrison · September 28, 2020, 5:54am

That is correct, if all the application dependencies are correct. But if they are not…

hauleth · September 28, 2020, 11:30am

Then you need to fix the application definition.

jakemorrison · September 28, 2020, 11:46am

In my experience the Elixir side has dependencies specified correctly, but Erlang libraries may be missing something. Or there is some optional dependency which needs to be started. Fixing it is not so easy for a beginner, so Application.ensure_all_started can be helpful.