Put simple module in supervision tree?

Background

I have a module that reads information from a CSV file and puts it in memory. As with all modules that read things from files, there are a myriad of errors that can occur (file missing, corrupted file, permissions, etc.).

This module is critical to the application. Without it, the application does nothing. I need a way to start up the application while still being able to test its startup.

Code

This is the code as I have envisioned it:

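defmodule Engine.Application do
  use Application

  # Module that reads data from a file and puts it into memory. Nothing special.
  alias Engine.Populator

  # start/2 must return {:ok, pid} or {:error, reason}. IO.puts/1 returns :ok,
  # so each error branch prints its message and then returns an error tuple.
  def start(_type, _args) do
    children = []
    opts = [strategy: :one_for_one, name: Engine.Supervisor]

    case Populator.init("super_duper_file.csv") do
      {:ok, :start_success} ->
        Supervisor.start_link(children, opts)

      {:error, :reason1} = error ->
        IO.puts("Fix it by doing something you dummy!")
        error

      {:error, :reason2} = error ->
        IO.puts("We could do something, but I'm too lazy")
        error

      error ->
        # The backslash must be escaped, or Elixir drops it from the string.
        IO.puts("¯\\_(ツ)_/¯")
        {:error, error}
    end
  end
end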

However, this code can’t be tested. I can’t test what happens if Populator.init fails with :reason2, because to do so I would need a test that invokes Engine.Application.start, which mix has already invoked in order to run that very test (so it will always fail with an :already_started error).

Question

So I take it this logic probably shouldn’t be here. Someone suggested I should turn this into something supervised, and that is what I am trying to do.

But I can’t find a logical way to turn a module that simply reads data from a file into something that is its own supervised process. I also want to have decent error messages instead of having the app just blow up in my face.

How can I fix this?

You should be able to simply put it in the list of children: children = [Populator] and remove the whole case block. If the Populator’s init succeeds, the supervisor will ensure it keeps running, and if not the supervisor won’t start up. Then, you only need to test Populator.init/1 to ensure it returns the correct ok/error tuple based on the params it receives.
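A minimal sketch of what that could look like, assuming Populator is turned into a GenServer and that "parsing" just means splitting the file into non-empty lines (the real CSV handling is not shown in the thread). Note that a GenServer's init/1 signals failure with {:stop, reason}, which GenServer.start_link/3 turns into {:error, reason} for the supervisor:

```elixir
defmodule Engine.Populator do
  # Sketch only: Populator as a GenServer so it can sit in the children list.
  # "Parsing" here is just splitting into non-empty lines (an assumption).
  use GenServer

  def start_link(path) do
    GenServer.start_link(__MODULE__, path, name: __MODULE__)
  end

  @impl true
  def init(path) do
    case File.read(path) do
      {:ok, contents} ->
        {:ok, String.split(contents, "\n", trim: true)}

      {:error, reason} ->
        # start_link/1 then returns {:error, reason}, so the supervisor
        # (and therefore the application) refuses to start.
        {:stop, reason}
    end
  end
end
```

In Engine.Application the children list would then be `[{Engine.Populator, "super_duper_file.csv"}]`, and tests can call `Engine.Populator.init/1` directly with a missing or valid path, without going through application start.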

But I can’t find a logical way to turn a module that simply reads data from a file into something that is its own supervised process.

If by “simply reads data” you mean it’s simply a library of functions, then it doesn’t need to be supervised. If it has its own state (e.g. it’s a GenServer), then it’s already ready to be put in a supervision tree as above (the use statement will generate a child_spec/1 function for you).
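For illustration (the module below is a stripped-down placeholder, not the real Populator), this is the child specification that `use GenServer` generates; it is what lets a bare module name or a `{Module, arg}` tuple appear in a children list:

```elixir
defmodule Engine.Populator do
  # Placeholder GenServer: only here to show the generated child_spec/1.
  use GenServer

  def start_link(path), do: GenServer.start_link(__MODULE__, path)

  @impl true
  def init(path), do: {:ok, path}
end

# The supervisor uses this map to know how to start the child:
# it will call Engine.Populator.start_link("super_duper_file.csv").
IO.inspect(Engine.Populator.child_spec("super_duper_file.csv"))
# prints %{id: Engine.Populator, start: {Engine.Populator, :start_link, ["super_duper_file.csv"]}}
```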

Finally, it appears you’re not designing your service as recommended: it should always be able to start (i.e. return an {:ok, _} tuple) even when nothing else works, and should return {:error, _} only if it really cannot start up. From there, it can move up levels as more things work: e.g. its internal status goes from :not_ready_reason_1, to :not_ready_reason_2, and finally to :ready. Until its status is :ready, it returns {:error, :not_ready_reason_1 | :not_ready_reason_2}; once it is :ready, it returns the result of the parsing. See also “It's About the Guarantees”.
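One possible shape for that advice, as a sketch: the process always starts, keeps a status, and answers reads with {:error, status} until the data is loaded. The `{:not_ready, reason}` status, the `fetch/1` function, and the 5-second retry are hypothetical choices, not something from the thread:

```elixir
defmodule Engine.Populator do
  # Sketch of "always start, report readiness". Status values and the retry
  # interval are hypothetical choices.
  use GenServer

  @retry_ms 5_000

  def start_link(path), do: GenServer.start_link(__MODULE__, path, name: __MODULE__)

  # {:ok, rows} once loaded, {:error, status} until then.
  def fetch(server \\ __MODULE__), do: GenServer.call(server, :fetch)

  @impl true
  def init(path) do
    # Always {:ok, _}: an unreadable file is a status, not a boot failure.
    {:ok, load(%{path: path, status: :starting, rows: []})}
  end

  @impl true
  def handle_call(:fetch, _from, %{status: :ready, rows: rows} = state),
    do: {:reply, {:ok, rows}, state}

  def handle_call(:fetch, _from, %{status: status} = state),
    do: {:reply, {:error, status}, state}

  @impl true
  def handle_info(:retry, state), do: {:noreply, load(state)}

  defp load(%{path: path} = state) do
    case File.read(path) do
      {:ok, contents} ->
        %{state | status: :ready, rows: String.split(contents, "\n", trim: true)}

      {:error, reason} ->
        # Not ready yet: record why, and try again later.
        Process.send_after(self(), :retry, @retry_ms)
        %{state | status: {:not_ready, reason}}
    end
  end
end
```

Because init/1 never returns {:stop, _}, a bad file no longer takes the whole application down; callers of fetch/1 get an explicit error instead, and the process moves to :ready once a retry succeeds.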


@david_ex’s answer is great; I’ll just add that you should probably have your Populator init function take an argument that lets you bypass doing any actual work. Then your supervision tree would look like:

[
  {Populator, Application.get_env(:my_app, :populator_config)}
]

In the test environment, set populator_config to :fake or whatever you like. Your test code can then still call Populator.init(real_param) to do a real test.
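A sketch of that suggestion, assuming Populator is a GenServer and that `:populator_config` holds either `:fake` or a file path (both names come from the post above; the config lines shown as comments are assumptions):

```elixir
# config/test.exs (assumed):   config :my_app, populator_config: :fake
# config/config.exs (assumed): config :my_app, populator_config: "super_duper_file.csv"

defmodule Engine.Populator do
  use GenServer

  def start_link(config), do: GenServer.start_link(__MODULE__, config, name: __MODULE__)

  @impl true
  # :fake does no file work, so `mix test` can boot the application safely.
  def init(:fake), do: {:ok, :fake}

  # Any real path is loaded; tests can call init/1 with a real path directly.
  def init(path) when is_binary(path) do
    case File.read(path) do
      {:ok, contents} -> {:ok, String.split(contents, "\n", trim: true)}
      {:error, reason} -> {:stop, reason}
    end
  end
end
```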


You can pass the --no-start flag to mix test if you don’t want your main application’s supervision tree started automatically.


Populator is in fact just a library of functions. It has no state, and it is not a GenServer. This is what confuses me. The Engine application depends on Populator to populate memory, so it was recommended that I put Populator into the supervision tree, but because Populator is a simple library of functions, it is not clear to me whether or how I should do it.

If I put Populator in the list of children, then Populator needs a child_spec that complies with what a Supervisor expects from its children. Right now Populator has no child_spec, no start_link, nor anything resembling what a Supervisor expects from a child.

Yes, but how do I do it? I want to return a tuple {:error, :reason1} when something goes wrong, but if Populator is just a simple library that is not supervised, how can I do this?
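For a module that stays a plain library, one alternative not discussed in this thread is to write child_spec/1 by hand and do the work inside the start function. A supervisor accepts `:ignore` from a start function (meaning "nothing to supervise") and aborts its own startup on `{:error, reason}`. The sketch below assumes the data goes into `:persistent_term` (as described later in the thread), uses a simplified line split for parsing, and the `{Engine.Populator, :rows}` key and `rows/0` function are hypothetical:

```elixir
defmodule Engine.Populator do
  # Plain module, no process of its own, made supervisor-compatible with a
  # hand-written child_spec/1. Parsing is simplified to non-empty lines.

  def child_spec(path) do
    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, [path]},
      # Nothing keeps running after a successful load, so never restart.
      restart: :temporary
    }
  end

  # Called by the supervisor while it starts its children.
  def start_link(path) do
    case File.read(path) do
      {:ok, contents} ->
        :persistent_term.put({__MODULE__, :rows}, String.split(contents, "\n", trim: true))
        # :ignore means "started fine, nothing to supervise".
        :ignore

      {:error, reason} ->
        # The supervisor fails to start and reports this reason.
        {:error, reason}
    end
  end

  # Readers go straight to :persistent_term.
  def rows, do: :persistent_term.get({__MODULE__, :rows})
end
```

The supervisor runs the start function in its own process, which is fine here because `:persistent_term` is not tied to any process.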


What part of memory is populated? An ETS table? Who owns it? A GenServer’s state? Then that GenServer should be supervised. Some Port? It has to have an owning process as well. And any of those owning processes should be supervised.


If it has to read a file on boot it does have state, even if that state is “the file exists”.


I am using a cool new feature called :persistent_term. No processes involved, no GenServers, no nothing, just pure, raw memory with incredible read access speeds. That’s why nothing is being supervised. There is no need.

This is confusing to me. I have always understood state as a variable a GenServer saves and keeps in memory while looping through its mailbox. The way Populator reads from a file is as simple as it gets: I use File.stream! to read it and then process the data with Stream.map, Stream.filter and so on. It’s a little more complex than an Elixir Hello World example for reading files, but that’s about it. I don’t store any state.

Could you elaborate on this assertion? I am really interested in your POV on how you would attack this situation.
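For reference, the loader described above might look roughly like this (a sketch: the comma split stands in for real CSV parsing, and the `{Engine.Populator, :rows}` key and `populate/1` name are assumptions). Note that `File.stream!/1` raises a `File.Error` when the stream is consumed and the file cannot be opened, so an error tuple has to be produced with `rescue`:

```elixir
defmodule Engine.Populator do
  # Stateless loader: stream the file, transform it lazily, then store
  # the materialized result in :persistent_term.
  def populate(path) do
    rows =
      path
      |> File.stream!()
      |> Stream.map(&String.trim_trailing/1)
      |> Stream.reject(&(&1 == ""))
      |> Stream.map(&String.split(&1, ","))
      |> Enum.to_list()

    # Returns :ok.
    :persistent_term.put({__MODULE__, :rows}, rows)
  rescue
    # Missing file, bad permissions, etc. surface here when the stream is run.
    e in File.Error -> {:error, e.reason}
  end
end
```

Even with no process involved, something still has to call populate/1 at boot and decide what to do with `{:error, reason}`.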

GenServers have two characteristics: #1 they can hold state, and #2 they’re the simplest OTP-compliant process. Normally all the attention goes to #1, but #2 is what matters here, since we’re talking about initializing something. If you want to initialize something, in particular :ets or :persistent_term, at a specific point in your supervision tree, you need an OTP-compliant process to do it.

In the case of :ets you then need that process to stick around so that it can own the table. In the case of :persistent_term you could just shut the process down after it has initialized things, or leave it around so you can send it a message to reinitialize if you wish. Either way, we’re using a GenServer more for its ability to be a good supervision tree citizen than for its long-term state management.
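To illustrate the :ets case, a minimal sketch of a GenServer whose only job is to create and own a table (`Engine.Table` and `lookup/1` are hypothetical names). Since an ETS table is deleted when its owner exits, this process has to stay alive, while readers still query ETS directly:

```elixir
defmodule Engine.Table do
  # Owns the ETS table; if this process exits, the table is deleted.
  use GenServer

  def start_link(rows), do: GenServer.start_link(__MODULE__, rows, name: __MODULE__)

  # Readers query ETS directly; the owner process is not on the read path.
  def lookup(key) do
    case :ets.lookup(__MODULE__, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  @impl true
  def init(rows) do
    # :named_table lets readers use the module name; :protected means only
    # the owner writes, while any process may read.
    table = :ets.new(__MODULE__, [:named_table, :set, :protected, read_concurrency: true])
    :ets.insert(table, rows)
    {:ok, table}
  end
end
```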


So, if I understand correctly, your proposal is to have a GenServer do the initialization (File.read and dumping the data into :persistent_term). Read access is what I care about, so client processes would still read :persistent_term directly, and since I only populate memory once, the Populator process would then just stay there.

I would use a GenServer because I want Populator to be supervision tree friendly.

I think I get it. I will surely have more questions in the future, but I am more convinced about using a GenServer now. Thanks for the answer!
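A sketch of the arrangement summarized above, with an optional reload as mentioned in the previous answer. The key name, the `reload/1` function, and the line-splitting parse are assumptions. Reloads should stay rare: per the Erlang `:persistent_term` documentation, updating or deleting a term triggers a global garbage-collection scan.

```elixir
defmodule Engine.Populator do
  # The process runs the load at a known point in the supervision tree and
  # reports failure; the data lives in :persistent_term, so reads never
  # message this process.
  use GenServer

  @key {__MODULE__, :rows}

  def start_link(path), do: GenServer.start_link(__MODULE__, path, name: __MODULE__)

  # Client read path: direct :persistent_term access.
  def rows, do: :persistent_term.get(@key)

  # Optional: ask the process to re-read the file.
  def reload(server \\ __MODULE__), do: GenServer.call(server, :reload)

  @impl true
  def init(path) do
    case load(path) do
      :ok -> {:ok, path}
      {:error, reason} -> {:stop, reason}
    end
  end

  @impl true
  def handle_call(:reload, _from, path), do: {:reply, load(path), path}

  # Returns :ok, or {:error, reason} from File.read/1.
  defp load(path) do
    with {:ok, contents} <- File.read(path) do
      :persistent_term.put(@key, String.split(contents, "\n", trim: true))
    end
  end
end
```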


That’s exactly correct. There are a bunch of handy features here. For example, to support your test code:

def start_link(opts) do
  if Application.get_env(:my_app, :start_populator) do
    GenServer.start_link(__MODULE__, opts, [])
  else
    :ignore
  end
end

If start_populator is false (set in config/test.exs, for example), start_link returns :ignore and the supervisor skips starting the process. This lets you test the error and success cases in your tests themselves without causing issues with application start.
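A small sketch of what that looks like from the supervisor's side, reusing the start_link/1 above with a placeholder init/1 (an assumption). With the flag off, Supervisor.start_link/2 still succeeds, and the child is listed with :undefined instead of a pid:

```elixir
defmodule Engine.Populator do
  use GenServer

  # start_link/1 from the post above: :ignore tells the supervisor to skip it.
  def start_link(opts) do
    if Application.get_env(:my_app, :start_populator) do
      GenServer.start_link(__MODULE__, opts, [])
    else
      :ignore
    end
  end

  # Placeholder init/1 (assumption); the real one would load the file.
  @impl true
  def init(opts), do: {:ok, opts}
end

# Same effect as `config :my_app, start_populator: false` in config/test.exs.
Application.put_env(:my_app, :start_populator, false)

{:ok, sup} = Supervisor.start_link([Engine.Populator], strategy: :one_for_one)
# The child is registered but not running: its pid is :undefined.
IO.inspect(Supervisor.which_children(sup))
```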
