Aggregate data across modules at compile time

For fun, I’m building an in-memory database library and the persistence mechanism is a GenServer for each table. When the application starts up, the library needs to spin up a GenServer instance for each table the user has defined in their application. To get these GenServers to start on application boot, the user could manually add them as children of their application supervisor, e.g.

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      MyApp.User.Repo,
      MyApp.Post.Repo,
      ...
    ]

    opts = [strategy: :one_for_one, name: Foobar.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

But I’m not very fond of this approach, since it requires the user to keep the list of GenServers started by the application supervisor in sync with the list of tables in their application. I’d also rather have the GenServers supervised by the database application instead of the user’s application.

So I want a way to aggregate, at compile time, the list of all models the user has defined in their application, and to use that list in the database library’s application supervisor definition, like so:

defmodule Database.Application do
  use Application

  def start(_type, _args) do
    children = [
      Database.SomeWorker
    ] ++ application_repositories() # [MyApp.User.Repo, MyApp.Post.Repo, ...]

    opts = [strategy: :one_for_one, name: Foobar.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

How can I accomplish this?

This might be done in many different ways (e.g. :application.get_key(:my_app, :modules)), but the whole approach seems not quite correct to me: it’s absolutely normal and expected for a GenServer to crash on occasion, and a crashed GenServer loses its state, so you’ll end up with an inconsistent DB sooner or later. By putting effort into making them crash-proof, you basically go against idiomatic OTP.

Maybe ETS would be a more suitable approach? They also can crash, but at least they are not supposed to crash as a usual part of the happy path routine.
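For reference, a minimal sketch of what the ETS route could look like. The module name Database.EtsTable and its functions are hypothetical; only the :ets calls are real. Keep in mind that an ETS table is owned by the process that created it and is deleted when that process exits, so in practice it would be created by some long-lived process.

```elixir
defmodule Database.EtsTable do
  # Hypothetical wrapper around one ETS table per database table.

  # Creates a public set table; returns the table reference.
  def new(name) do
    :ets.new(name, [:set, :public, read_concurrency: true])
  end

  # Stores a record under its id, replacing any previous record.
  def put(table, id, record) do
    true = :ets.insert(table, {id, record})
    :ok
  end

  # Looks a record up by id.
  def get(table, id) do
    case :ets.lookup(table, id) do
      [{^id, record}] -> {:ok, record}
      [] -> :error
    end
  end
end
```

Reads and writes go straight to the table, so no single GenServer mailbox sits between the caller and the data.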

As a fellow library writer, the idea of introspecting the application at compile time to enumerate modules with certain characteristics was initially very appealing. But after diving down many rabbit holes I came to the conclusion that the community advice of “don’t do it” is largely well founded, for the following reasons:

  1. Modules can be, and are, created at runtime, so compile-time only isn’t necessarily a useful constraint. The BEAM excels at long running processes.

  2. Libraries can’t “call out” to an enclosing application, and shouldn’t, in part because there may be many concurrent library consumers and many apps (in the OTP sense of app).

  3. At the time your library is compiled there can be no assumptions about what other modules are already compiled. A library also can’t, in any meaningful way, force compilation order, and trying to probably isn’t a great idea anyway.
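To illustrate the first point, here is a small sketch that can be run as a script. RuntimeTable is a hypothetical module name, and the :elixir application is used only because it is always present; the same would hold for any application’s module list.

```elixir
# RuntimeTable is a hypothetical module, defined while the system is running.
contents =
  quote do
    def table_name, do: :runtime_table
  end

{:module, RuntimeTable, _bytecode, _result} =
  Module.create(RuntimeTable, contents, Macro.Env.location(__ENV__))

# The module is loaded and callable...
:runtime_table = RuntimeTable.table_name()

# ...but it is absent from the module list that was recorded at build time.
{:ok, compiled_modules} = :application.get_key(:elixir, :modules)
false = RuntimeTable in compiled_modules
```

Any list assembled at compile time has the same blind spot, which is why a runtime registration path is still needed.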

So what can you do?

You could add your own “compiler” to the :compilers list in mix.exs that runs after all the other compilers, introspects all the modules in the system and builds a list of known modules of the type you want.
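A rough sketch of such a compiler, under several assumptions: the task name repo_manifest, the "Repo" naming convention, and the repositories_in/2 helper are all hypothetical, and the task would have to be available before the user’s project compiles (for example by shipping with the library). It finds modules by looking at the .beam file names in the project’s compile path.

```elixir
defmodule Mix.Tasks.Compile.RepoManifest do
  # Hypothetical compiler task. The user would enable it in mix.exs with:
  #   compilers: Mix.compilers() ++ [:repo_manifest]
  use Mix.Task.Compiler

  @impl true
  def run(_args) do
    # Runs after the other compilers, so the project's .beam files exist.
    repos = repositories_in(Mix.Project.compile_path(), "Repo")
    Mix.shell().info("Found repositories: #{inspect(repos)}")
    {:ok, []}
  end

  # Returns the modules in ebin_dir whose last name segment equals suffix.
  def repositories_in(ebin_dir, suffix) do
    ebin_dir
    |> Path.join("*.beam")
    |> Path.wildcard()
    |> Enum.map(&Path.basename(&1, ".beam"))
    |> Enum.filter(&String.ends_with?(&1, "." <> suffix))
    |> Enum.map(&String.to_atom/1)
    |> Enum.sort()
  end
end
```

This only prints what it finds; where to persist the list so the library can read it at startup is left open here.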

Or … ask the user to configure the desired modules in config.exs (or its friends dev.exs, prod.exs and test.exs). This, I think, is the preferred approach. But I would still add in the capability to start a dynamically defined module too.
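A minimal sketch of the config-driven approach, with a runtime entry point for dynamically defined modules. All names are assumptions for illustration: the :database application, the :repositories key, and the Database.Repositories module. Each listed module is expected to provide child_spec/1, which `use GenServer` defines.

```elixir
defmodule Database.Repositories do
  # Hypothetical helper. The user would list their tables in config.exs:
  #   config :database, repositories: [MyApp.User.Repo, MyApp.Post.Repo]

  # Returns the repository modules listed in the application environment.
  def configured do
    Application.get_env(:database, :repositories, [])
  end

  # Starts every configured repository under the given DynamicSupervisor.
  def start_configured(supervisor) do
    Enum.map(configured(), &start(supervisor, &1))
  end

  # Starts a single repository; also usable for modules defined at runtime.
  def start(supervisor, module) do
    DynamicSupervisor.start_child(supervisor, module)
  end
end
```

Using a DynamicSupervisor owned by the library keeps the table processes under the database application’s supervision tree rather than the user’s.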


@mudasobwa This is strictly a for-fun, learning-experience sort of project. In any production system, I would absolutely reach for a proven tool like Mnesia, ETS, etc.

@kip You make some good points! As part of my research into this issue, I looked into how the Elixir standard library does Protocol consolidation, and it involves searching through .beam files. Definitely not the sort of thing I’m aiming to do here.

Writing a custom “compiler” is certainly intriguing and I may experiment with this.

Having the user configure their tables in config.exs is also interesting, but I fear that an application with many tables would cause that file to become quite large. I don’t think that is any concern to the computer and may not even be a concern to a programmer whose editor has searching, but I still don’t feel quite right about it.

I also considered doing some trickery with Elixir’s Module’s @on_load attribute, but I don’t think there’s any way to ensure that the GenServer is ready to receive messages at the time that a module is loaded.

This solution won’t be available at compile time, but your example calls a function at startup, so maybe a combination of the two suggested solutions would work: a configurable ‘suffix’ in the config that identifies the DB model modules, like

## config.exs
...
   db_model_suffix: "Repo"
...

then you can match that in the application_repositories/0 function?

def application_repositories do
  # Assumes the suffix lives in the :my_app application environment
  suffix = "." <> Application.fetch_env!(:my_app, :db_model_suffix)

  # get_key/2 takes the OTP application name and returns {:ok, modules}
  {:ok, modules} = :application.get_key(:my_app, :modules)

  # Keep only modules whose name ends with the suffix, e.g. MyApp.User.Repo
  Enum.filter(modules, fn module ->
    module |> Atom.to_string() |> String.ends_with?(suffix)
  end)
end

Probably not quite what you want, but I happened across Kernel.Utils.announce_struct today when looking for something else: