Determine call graph of module at compile-time

Hi,

I’m experimenting with macros to create a framework for defining a certain kind of test.
I’m trying to figure out how to build, at compile-time, a graph of the function calls between these macro-defined tests, but I’m unsure how to do it.

I have a macro defined as follows.

defmodule MyMacro do
  defmacro test(name, do: body) do
    quote do
      def unquote(name) do
        unquote(body)
      end
    end
  end
end
defmodule MyModule do
  require MyMacro
  import MyMacro

  test first() do
    :first
  end

  test second() do
    x = first()
    y = third()
  end

  test third() do
    :third
  end
end

What I would like to know, at compile-time, is which test definitions invoke which other test definitions.

So, the result would be that I can inject some module attributes that allow me to determine which test will call which test. In the example above this should be something like: `invokes({MyModule, :second}) == [{MyModule, :first}, {MyModule, :third}]`.
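Concretely, the injected attribute could look something like this (a purely hypothetical shape, just to illustrate the goal):

@invokes %{
  {MyModule, :second} => [{MyModule, :first}, {MyModule, :third}],
  {MyModule, :first} => [],
  {MyModule, :third} => []
}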

Attempt using postwalk

What I have tried so far is using Macro.postwalk to traverse the AST inside defmacro test, but in that context you only have access to the __CALLER__ env and the AST of that particular test definition. With this information I can determine which imported functions from other modules are called, and which qualified calls are made. The issue is that I can’t tell whether an unqualified invocation is actually a call to another function in this module, or something else entirely.
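To make that concrete, here is a minimal sketch of such a traversal (the module name, helper name, and call formats are my own choices):

defmodule Analyze do
  # Hypothetical sketch of such a traversal. Unqualified calls are
  # recorded as {fun, arity} and qualified calls as {{module, fun}, arity};
  # both shapes are plain literals, so they can later be unquoted into a
  # module attribute directly. The second clause is exactly where the
  # ambiguity described above lives.
  def extract_function_calls(ast, caller) do
    {_ast, calls} =
      Macro.postwalk(ast, [], fn
        # Qualified call: Some.Module.fun(args)
        {{:., _, [{:__aliases__, _, _} = alias_ast, fun]}, _, args} = node, acc ->
          module = Macro.expand(alias_ast, caller)
          {node, [{{module, fun}, length(args)} | acc]}

        # Unqualified call: fun(args) -- local? imported? No way to tell here.
        {fun, _, args} = node, acc when is_atom(fun) and is_list(args) ->
          if Macro.special_form?(fun, length(args)) or Macro.operator?(fun, length(args)) do
            {node, acc}
          else
            {node, [{fun, length(args)} | acc]}
          end

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(calls)
  end
end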

For example, the AST of the second() definition above looks as follows.

{:__block__, [],
 [
   {:=, [line: 10, column: 7],
    [{:x, [line: 10, column: 5], nil}, {:first, [line: 10, column: 9], []}]},
   {:=, [line: 11, column: 7],
    [{:y, [line: 11, column: 5], nil}, {:third, [line: 11, column: 9], []}]}
 ]}

__CALLER__ during macro expansion does not contain the other definitions in the module, so I cannot determine whether those unqualified calls target functions defined in this module or not.
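As an aside: a macro can query the definitions the compiler has accumulated so far via Module.definitions_in/2, but that only covers definitions appearing earlier in the file, which is exactly why a forward reference like third/0 stays unresolved. A minimal sketch:

defmodule MyMacro do
  defmacro test({fun, _, _} = name, do: body) do
    # Only definitions that precede this point are visible: while
    # second/0 expands, this reports first/0 but not yet third/0.
    visible = Module.definitions_in(__CALLER__.module, :def)
    IO.inspect(visible, label: "defs visible while expanding #{fun}")

    quote do
      def unquote(name), do: unquote(body)
    end
  end
end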

Ideas

Another idea I had was to use the __after_compile__ callback and then do manual analysis on the module. In the __after_compile__ body you have access to an env that tells you which module the invocation is about. Since the module is already compiled, I also have access to __info__(:functions) for that module.

Using this information, I can do a Macro.postwalk on the entire module definition and determine the calls inside each test.

The limitation here is that it’s no longer possible to inject code or module attributes into the module itself, so I don’t really know where to put the result of the analysis at this point.

defmodule MyMacro do
  defmacro __using__(_options) do
    quote do
      import unquote(__MODULE__)

      # module attribute that holds all the examples
      Module.register_attribute(__MODULE__, :test, accumulate: true)

      @after_compile unquote(__MODULE__)
    end
  end

  def __after_compile__(env, _bytecode) do
    env.module.__info__(:functions)
    |> IO.inspect(label: "functions in #{inspect(env.module)}")

    # analyze the compiled module to build a dependency graph
    analyze_module_here_using_the_env(env.module)

    IO.inspect(env)
  end

  defmacro test(name, do: body) do
    quote do
      # @test {unquote(name), unquote(body)}
      def unquote(name) do
        unquote(body)
      end
    end
  end
end

So in brief, how could I solve this in an elegant way? I’d like to avoid doing these calculations at runtime and have the information ready at compile-time.

Unfortunately I don’t know the answer to your questions directly, but your use case sounds very similar to an experimental project I worked on in Erlang.

GitHub | liet

It uses a parse_transform to inspect an Erlang module at compile time and constructs a dependency graph of all calls within the module. The goal was to treat the functions of that particular module as resources and guarantee that each is executed only once for a given job, inspired by Terraform’s resource graph. (The name comes from Dune’s Liet Kynes :smiley:)

Anyway, my project never found a use case, but perhaps its implementation can be useful.

Hah, I see your README mentions the use of eunit. This is actually my intention here too, somewhat. What you did is highly similar to what I want to do. At least I know it’s a valid approach.

I’ve made some progress on this in the meantime.

I decided to stick to analysis within the defmacro because that way the actual environment is available when traversing the AST.

From this AST I can then gather the list of function calls it makes. That list of called functions is attached, via a tag, to the function generated by the macro.
In addition, each macro invocation is recorded in an accumulating attribute.

  defmacro __using__(_options) do
    quote do
      import unquote(__MODULE__)

      # module attribute that holds all the examples and their called functions
      Module.register_attribute(__MODULE__, :example_dependencies, accumulate: true)
      Module.register_attribute(__MODULE__, :examples, accumulate: true)
    end
  end

  defmacro example({func_name, _context, _args} = name, do: body) do
    called_functions = Analyze.extract_function_calls(body, __CALLER__)

    quote do
      @example_dependencies {unquote(func_name), unquote(called_functions)}
      @examples unquote(func_name)
      def unquote(name) do
        unquote(body)
      end
    end
  end

At runtime there is a list of macro invocations available, as well as the list of function calls made inside each one. The only thing left to compute at runtime is filtering those calls down to the ones that target functions defined by the macros. That’s a reasonable trade-off to make.
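For completeness, that leftover runtime step could look roughly like this, assuming the two accumulated attributes are exposed through generated accessors (__examples__/0 and __example_dependencies__/0 are hypothetical names) and calls are recorded in the shapes sketched earlier:

# Hypothetical runtime filtering: keep only the calls that target other
# examples in the same module.
def invokes(module, example) do
  examples = MapSet.new(module.__examples__())

  case List.keyfind(module.__example_dependencies__(), example, 0) do
    {^example, calls} ->
      Enum.filter(calls, fn
        {fun, _arity} when is_atom(fun) -> MapSet.member?(examples, fun)
        _other -> false
      end)

    nil ->
      []
  end
end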

If you want to enable this generally during code compilation, you can use compilation tracers: Code — Elixir v1.18.0-dev
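A minimal tracer sketch, assuming a public ETS table named :call_graph has been created before compilation starts (the table name and storage choice are assumptions here):

defmodule CallGraphTracer do
  # The compiler calls trace/2 once per event; this only records local
  # calls, tagged with the enclosing function (env.function is
  # {name, arity}, or nil outside a definition).
  def trace({:local_function, meta, name, arity}, env) do
    :ets.insert(:call_graph, {{env.module, env.function}, {name, arity}, meta[:line]})
    :ok
  end

  def trace(_event, _env), do: :ok
end

# Registered for the whole compilation, for example:
# Code.put_compiler_option(:tracers, [CallGraphTracer])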

The boundary project uses them to collect all kinds of information. Otherwise you can use beam_lib to access the abstract code and traverse it to find calls.
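A rough sketch of the beam_lib route, assuming the module was compiled with debug info (which Mix keeps by default):

# Pull the Erlang abstract code out of the compiled .beam:
{:ok, {_module, [abstract_code: {:raw_abstract_v1, forms}]}} =
  :beam_lib.chunks(:code.which(MyModule), [:abstract_code])

# Local calls appear in forms as {:call, anno, {:atom, _, name}, args}
# tuples; a recursive walk over the forms collects them per function.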
