Order of evaluation of macro context

woojiahao · July 26, 2021, 4:41am

Hi, I am trying to understand the order of evaluation of the macro context: specifically, whether it is evaluated before or during the expansion phase of the compilation process.

Metaprogramming Elixir states that “Before the macro was expanded, we entered the definfo context on line 3.” (section “Injecting Code”), which gives me the impression that it is meant to occur before.

However, using Macro.expand_once on the AST of any macro invocation will cause the macro context to evaluate, which causes ambiguity as to whether the evaluation is occurring before or during the expansion.
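The behaviour described above can be reproduced with a minimal sketch. The module `Demo` and the macro `hello` are hypothetical names chosen for illustration, and the snippet assumes it is run as a script (for example with `elixir demo.exs`):

```elixir
defmodule Demo do
  defmacro hello do
    # Macro context: runs whenever the macro function itself is executed.
    IO.puts("macro context running")

    quote do
      :hello
    end
  end
end

require Demo

# Quoting the call only builds its AST; the macro is not executed here.
ast = quote do: Demo.hello()

# Expanding the call executes the macro, so the message is printed now,
# and the AST returned by the macro comes back as the result.
expanded = Macro.expand_once(ast, __ENV__)
IO.inspect(expanded)
```

Here the message from the macro body appears only when `Macro.expand_once/2` is called, not when the call is quoted.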

According to Understanding Elixir Macros (Part 6), the macro context is evaluated when the macro is invoked. What does this mean exactly? Does it mean that when we expand the AST of a macro invocation, the compiler first invokes the macro (prior to expansion), which runs the macro context, and only then performs the macro expansion?

Thank you in advance!

kip · July 26, 2021, 5:54am

While not directly answering your question, perhaps the following helps understand the order of evaluation:

1. A macro is a function that both receives and is expected to return AST. In that sense a macro is just a function, and there is nothing particularly special about it except that it is expected to return AST.
2. Returning AST is normally achieved with a quote block. All a quote block does is convert code into AST.
3. Sometimes we want to insert compile-time values into the generated AST. We do that with an unquote. An unquote is evaluated when the macro is called and is therefore a compile-time construct. Its scope is the macro scope, so it only has access to the parameters passed to the macro, in addition to the special forms __CALLER__ and __ENV__. In this case __ENV__ refers to the macro environment (not the calling environment).
4. Sometimes we want to defer the unquote to the expansion phase after the macro’s returned AST is injected into the calling site. In this case we use quote unquote: false, or more commonly quote bind_quoted: [....], which also implies quote unquote: false. This is required if the unquote refers to a term that is in the caller’s scope, not the macro scope.
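The first two points in the list above can be sketched in a few lines. The module `Sketch` and the macro `swap` are hypothetical, written only to show that a macro is an ordinary function over AST; the snippet assumes it is run as a script:

```elixir
# A quote block turns code into its AST representation.
ast =
  quote do
    1 + 2
  end

# The AST of `1 + 2` is a three-element tuple: operator, metadata, arguments.
{:+, _meta, [1, 2]} = ast

defmodule Sketch do
  # Hypothetical macro: it receives the AST of a binary operation and
  # returns new AST with the two operands swapped. No quote block is
  # needed, because the return value only has to be valid AST.
  defmacro swap({op, meta, [left, right]}) do
    {op, meta, [right, left]}
  end
end

require Sketch

# The call site is replaced by the AST for `3 - 10`.
result = Sketch.swap(10 - 3)
IO.inspect(result)
# prints -7
```

Because `swap` builds its result by hand, it also shows that quote is a convenience for producing AST rather than a requirement.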

The more you think of a macro as a function that takes and returns AST the easier it becomes to reason about when things happen. The “tricky” bit is recognising when unquote is going to be expanded because sometimes you want it expanded in the macro context and sometimes in the caller context. Therefore getting familiar with quote bind_quoted: [....] or quote unquote: false is very helpful.

There is one other important area which is macro hygiene but we can build on that later.

If you have an example to post, I’d be happy to explain further in case this post just added to the confusion.

stefanchrobot · July 26, 2021, 7:03am

@kip I found your explanation really useful. Can you please give an example of case 3 vs 4?

kip · July 26, 2021, 7:16am

Here’s one example where (4) is very useful, i.e. deferring unquote to an expansion phase after the macro’s output is inserted at the calling site.

This example applies two concepts:

Module attributes are evaluated after macro expansion, which means you can’t evaluate them in a macro. They need to be evaluated after the AST returned by the macro is inserted into the caller.
Special forms are also macros, so we need to help the compiler understand when to expand an unquote: in the macro or in the caller.

defmodule MyMacros do
  @moduledoc """
  Defines function clauses based upon a list of values
  """
  defmacro with_vals(values) do
    quote bind_quoted: [values: values] do
      # `for` introduces its own macro context in which
      # we want to unquote. So the unquote needs to be
      # deferred until a later expansion in the calling
      # site.  `bind_quoted: [values: values]` means that
      # referring to `values` in the quote body is automatically
      # unquoted. And any `unquote` is *not* evaluated in the
      # macro context but will be evaluated *after* the
      # returned AST is inserted into the calling site.
      for val <- values do
        def n(unquote(val)), do: unquote(val + 2)
      end
    end
  end
end

defmodule MyModule do
  require MyMacros
  @all_vals [1,2,3,4,5,6]

  # defines 6 function clauses for the
  # function `n`.
  MyMacros.with_vals(@all_vals)
end

In summary, here is a case where the entire for expression, including the unquote(...), is inserted unmodified into the calling site. Since for, def and defmodule are all themselves macros, it’s completely OK to unquote in their contexts. These cases are referred to as unquote fragments.
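To see the difference between an unquote that is evaluated straight away and one that is deferred, the two forms of quote can be compared directly. This is only a sketch run at the top level of a script; the variable names `eager` and `deferred` are illustrative:

```elixir
x = 1

# Default behaviour: unquote is evaluated while the quote is being built,
# so the current value of `x` is embedded in the AST.
eager =
  quote do
    unquote(x) + 1
  end

# With `unquote: false` the unquote call is kept in the AST as-is and is
# left for a later expansion to deal with.
deferred =
  quote unquote: false do
    unquote(x) + 1
  end

{:+, _, [1, 1]} = eager
{:+, _, [{:unquote, _, [{:x, _, _}]}, 1]} = deferred

IO.puts(Macro.to_string(eager))
# prints 1 + 1
IO.puts(Macro.to_string(deferred))
# prints unquote(x) + 1
```

In the deferred form the AST still contains the literal call to unquote, which is what allows it to be resolved later at the calling site.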

woojiahao · August 1, 2021, 9:29am

Hey @kip , so sorry for the delayed response, I was away for a couple of days.

Thank you so much for your answer! It actually helped me to solve my conundrum. After you emphasized how I should think about macros as regular functions, everything clicked.

This was the final explanation I had come up with (bits are extracted from the article I am working on):

The compilation process can be broken down into:

1. Parsing phase - Elixir source code (program) is parsed into an AST, which we will call the initial AST
2. Expansion phase - initial AST is scanned and macro calls are identified
   - Macros are executed so that their output (AST) can be injected into and expanded at the callsite
   - Expansion occurs recursively and a final AST is generated
3. Bytecode generation phase - after the final AST has been generated, the compiler performs an additional set of operations that eventually generates BEAM VM bytecode which is then executed
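The phases listed above can be imitated by hand from a script. This is a rough sketch and not what the compiler does internally: the module `Phases` and the macro `twice` are hypothetical, and `Code.eval_quoted/1` only stands in for the bytecode generation and execution step:

```elixir
defmodule Phases do
  # Hypothetical macro used only to have something to expand.
  defmacro twice(expr) do
    quote do
      unquote(expr) * 2
    end
  end
end

require Phases

# Parsing: source text becomes the initial AST. Nothing is executed yet.
{:ok, initial} = Code.string_to_quoted("Phases.twice(21)")

# Expansion: the macro call is identified and the macro is executed;
# its return value replaces the call.
expanded = Macro.expand_once(initial, __ENV__)
IO.puts(Macro.to_string(expanded))
# prints 21 * 2

# Stand-in for the final phase: evaluate the expanded AST.
{result, _bindings} = Code.eval_quoted(expanded)
IO.inspect(result)
# prints 42
```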

Macros contain two contexts: a macro context and a caller context.

defmacro foo do 
  # This is the macro's context, this is executed when the macro is called 

  # This is the return value of the macro (AST) 
  quote do 
    # This is the caller's context, this is executed when the callsite is called 
  end 
end

As you can see, the macro’s context is any expression declared before the quote. The caller’s context is the behavior that is declared in the quote. The AST generated from quote is the output of the macro and is the AST that is injected into and expanded at the callsite, which is why it is referred to as the caller’s context.

First, let’s review what we understand about macros. They are a compile-time construct that receives ASTs as input and returns ASTs as output. Aside from these two attributes, they behave just like any other function.

Knowing this, we can move on to the first component of the expansion phase: “Macros are executed so that their output (AST) can be injected into and expanded at the callsite”. For the compiler to know which AST needs to be injected into the callsite, it has to retrieve the output of the macro (since that output is an AST). To retrieve the output, the macro has to be executed, just like any other function, and this has to happen during compilation, as that is when the expansion phase occurs. The macro call is parsed into an AST during the parsing phase, and the compiler identifies and executes these macro call ASTs during compilation, prior to the expansion phase.

If we think of macros as regular functions, the macro context will be the function body and the caller context will be the result of the function. Understanding this, the behavior exhibited above makes sense. During compilation, a macro has to be executed so that its result can be retrieved. This causes the macro context to be evaluated during compilation. Once the macro has been evaluated, the caller context is injected into and expanded at the callsite of the macro. This explains why the macro context and caller context have different “owners”. As the macro context is evaluated during compile-time and treated as a regular function body, it is executed within its containing module. Thus, it is “owned” by its containing module. The caller context is injected into the callsite and evaluated whenever the callsite is evaluated, thus, it is “owned” by the caller.
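The different “owners” can be observed with `__MODULE__`. The following is a small sketch with hypothetical module names `Owner` and `Caller`, assumed to be run as a script:

```elixir
defmodule Owner do
  defmacro whoami do
    # Macro context: evaluated inside the module that defines the macro.
    macro_module = __MODULE__

    quote do
      # Caller context: this `__MODULE__` is part of the returned AST and
      # is resolved in whichever module the AST is injected into.
      {unquote(macro_module), __MODULE__}
    end
  end
end

defmodule Caller do
  require Owner

  def run do
    Owner.whoami()
  end
end

IO.inspect(Caller.run())
# prints {Owner, Caller}
```

The first element was computed while the macro ran, so it names the macro’s containing module; the second was only resolved after injection, so it names the caller.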

TL;DR: The macro context is evaluated before the formal expansion phase of the compiler. This is so that the output AST of the macro can be injected into and expanded at the callsite during the actual expansion phase.

Once again, thank you so much!