Other languages generating code like Elixir does

I think so, yes. I haven’t read into the compiler, but a quick skim through the code revealed the following function: https://github.com/elixir-lang/elixir/blob/725a36f14d4590972e1efade1978ab3c428585a8/lib/elixir/src/elixir.erl#L288-L291 which seems to be doing exactly what you’ve described. Judging by the name, it might only be called from the command line and not on actual .ex files, but I’d guess the logic would be the same.

Among languages that compile to a native binary and do not rely on “macros”, I think this is only available in D. However, if we allow the use of a separate language, we can use the C preprocessor to generate functions in C/C++ during compilation. In Rust we can use macros or attributes to generate functions (for example, see Serde).

Nice link, but I do not see where Elixir executes the whole output of the macro expansion phase, which corresponds to the result of executing:

    kv = [foo: "called foo", bar: "called bar"]
    Enum.each kv, fn {k, v} ->
      ...
    end

This must be done before translating AST to Erlang code. When?

It probably happens before the function I’ve linked (or something similar) is called, but it would most likely have the same logic.

https://medium.com/@fxn/how-does-elixir-compile-execute-code-c1b36c9ec8cf

What you’re doing there is using the unquote fragment (docs) functionality, which is technically different from what a macro does. It does not convert AST -> AST, but generates functions based on compile-time values. Sadly, both macros and unquote fragments use unquote(), which in my opinion makes it quite unintuitive to users that there are two different use cases for unquote, which even clash with each other if you try to use unquote fragments within a macro.
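
To make the unquote fragment idea concrete, here is a minimal sketch. The module name Fragments is made up, and the body of the anonymous function is only one possible body, not necessarily what the original snippet elided.

```elixir
defmodule Fragments do
  # Hypothetical module name; the keyword list mirrors the one in the question.
  kv = [foo: "called foo", bar: "called bar"]

  # This code runs while the module is being compiled. Each iteration
  # executes one `def`, and unquote/1 injects the current compile-time
  # values of k and v into that definition.
  Enum.each(kv, fn {k, v} ->
    def unquote(k)(), do: unquote(v)
  end)
end

IO.puts(Fragments.foo())
IO.puts(Fragments.bar())
```

No defmacro is involved here: the values come from ordinary variables bound in the module body.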

AFAIK Elixir is never translated to Erlang code, but directly to Erlang VM bytecode. And the example you have given will not work, as you need to use Enum.map instead. Just think of macros as functions that return AST. Other than that, everything is interpreted like any other function.

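    defmodule MyDef do
      # A macro is a function that returns AST. This one builds, by hand,
      # the AST of a zero-arity `def` named `name` whose body is `value`.
      defmacro my_def(name, value) do
        {:def, [], [{name, [], []}, [do: value]]}
      end
    end

    defmodule Main do
      require MyDef

      MyDef.my_def(:foo, "bar")
    end

    IO.puts(Main.foo())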
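
For comparison, the same kind of macro can be written with quote and unquote instead of a hand-built tuple. This is only a sketch; the module names MyDefQuoted and MainQuoted are made up for illustration.

```elixir
defmodule MyDefQuoted do
  # Same idea as a hand-written {:def, ...} tuple, but quote builds the
  # AST for us and unquote splices in the macro arguments.
  defmacro my_def(name, value) do
    quote do
      def unquote(name)(), do: unquote(value)
    end
  end
end

defmodule MainQuoted do
  require MyDefQuoted

  # Expands at compile time into a zero-arity function foo returning "bar".
  MyDefQuoted.my_def(:foo, "bar")
end

IO.puts(MainQuoted.foo())
```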

That’s not correct, see [Elixir] elixir 1.7.1 - Wandbox. It probably relates to my previous post: unquote fragments don’t work like macros.

I never know when to use which one, which is why I always use for, which is more like Enum.map than Enum.each, so maybe that is the source of the confusion.
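
As a sketch of why the choice of iterator should not matter here (module names are made up): the useful work is the side effect of executing def, so the list that for or Enum.map returns is simply discarded.

```elixir
defmodule WithFor do
  # `for` builds a list of results, but only the side effect of each
  # `def` matters; the list itself is thrown away.
  for {k, v} <- [foo: "called foo", bar: "called bar"] do
    def unquote(k)(), do: unquote(v)
  end
end

defmodule WithMap do
  # Enum.map behaves the same way for this purpose.
  Enum.map([foo: "called foo", bar: "called bar"], fn {k, v} ->
    def unquote(k)(), do: unquote(v)
  end)
end

IO.puts(WithFor.foo())
IO.puts(WithMap.bar())
```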

The unquote function does only one thing, which is to evaluate expressions in the context of its surrounding quote. The case of the two different use cases you mention is an illusion caused by the fact that def (and the like) expand in a way that the unquote will evaluate in a context where k and v are bound.

Moving a compile time value from outside def to “inside” it => unquote fragment
Moving AST from outside quote do into its body => macro

Those are imo two different use cases. One works with AST and the other with bound compile-time values. The fact that they can clash if you use unquote fragments within macros, and that you need a workaround to use both together, is in my opinion a clear sign that they should have been named differently.
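
One way to combine the two, sketched here with made-up module names, is the bind_quoted option of quote. It binds the macro argument as a value inside the quoted code and turns off unquoting for that quote, so the inner unquote calls are left alone until the generated code runs in the caller's module body.

```elixir
defmodule KVMacro do
  defmacro defkv(kv) do
    # bind_quoted makes kv available as a value inside the quoted code and
    # disables unquoting for this quote, so the unquote calls below are
    # not evaluated by the macro. They act as unquote fragments later.
    quote bind_quoted: [kv: kv] do
      Enum.each(kv, fn {k, v} ->
        def unquote(k)(), do: unquote(v)
      end)
    end
  end
end

defmodule UsesKV do
  require KVMacro

  pairs = [foo: "called foo", bar: "called bar"]
  KVMacro.defkv(pairs)
end

IO.puts(UsesKV.foo())
```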

This is the closest explanation I have seen of what Elixir does when compiling code, but it is not very clear, as it ends somewhat abruptly:

There is some nesting in this process that explains the loop illustrated in the picture above. This is due to the way module definition is implemented, but we’ll leave it here.

But it indeed shows that an Elixir file is a program to generate another program, the latter of which is the definite program that will be executed by the BEAM.

Thanks for the link.

Definitely two use cases of unquote, but this function always behaves the same, i.e., it does not do anything special depending where it is used.

Actually def ... do is a quote block; defmodule actually executes its body after some minor translations (blehg special forms). ^.^

Rust, C++ to an extent, D, racket (even typed racket), etc…

Specifically, the syntax is parsed and run over to generate AST. Then each AST node, from the outermost inwards, is looked at: if it’s AST it’s kept; if it’s code it’s executed and replaced in position with the AST it returns (evaluated in the above example, not compiled). That’s repeated until all that is left is raw AST or an invalid structure (error).
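
The parse and expand steps can be poked at from a script. This is a rough sketch rather than what the compiler does internally; Doubler and double are made-up names.

```elixir
# Parsing turns source text into AST made of plain tuples and literals.
parsed = Code.string_to_quoted!("1 + 2")
IO.inspect(parsed)
# parsed matches {:+, _meta, [1, 2]}

defmodule Doubler do
  # Hypothetical macro: receives the AST of x and returns new AST.
  defmacro double(x) do
    quote do: unquote(x) * 2
  end
end

require Doubler

# The AST of a macro call is "code" that still has to be executed.
call = quote do: Doubler.double(3)

# Expansion runs the macro and puts the returned AST in place of the call.
expanded = Macro.expand(call, __ENV__)
IO.puts(Macro.to_string(expanded))

# Once only plain AST is left, it can be evaluated (or compiled).
{result, _bindings} = Code.eval_quoted(expanded)
IO.inspect(result)
```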

def is a fully wrapped quote expression (hence why unquote works), and that returns an AST to the def internal compiler call.

defmodule actually takes its executed body and basically ignores its return value. What it does is set up a ‘module compilation’ state, so that when each def is executed it pushes its function AST definition into that module state. There are lots of Module.* functions like that.
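
A small sketch of the module body being executed top to bottom, using Module.defines?/2 to look at the module state before and after a def runs. The module name BodyRuns is made up.

```elixir
defmodule BodyRuns do
  # The body is ordinary code executed during compilation, so we can ask
  # the module-under-construction what it contains at each point.
  defined_before = Module.defines?(__MODULE__, {:hello, 0})

  def hello, do: :world

  defined_after = Module.defines?(__MODULE__, {:hello, 0})

  # Store both answers in an attribute so they can be read at runtime.
  @checks {defined_before, defined_after}
  def checks, do: @checks
end

IO.inspect(BodyRuns.checks())
# prints {false, true}
```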

So, what you are saying seems to be this:

  1. Parsing generates a meta-AST comprising normal-AST and executable-AST nodes;

  2. This meta-AST is then traversed (I suppose depth-first), and the executable-AST nodes are executed by the Elixir interpreter, thus producing normal-AST nodes which replace the executable-AST node that was executed;

  3. Then, after interpreting all executable-AST nodes, all that exists is a typical AST (containing only normal-AST nodes);

  4. Then, this AST gets translated into some Erlangish result.

Is this correct?

The ‘executable’ nodes are just unquotes, like {:unquote, [], [{:+, [], [2, 2]}]}. The interpreter is run over {:+, [], [2, 2]} (which is just 2+2) and returns 4, so all of {:unquote, [], [{:+, [], [2, 2]}]} gets replaced with 4. Since 4 is valid as an AST node (and contains no further unquotes, for example), it continues on.
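
Both forms can be inspected directly. In this sketch, the unquote: false option of quote keeps the unquote call as data, while the default evaluates it and splices the result in.

```elixir
# With unquote: false the unquote call stays in the AST as a plain node.
raw = quote unquote: false, do: unquote(2 + 2)
IO.inspect(raw)
# raw matches {:unquote, _, [{:+, _, [2, 2]}]}

# With unquoting enabled (the default), 2 + 2 is evaluated while the
# quote is built and the result 4 takes its place in the AST.
spliced = quote do: 1 + unquote(2 + 2)
IO.inspect(spliced)
# spliced matches {:+, _, [1, 4]}

{value, _bindings} = Code.eval_quoted(spliced)
IO.inspect(value)
# prints 5
```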

Yep.

Depends on where it ‘is’. Code that exists outside of a defmodule, or inside a defmodule but outside of a def/defp/etc, gets interpreted. Something like def interprets what is inside it as if it were quote'd (thus ‘running’ unquotes), but eventually it ends up calling something like :elixir_module:define_function or something like that, passing in the environment and the AST, and that adds the function definition to the module in an ETS table. Finally, when defmodule finishes executing its body, it takes everything that was added to its module ETS table and passes that to :elixir_compiler.something or something like that to perform the generation to Erlang (core?).

The compilation process is clearer now.

I understand now why compilation in Elixir is slow: it is because there is an interpretation step.

How does this process compare to those of Rust, C++, D, Racket, etc. for metaprogramming? Are they also slow?

Yes, they often very much are. That is why most of their versions are so restricted: to prevent you from doing ‘too much’ and making compilation even slower.

Actually, Rust compilation isn’t slowed down by macros but by LLVM, which spends a hell of a lot of time on optimisations, as the Rust frontend outputs highly unoptimised LLVM IR. This is currently being worked on.

C++ compilation isn’t slowed down by macros, which are a lot simpler than Rust’s (and can be implemented entirely outside the compiler itself; check out the cpp tool on *NIX machines). It is slowed down by:

  • grammar which isn’t obvious in a lot of places, like vector<vector<int>> (this didn’t compile before C++11, IIRC) or MyClass foo() (which needs to be written as MyClass foo{} to do what you think it will do)
  • template specialisation, which can create a lot of problems (templates are Turing complete)
  • SFINAE

D actually has one of the fastest compilers on the market, so this is clearly not the case there.

It is; however, macros in Rust are extremely restricted in what they can generate right now, so you never see them being slow except in a few potential worst cases.

I’m not talking about the preprocessor; that’s more of a read-macro than an in-language macro. I’m referring to templates, which can generate type-aware code at compile time. These are entirely Turing complete (I’m not sure if Rust’s even are) and can get very, very slow.

It still followed the spec; it’s just that >> was lexed as a single right-shift token rather than as two closing template brackets. That was fixed in C++11.

Just as normal code can, I’ve lived and breathed templates for over 20 years at this point. ^.^

The lifeblood that makes templates so much nicer to use!

D’s macros are fairly similar to C++’s, just with a better design that is meant to be interpreted (C++ templates are not interpreted; the fact that they can calculate is an oddity of the C++ type system, which is why they are so slow). They are also restricted in what they can generate.

The prime example is missing though: Lisp. Its macros are both fast and all-encompassing, able to generate anything, as they are just normal code.
