Getting each stage of Elixir's compilation all the way to the BEAM bytecode

sashaafm · October 4, 2016, 1:37pm

Piggybacking a bit on @dvcrn’s topic “BEAM optimization for functions with static return type?”, I’ve been trying to understand in more depth how Elixir works internally to generate BEAM bytecode. After reading way too many blog posts, I’ve noticed a couple of things:

People either think that Elixir compiles directly to Erlang source code (.erl)
Or people think that Elixir compiles directly to BEAM bytecode (.beam)

Both of these assumptions seem to be wrong. The Elixir/Erlang Crash Course on Elixir’s official website says:

Elixir compiles into BEAM byte code (via Erlang Abstract Format).

Steps from Elixir source code to BEAM bytecode

So it’s not directly to Erlang source code, but it’s also not directly to BEAM bytecode: it is first transformed into Erlang Abstract Format (EAF). Digging further into this topic, I found a couple of blog posts, in particular BEAM by Example, where the author describes the following:

Intermediate representations:

Erlang source code → Abstract Syntax Tree (‘P’) → expanded AST (‘E’) → Core Erlang (‘to_core’) → BEAM byte-code

So Elixir code is first transformed into one of these intermediate representations, the Abstract Syntax Tree or the expanded AST. The pipeline should be something like this:

Elixir → Erlang Abstract Format → Core Erlang → BEAM bytecode

Note

I’ve also seen one or two posts online talking about Elixir being transformed into Erlang Forms. I’ve got no idea if these “Erlang Forms” are the same as one of the steps above or if they are an entirely different thing.
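One way to see what “forms” means in practice: in the Erlang documentation, each top-level declaration of the abstract format (a module attribute or a function) is called a form, and `:erl_parse.parse_form/1` parses exactly one of them. The sketch below scans and parses a tiny Erlang module from a string; the module name `:demo_forms` is made up for illustration.

```elixir
# A made-up Erlang module held in a string.
source = """
-module(demo_forms).
-export([add/2]).
add(A, B) -> A + B.
"""

# Scan into tokens; erl_scan works on charlists.
{:ok, tokens, _end_location} = :erl_scan.string(String.to_charlist(source))

# :erl_parse.parse_form/1 parses one form, so split the tokens after each
# terminating {:dot, _} token.
form_token_lists =
  Enum.chunk_while(
    tokens,
    [],
    fn
      {:dot, _} = dot, acc -> {:cont, Enum.reverse([dot | acc]), []}
      token, acc -> {:cont, [token | acc]}
    end,
    fn
      [] -> {:cont, []}
      acc -> {:cont, Enum.reverse(acc), []}
    end
  )

forms =
  Enum.map(form_token_lists, fn form_tokens ->
    {:ok, form} = :erl_parse.parse_form(form_tokens)
    form
  end)

# Each element is one form; the first is {:attribute, 1, :module, :demo_forms}.
Enum.each(forms, &IO.inspect/1)
```

So a list of forms is simply the whole module in Erlang Abstract Format, one tuple per top-level declaration.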

Now we’ve got a few different cases:

Elixir → EAF

This can be achieved through the :elixir Erlang module, that can be found here, like so:

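```elixir
# :elixir is Elixir's internal compiler module: these functions are private,
# and their names and return shapes change between Elixir versions.
quoted = quote do: 1 + 2           # quoted AST (not a string from Macro.to_string/1)
env    = :elixir.env_for_eval([])  # evaluation environment
eaf    = :elixir.quoted_to_erl(quoted, env)
# => tuple whose first element is the Erlang Abstract Format of the expression
```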
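Because the `:elixir` functions are private, here is a sketch of a second route through public Erlang APIs. It assumes Elixir 1.5+ on OTP 20+ with debug info enabled (the default): compile a module, then ask `:beam_lib` for its abstract code, which makes `beam_lib` call the module’s debug-info backend to produce Erlang forms. `Demo.Math` is a hypothetical module name.

```elixir
source = """
defmodule Demo.Math do
  def add(a, b), do: a + b
end
"""

# Code.compile_string/1 returns [{module, beam_binary}] (and loads the module).
[{mod, beam}] = Code.compile_string(source)

# Requesting :abstract_code makes beam_lib translate the Elixir debug info
# into Erlang Abstract Format forms.
{:ok, {^mod, [abstract_code: {:raw_abstract_v1, forms}]}} =
  :beam_lib.chunks(beam, [:abstract_code])

# Pretty-print the forms as Erlang source to see what Elixir generated.
for form <- forms do
  form |> :erl_pp.form() |> IO.puts()
end
```

This yields whole-module forms rather than the translation of a single expression, which is closer to what the Erlang compiler actually receives.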

EAF → Core Erlang and Core Erlang → BEAM bytecode

I haven’t found a way to achieve these two steps. The furthest I’ve got is that Erlang’s compile function can be used to get the various formats:

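```erlang
%% c(FileName, Option) in the Erlang shell
c("file.erl", 'P').      %% listing after preprocessing and parse transforms (file.P)
c("file.erl", 'E').      %% listing after all source transformations (file.E)
c("file.erl", to_core).  %% Core Erlang (file.core)
c("file.erl", to
```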
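For the EAF → Core Erlang step, a minimal sketch that feeds the abstract code of a hypothetical `Demo.CoreStep` module to `:compile.forms/2` and stops after the Core Erlang pass, the programmatic counterpart of `c("file.erl", to_core)`. It assumes Elixir 1.5+ on OTP 20+ with debug info enabled; `:core_pp` is the Core Erlang pretty-printer that ships with the compiler application.

```elixir
[{mod, beam}] =
  Code.compile_string("""
  defmodule Demo.CoreStep do
    def double(x), do: x * 2
  end
  """)

# Recover the Erlang Abstract Format from the debug info.
{:ok, {^mod, [abstract_code: {:raw_abstract_v1, forms}]}} =
  :beam_lib.chunks(beam, [:abstract_code])

# With :binary the compiler returns the Core Erlang module as a term
# instead of writing a .core file.
{:ok, ^mod, core} = :compile.forms(forms, [:to_core, :binary])

# Print it in Core Erlang syntax.
core |> :core_pp.format() |> IO.puts()
```

Swapping `:to_core` for `:S` stops at BEAM assembly instead, and omitting the stop option entirely yields the final `.beam` binary.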

BEAM Bytecode → Disassemble

This can be done either with c("file.erl", 'S'). or with :beam_disasm.file/1, which I believe are the same thing, as far as I could find.
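A sketch that runs both routes on one hypothetical module, `Demo.Disasm`, so their output can be compared side by side. It assumes Elixir 1.5+ on OTP 20+ with debug info enabled; `:beam_disasm.file/1` accepts either a file name or a binary.

```elixir
[{mod, beam}] =
  Code.compile_string("""
  defmodule Demo.Disasm do
    def inc(x), do: x + 1
  end
  """)

# Route 1: disassemble the finished .beam binary; the result is a
# beam_file record, i.e. a tagged tuple.
{:beam_file, ^mod, _exports, _attrs, _compile_info, disasm_code} =
  :beam_disasm.file(beam)

# Route 2: let the compiler stop at BEAM assembly ('S'), starting from the
# abstract code in the debug info. With :binary the listing comes back as
# {module, exports, attributes, functions, label_count}.
{:ok, {^mod, [abstract_code: {:raw_abstract_v1, forms}]}} =
  :beam_lib.chunks(beam, [:abstract_code])

{:ok, ^mod, {^mod, _exps, _attrs, asm_functions, _labels}} =
  :compile.forms(forms, [:S, :binary])

# Both describe functions as {:function, name, arity, entry_label, instructions}.
find_inc = fn functions -> Enum.find(functions, &match?({:function, :inc, 1, _, _}, &1)) end
IO.inspect(find_inc.(disasm_code), label: "beam_disasm")
IO.inspect(find_inc.(asm_functions), label: "compiler 'S'")
```

The instruction lists are similar but not guaranteed identical: one is the compiler’s own assembly, the other is decoded back out of the encoded binary.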

Example Gist

I’ve built this small Gist to better show the steps from an Erlang source code all the way to the disassembled bytecode.

Note

James Fish also spoke to me on Slack and told me to check out the :compile.forms/1 Erlang function. I don’t fully understand what this function actually does or returns. It seems to receive Erlang Abstract Format as an argument.
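To make `:compile.forms/1` concrete, a minimal sketch: the input is a list of abstract-format forms (written out by hand here for a made-up module `:demo_loaded`), and the result is `{:ok, module, beam_binary}`, which `:code.load_binary/3` can load straight into the VM without writing a file.

```elixir
# Abstract format for:
#   -module(demo_loaded).
#   -export([greet/1]).
#   greet(Name) -> {hello, Name}.
# The integers are line annotations.
forms = [
  {:attribute, 1, :module, :demo_loaded},
  {:attribute, 2, :export, [greet: 1]},
  {:function, 3, :greet, 1,
   [
     # {:clause, anno, patterns, guards, body}
     {:clause, 3, [{:var, 3, :Name}], [],
      [{:tuple, 3, [{:atom, 3, :hello}, {:var, 3, :Name}]}]}
   ]}
]

# compile.forms/1 compiles in memory; :binary is implied, so we get the
# contents a .beam file would have.
{:ok, :demo_loaded, beam} = :compile.forms(forms)

# Load the binary as module :demo_loaded (the file name is only metadata).
{:module, :demo_loaded} = :code.load_binary(:demo_loaded, ~c"demo_loaded.erl", beam)

IO.inspect(:demo_loaded.greet("world"))
# prints {:hello, "world"}
```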

Erlang docs are sparse and usually scattered all around. I’ve only managed to gather some info about this topic from several sources, but I’d like to better understand this process of Elixir → BEAM. I’ve watched dozens of Elixir talks, but I don’t recall ever seeing this explained.

I’m hoping someone around here has some further knowledge on this.

ibgib · October 4, 2016, 1:48pm

In case you haven’t seen it, here is another resource I think would interest you on this topic: “Implementing Languages on the BEAM” with @rvirding

I’m only halfway through so far, but it talks about the intermediate steps of any language (not just Elixir) that runs on top of the BEAM(!).

OvermindDL1 · October 4, 2016, 2:10pm

I used to parse Erlang binaries a lot in the past, and recently started something with it in Elixir as a typed experiment (ran out of time, bleh). You can check it out here if you want to see how to read type information and such (it also shows the general API): https://github.com/OvermindDL1/typed_elixir
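As a small taste of that kind of binary parsing, a sketch that pulls the export list and the `@spec` attributes out of a freshly compiled module with `:beam_lib.chunks/2`. It assumes Elixir 1.5+ on OTP 20+ with debug info enabled (the specs are read from the abstract code), and `Demo.Inspect` is a hypothetical module; this is one possible route, not necessarily how typed_elixir does it.

```elixir
[{mod, beam}] =
  Code.compile_string("""
  defmodule Demo.Inspect do
    @spec add(integer, integer) :: integer
    def add(a, b), do: a + b
  end
  """)

# Ask for several chunks at once; the result is keyed by chunk name.
{:ok, {^mod, chunks}} = :beam_lib.chunks(beam, [:exports, :abstract_code])

# Exports: [{name, arity}], including generated functions such as module_info/0.
exports = Keyword.fetch!(chunks, :exports)

# Type information lives in the abstract code as :spec attributes.
{:raw_abstract_v1, forms} = Keyword.fetch!(chunks, :abstract_code)
specs = for {:attribute, _, :spec, spec} <- forms, do: spec

IO.inspect(exports, label: "exports")
IO.inspect(specs, label: "specs")
```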

sashaafm · October 4, 2016, 2:17pm

That looks really interesting and could probably help my research, @ibgib! It’s really long, so I’ll probably watch it in 20-minute chunks.

rvirding · October 4, 2016, 3:23pm

So a quick answer here (more later): the Erlang compiler has two main entry points, :compile.file, which compiles a text file, and :compile.forms, which takes a list of pre-parsed forms. As you have seen, you can specify how “far” in the compilation you want to go: the code with macros and parse transforms expanded, Core Erlang, Kernel Erlang, or just the BEAM instructions (without generating a .beam file). Try doing to_kernel, dkern and dlife for some more fun.

You can also specify, to some extent, what type the input should be. For example, the option :from_core means that the input, whether a file or forms, is Core Erlang. This is what I use in the LFE compiler, where I generate Core Erlang forms (their AST, anyway) which I then compile with :compile.forms(forms, [:from_core|options]). I found this easier than generating Erlang AST.
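A minimal sketch of that `:from_core` route. Instead of building Core Erlang by hand as LFE does, it borrows the output of the `to_core` stage for a made-up module `:demo_core` and feeds it back into `:compile.forms/2` with `:from_core` to get a loadable BEAM binary.

```elixir
# Hand-written abstract format for a made-up module:
#   -module(demo_core). -export([double/1]). double(X) -> X * 2.
forms = [
  {:attribute, 1, :module, :demo_core},
  {:attribute, 1, :export, [double: 1]},
  {:function, 2, :double, 1,
   [{:clause, 2, [{:var, 2, :X}], [], [{:op, 2, :*, {:var, 2, :X}, {:integer, 2, 2}}]}]}
]

# Stage 1: abstract format -> Core Erlang (returned as a cerl term).
{:ok, :demo_core, core} = :compile.forms(forms, [:to_core, :binary])

# Stage 2: Core Erlang -> BEAM. :from_core tells the compiler the input
# is already Core Erlang, so compilation starts from the Core passes.
{:ok, :demo_core, beam} = :compile.forms(core, [:from_core, :binary])

{:module, :demo_core} = :code.load_binary(:demo_core, ~c"demo_core.core", beam)
IO.inspect(:demo_core.double(21))
# prints 42
```

Since most optimisation happens on Core Erlang, entering at this stage still gets the compiler’s optimisation passes.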

Almost all optimisation in the compiler is done on core erlang so I don’t “lose” anything by entering there.

The c("file.erl", 'S') call compiles the file only to the BEAM instructions and prints them to a .S file, while :beam_disasm.file/1 looks at the .beam file and disassembles it. There is also a way, which I can’t remember now, to disassemble the actual code installed in the BEAM itself. This will be slightly different from what the other two give you, as quite a lot of optimisation is done at load time.

Most of the time you don’t need to know this, except for the expansion of macros and parse transforms, but it is fun. About 12 minutes into my talk mentioned above there is a slide on the passes of the compiler, though it is a bit hard to see. IIRC there is another talk I gave on roughly the same topic where the slides are easier to see.

sashaafm · October 4, 2016, 9:19pm

Thank you for the reply! Meanwhile, could you please confirm whether this image is a correct representation of Erlang’s and Elixir’s intermediate forms from source code to bytecode?

EDIT: I see from the video that there should be Kernel Erlang before the BEAM Bytecode?
EDIT2: I seem to have found the slides from the other talk you mentioned: Slides about implementing Erlang languages

rvirding · October 4, 2016, 10:26pm

This is how I interpret it anyway. I haven’t worked with the Elixir compiler so the person you really need to ask is @josevalim.

There are actually two passes between Core Erlang and bytecode: kernel and life. The kernel pass converts the code to Kernel Erlang, where it has been flattened, lambda lifted, and the pattern matching has been compiled. The life pass does lifetime analysis of variables.

josevalim · October 4, 2016, 10:44pm

Yes. To be more precise, instead of “Elixir”, you could have: Elixir Source Code -> Elixir Macro Expansion -> Erlang Abstract Format -> …
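The macro expansion step can be observed directly with `Macro.expand/2`. A small sketch: Kernel’s `if` is a macro, and expanding it yields a `case` expression, one of the special forms that is then translated to Erlang Abstract Format. Note that `Macro.expand/2` only expands the root node, whereas the compiler expands the whole tree.

```elixir
# `if` is a macro defined in Kernel; quoting it gives an unexpanded call.
ast = quote do: if(flag, do: :yes, else: :no)

# Macro.expand/2 keeps expanding the root node until it is no longer a
# macro; here that ends at a `case`, which is a special form.
expanded = Macro.expand(ast, __ENV__)

IO.puts(Macro.to_string(ast))
IO.puts(Macro.to_string(expanded))
```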