Atoms in Erlang AST are tagged with line number 0

tmbb · October 9, 2020, 8:32pm

I’ll use Elixir syntax, even to refer to the Erlang AST (the same thing as Erlang abstract code, which is the term used in the docs). Atoms in the Erlang AST are represented as {:atom, line_nr, atom_name}, where line_nr is the line number on which the atom occurs. When a module is compiled by Elixir, the AST in the BEAM files has all line numbers set to zero for atoms (and for other data types such as floats and integers, for example). Concretely, all atoms are represented as {:atom, 0, atom_name}. This is suboptimal, because it loses valuable information when you inspect the AST. I think the AST should preserve line numbers (I don’t know whether this would require a big change to Elixir’s compiler).
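
For anyone who wants to inspect this themselves, here is a minimal sketch, assuming the module is compiled with debug info so that the abstract code can be read back with :beam_lib.chunks/2. LineNumberDemo and AtomCollector are hypothetical names used only for illustration, and the annotations you see may differ between Elixir versions.

```elixir
# Make sure debug info (which carries the Erlang abstract code) is retained.
Code.put_compiler_option(:debug_info, true)

source = """
defmodule LineNumberDemo do
  def greet do
    :hello
  end
end
"""

# Compile in memory; each result is a {module, beam_binary} pair.
[{module, beam}] = Code.compile_string(source)

# Ask beam_lib for the Erlang abstract code stored in the BEAM binary.
# This match fails if the binary carries no debug info.
{:ok, {^module, [abstract_code: {:raw_abstract_v1, forms}]}} =
  :beam_lib.chunks(beam, [:abstract_code])

# Hypothetical helper: collects every {:atom, annotation, name} node.
defmodule AtomCollector do
  def collect({:atom, anno, name}) when is_atom(name), do: [{name, anno}]
  def collect(tuple) when is_tuple(tuple), do: tuple |> Tuple.to_list() |> collect()
  def collect(list) when is_list(list), do: Enum.flat_map(list, &collect/1)
  def collect(_other), do: []
end

atoms = AtomCollector.collect(forms)
IO.inspect(atoms, label: "atom nodes as {name, annotation}")
```

The second element of each pair is the annotation attached to the atom, which is where the line number would normally live.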

Motivation: why do I care about this? Well, I’m writing a mutation testing framework for Elixir which mutates the Erlang AST of the compiled module (I have very good reasons to do this and I can expand on them later), and it’s very important that I report accurate line numbers to the user when I mutate something, so that the user can see which part of the code has been mutated. Currently, most literal mutations are tagged with line number 0 (zero) instead of the correct line number.

ar7max · October 10, 2020, 10:32am

Concretely, all atoms are represented as {:atom, 0, atom_name}.

Any tips on how I can check this myself?

Qqwy · October 11, 2020, 6:41am

The main question would be: Does the Erlang compiler know/care about line numbers (of atoms)? If not, it would be difficult to add this to the Elixir compiler, as it dispatches to the BEAM’s builtin Erlang compiler to compile Erlang code.

josevalim · October 12, 2020, 7:31am

This is a consequence of the fact that atoms are represented by themselves in the Elixir AST, which means they don’t have a slot for a metadata field. So this can’t be fixed, unfortunately. The same goes for integers, floats, and strings.
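
A small illustration of that point, using only Code.string_to_quoted!/1 from the standard library: the call node carries a metadata list with its line, while the literal arguments appear as bare values with nowhere to store one.

```elixir
# Parse a call whose arguments are an atom, an integer, a float and a string.
ast = Code.string_to_quoted!(~s|foo(:bar, 1, 2.0, "baz")|)

IO.inspect(ast)
# → {:foo, [line: 1], [:bar, 1, 2.0, "baz"]}
```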

tmbb · October 12, 2020, 8:14pm

Ok, I guess I’ll try to use some heuristics to find the real line number to try to get better reporting.

rvirding · October 12, 2020, 10:29pm

For code generation, the Erlang compiler doesn’t care about line numbers. The only time they have any meaning is when reporting errors, which is where the line number is used.

Qqwy · October 12, 2020, 11:01pm

One possibility might be to:

- Take the original Elixir AST.
- Traverse it top-down, passing the meta field from each parent to the child.
- Every time a ‘leaf’ (i.e. literal) is encountered that does not have its own meta field, replace it by a “placeholder” node that does contain the desired metadata (as well as internally still the original literal), which has a syntax that is distinguishable from proper AST nodes.
- Use this “embellished AST” as the source of truth from which to create your mutated versions.
- For every mutated embellished AST:
  - Before compiling to execute, traverse it once more to revert the placeholder nodes (to make sure the AST becomes proper compilable Elixir again).
  - Now when there is a problem, you still have the information about line numbers and other metadata related to the thing you mutated in your “mutated embellished AST”, so you can refer to them.

To accept all Elixir code you’ll need a placeholder that itself is not proper AST, like a non-quoted struct.
This does mean that for your top-down traversal you probably won’t be able to use Macro.prewalk because it will break when encountering the placeholder struct. However, you can easily create your own variant of that function which has a special case for the placeholder struct.
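
A rough sketch of what such an embellishing traversal could look like, under the assumption that a plain struct is an acceptable placeholder. EmbellishSketch, Leaf, wrap/2 and unwrap/1 are hypothetical names, not part of Elixir or of any library. Note that this naive version also wraps atoms that are not really literals (alias segments, the function name in a remote call); that is harmless for the round trip but would need filtering in a real tool.

```elixir
defmodule EmbellishSketch do
  # Hypothetical placeholder: not valid AST, so it cannot clash with real nodes.
  defmodule Leaf do
    defstruct [:value, :meta]
  end

  # Wrap every literal in a Leaf carrying the metadata of its nearest parent.
  def wrap(ast, parent_meta \\ [])

  def wrap({form, meta, args}, _parent_meta) when is_list(meta) do
    # An atom in the first slot is the call or variable name, not a literal.
    new_form = if is_atom(form), do: form, else: wrap(form, meta)
    # For variables, args is an atom (the context); leave it untouched.
    new_args = if is_list(args), do: Enum.map(args, &wrap(&1, meta)), else: args
    {new_form, meta, new_args}
  end

  def wrap({left, right}, parent_meta),
    do: {wrap(left, parent_meta), wrap(right, parent_meta)}

  def wrap(list, parent_meta) when is_list(list),
    do: Enum.map(list, &wrap(&1, parent_meta))

  def wrap(literal, parent_meta), do: %Leaf{value: literal, meta: parent_meta}

  # Revert the placeholders so the AST is ordinary Elixir AST again.
  def unwrap(%Leaf{value: value}), do: value

  def unwrap({form, meta, args}) when is_list(meta) do
    new_args = if is_list(args), do: Enum.map(args, &unwrap/1), else: args
    {unwrap(form), meta, new_args}
  end

  def unwrap({left, right}), do: {unwrap(left), unwrap(right)}
  def unwrap(list) when is_list(list), do: Enum.map(list, &unwrap/1)
  def unwrap(other), do: other
end

ast = Code.string_to_quoted!("foo(\n  bar(:baz)\n)")
wrapped = EmbellishSketch.wrap(ast)
IO.inspect(wrapped)
restored = EmbellishSketch.unwrap(wrapped)
```

Here :baz ends up inside a Leaf whose meta is that of the enclosing bar(...) call, and unwrap/1 gives back the original AST.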

tmbb · October 12, 2020, 11:35pm

Your suggestions only work if I’m recompiling the code after each mutation. In fact, I’m doing something very different. I’m compiling the Elixir code into BEAM files (without any kind of mutation). Then, I extract the Erlang code from the BEAM files and mutate it only once. An example of a mutation is the following (and now I’ll be using Erlang syntax, because I’m mutating Erlang code):

an_atom

becomes:

'Elixir.Darwin.Mutators.Default.AtomMutator':darwin_was_here('Elixir.MyModule',
                                                              0, an_atom).

The pair {'Elixir.MyModule', 0} is a codon (in molecular biology, a codon is something that encodes an amino acid in the DNA; basically it’s a segment of DNA which you can mutate if you want your cell to do something different). The idea is that the function ...:darwin_was_here/3 looks up what the current mutation is (a mutation is a codon plus a mutation number) and either returns the atom as it is or returns a mutated version of the atom.
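
To make the rewrite concrete, here is a hedged sketch of how an atom node could be turned into such a remote call in the Erlang abstract format (shown in Elixir syntax). CodonSketch, wrap_atom/4 and StubMutator are hypothetical names for illustration; the stub always returns the atom unchanged and does not reflect how the real mutator decides what to do.

```elixir
defmodule CodonSketch do
  # Replaces {:atom, anno, name} with the abstract form of
  #   Mutator:darwin_was_here(Module, CodonNr, name)
  # reusing the original annotation for every new node.
  def wrap_atom({:atom, anno, _name} = atom_form, mutator, module, codon_nr) do
    {:call, anno,
     {:remote, anno, {:atom, anno, mutator}, {:atom, anno, :darwin_was_here}},
     [{:atom, anno, module}, {:integer, anno, codon_nr}, atom_form]}
  end
end

# Hypothetical stand-in for the real mutator: no mutation is active,
# so the original atom is returned as it is.
defmodule StubMutator do
  def darwin_was_here(_module, _codon_nr, atom), do: atom
end

form = CodonSketch.wrap_atom({:atom, 0, :an_atom}, StubMutator, MyModule, 0)

# Evaluate the generated form to check that it is a well-formed expression.
{:value, result, _bindings} = :erl_eval.expr(form, [])
IO.inspect(result)
```

Because the new nodes copy the annotation of the atom they replace, whatever line information that atom had (currently 0) is what ends up on the generated call.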

That said:

every time a ‘leaf’ (i.e. literal) is encountered that does not have its own meta field, replace it by a “placeholder” node that does contain the desired metadata (as well as internally still the original literal), which has a syntax that is distinguishable from proper AST nodes.

I can only do this if I’m working with the fully macro-expanded Elixir code. It doesn’t seem to be possible to access the fully macro-expanded Elixir code for a module. I don’t even know if the concept of “fully macro-expanded” Elixir exists. Elixir compilation is a very imperative process, in which the final result of a module compilation is something that creates a module instead of being something that’s merely compiled (I’m simplifying a little). This is why I’ve decided to mutate the (simpler) Erlang abstract code, after all macros have been expanded, where everything is much more referentially transparent and follows a simpler algebra.

That said, I guess the same approach could more or less work if I traverse the Erlang AST in that way.

Qqwy · October 12, 2020, 11:43pm

The closest you can get is by calling Macro.prewalk(ast, &Macro.expand(&1, __ENV__)), which will perform “full” expansion of at least all non-special-form macros in a top-down fashion, just as happens during normal compilation. Whether that is enough for your purposes I do not know.
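
As a small worked example of that call, here is a sketch that expands an if expression. It is only an approximation of what the compiler does: a single __ENV__ is reused for every node, so context-sensitive macros (for instance those that behave differently inside guards) may expand differently than during real compilation.

```elixir
# `if` is a regular macro from Kernel, so Macro.expand/2 can rewrite it.
ast = Code.string_to_quoted!("if x, do: :yes, else: :no")

# Expand each node on the way down; the traversal then continues into the
# expanded result, so nested macro calls are expanded as well.
expanded = Macro.prewalk(ast, &Macro.expand(&1, __ENV__))

IO.puts(Macro.to_string(expanded))

# Check whether any `if` call survived the expansion.
{_ast, has_if?} =
  Macro.prewalk(expanded, false, fn node, acc ->
    {node, acc or match?({:if, _, _}, node)}
  end)

IO.inspect(has_if?, label: "if nodes remaining")
```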