Is there a complete Elixir AST reference?

Lately I’ve been digging into the Elixir AST for learning purposes. There is a section in the Elixir docs (Syntax reference — Elixir v1.11.4) that describes the AST and gives some examples of it.

It mentions that the AST is made of:

  • atoms - such as :foo
  • integers - such as 42
  • floats - such as 13.1
  • strings - such as "hello"
  • lists - such as [1, 2, 3]
  • tuples with two elements - such as {"hello", :world}
  • tuples with three elements, representing calls or variables

Regarding the last point, it then says:

the first element is an atom (or another tuple), the second element is a list of two-element tuples with metadata (such as line numbers) and the third is a list of arguments.
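
A quick way to check the list above is to quote each kind of term and compare. This is only a sketch using `quote/2`; metadata is matched with `_` because its contents can vary between Elixir versions and contexts.

```elixir
# Literals are their own AST: quoting them returns the same term.
literal_results = [
  quote(do: :foo) == :foo,
  quote(do: 42) == 42,
  quote(do: 13.1) == 13.1,
  quote(do: "hello") == "hello",
  quote(do: [1, 2, 3]) == [1, 2, 3],
  quote(do: {"hello", :world}) == {"hello", :world}
]

IO.inspect(Enum.all?(literal_results), label: "literals quote to themselves")

# A variable is a three-element tuple whose third element is a context atom.
{:x, _meta, context} = quote(do: x)
IO.inspect(is_atom(context), label: "variable context is an atom")

# A call is a three-element tuple whose third element is the argument list.
{:sum, _meta, args} = quote(do: sum(1, 2, 3))
IO.inspect(args, label: "call arguments")
```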

What I haven’t found so far is what is meant by “or another tuple”. Moreover, when that element is a tuple, there seem to be some cases where Elixir gives it a special meaning.

For example, consider the “dot syntax”:

quote do
  foo.bar(:baz)
end
{{:., [], [{:foo, [], Elixir}, :bar]}, [], [:baz]}

The first element there is the tuple {:., [], [{:foo, [], Elixir}, :bar]} that represents foo . :bar, so here the dot is a binary operator. If that is used as the first element in another three-tuple ast node, then it turns into foo.bar(), and any arguments will be put inside the parens, in this case foo.bar(:baz).

This does not happen with other operators. For example, this node:

{{:+, [], [{:foo, [], Elixir}, :bar]}, [], []}

Gets turned into foo + :bar(). So this tells me the dot expression is a special case. This in itself is not a surprise, but how many other special cases exist?
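
The dot case can be checked by building the node by hand. The sketch below renders it with `Macro.to_string/1` and then evaluates a similar hand-built node; `String.upcase/1` is just an arbitrary function picked for illustration. The `+` variant is left out here because how it is rendered may depend on the Elixir version.

```elixir
foo = {:foo, [], Elixir}

# A dot tuple in call position renders as a remote call with arguments.
dot_call = {{:., [], [foo, :bar]}, [], [:baz]}
rendered = Macro.to_string(dot_call)
IO.puts(rendered)
# prints: foo.bar(:baz)

# The same shape with a module on the left really is a function call.
upcase_call = {{:., [], [String, :upcase]}, [], ["hi"]}
{result, _binding} = Code.eval_quoted(upcase_call)
IO.inspect(result)
# prints: "HI"
```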

By playing around I found another special case with the multi alias syntax. This expression:

Foo.{Bar, Baz}

is represented by this ast:

{{:., [],
  [
    {:__aliases__, [], [:Foo]},
    :{}
  ]}, [],
 [
   {:__aliases__, [], [:Bar]},
   {:__aliases__, [], [:Baz]}
 ]}

The first element is again the dot operator tuple:

{:., [], [{:__aliases__, [], [:Foo]}, :{}]}

which in this case means Foo . :{}. The difference is that now if that tuple is used as the first element in another three-element tuple node, the arguments are put inside of the curly braces:

Foo.{Bar, Baz}

No matter how bizarre the arguments are:

{{:., [], [ {:__aliases__, [], [:Foo]}, :{} ]}, [], [ {:__aliases__, [], [:Bar]}, :baz, {:+, [], [1, 2]} ]}
#=> Foo.{Bar, :baz, 1 + 2}

So again, the dot operator when the right hand side is the :{} atom has special meaning.
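
To confirm that this shape comes from the parser and not only from hand-built tuples, the sketch below parses the source string and pattern matches on the result. Joining the parts with `Module.concat/1` is only an illustration of what the node conceptually stands for, not how the compiler expands it.

```elixir
ast = Code.string_to_quoted!("Foo.{Bar, Baz}")

# The call target is a dot node whose right-hand side is the atom :{}.
{{:., _, [{:__aliases__, _, [:Foo]}, :{}]}, _, inner} = ast

# Each argument is a plain alias node; join it with the left-hand side.
names = for {:__aliases__, _, parts} <- inner, do: Module.concat([:Foo | parts])
IO.inspect(names)
# prints: [Foo.Bar, Foo.Baz]
```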

I’m finding all of this by playing around, since the Elixir docs don’t cover these cases.

The question is: is there a reference for the Elixir AST, or some section, comment, or issue that documents all the cases where AST nodes are given special meaning?

I’m subscribing to this topic. It’d be interesting to know if such documentation exists.

The parser to handle alias Foo.{Bar, Baz} is:

build_dot_alias handles actually creating the AST structure.

The code to handle that AST is:

Thanks for the pointer! I was reading both the Elixir .yrl file and the Macro.to_string/2 source in the hope of finding some useful info, and I got some.

It seems there are only three cases in which a tuple is used as the first element, and all involve the dot operator in some way:

Access syntax
The access syntax is represented as a {:., [], [Access, :get]} call with two arguments. The first is an expression node, and the second is the key used by Access.get:

quote do
  foo[:bar]
end

#=>
{
  {:., [], [Access, :get]},
  [],
  [{:foo, [], Elixir}, :bar]
}

The dot is nowhere to be seen in the syntax, but I guess it’s used in the AST because it roughly means “it’s a member of”. There are very explicit checks for the [Access, :get] expression, and the AST node is explicitly generated that way by the parser.
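
A small sketch of that, parsing from a string so the variable gets a `nil` context and can be supplied through a binding. The binding value `[bar: 1]` is made up for the example.

```elixir
ast = Code.string_to_quoted!("foo[:bar]")

# The parser emits the Access.get node directly; note the bare Access atom,
# not an {:__aliases__, _, [:Access]} node.
{{:., _, [Access, :get]}, _, [{:foo, _, nil}, :bar]} = ast

# Evaluating the node behaves like Access.get(foo, :bar).
{value, _binding} = Code.eval_quoted(ast, foo: [bar: 1])
IO.inspect(value)
# prints: 1
```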

Dot alias
The dot alias (foo.{:bar, :baz} or Foo.{Bar, Baz}) is represented as a {:., [], [left_hand_side, :{}]} call, where the arguments are the elements inside the curly brackets:

quote do
  foo.{:bar, 1 + 2}
end

#=>
{
  {:., [], [{:foo, [], Elixir}, :{}]},
  [],
  [:bar, {:+, [], [1, 2]}]
}

Note: this example will raise as it’s an invalid expression, but it’s allowed by the ast and it may be possible to use it as a very funky syntax for some ugly macro.
This node is built in this line.

Generic calls
Every other kind of call (except “simple” calls like foo(:bar)) is represented as a {:., [], [before_dot, after_dot?]} call, where the arguments are the elements inside the parentheses. In that tuple, the before_dot expression is the left-hand side of the dot and is always present, but the after_dot? expression is optional, as in the case of foo.(:bar) (this case is covered by the Elixir syntax reference):

quote do
  foo.bar(1, 2, 3)
end
#=> {{:., [], [{:foo, [], Elixir}, :bar]}, [], [1, 2, 3]}

quote do
  foo.(1, 2, 3)
end
#=> {{:., [], [{:foo, [], Elixir}]}, [], [1, 2, 3]}

Those seem to be all the cases, from what I could gather so far. If what I found is correct, I’d gladly submit a PR to extend the Elixir docs on this, as it’s pretty much undocumented.
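
The three cases above can be summed up as pattern matches. `DotCallKind` and `classify/1` are hypothetical names invented for this sketch, not part of Elixir, and the clauses only cover the shapes discussed so far.

```elixir
defmodule DotCallKind do
  # Hypothetical helper: names the shape of a call node whose first
  # element is itself a dot tuple. Clause order matters: the two special
  # right-hand sides must be checked before the generic remote call.
  def classify({{:., _, [Access, :get]}, _, [_subject, _key]}), do: :access
  def classify({{:., _, [_left, :{}]}, _, args}) when is_list(args), do: :dot_alias

  def classify({{:., _, [_left, name]}, _, args}) when is_atom(name) and is_list(args),
    do: :remote_call

  def classify({{:., _, [_fun]}, _, args}) when is_list(args), do: :anonymous_call
  def classify(_other), do: :other
end

kinds =
  for source <- ["foo[:bar]", "Foo.{Bar, Baz}", "foo.bar(1, 2, 3)", "foo.(1, 2, 3)", "foo(1)"] do
    source |> Code.string_to_quoted!() |> DotCallKind.classify()
  end

IO.inspect(kinds)
# prints: [:access, :dot_alias, :remote_call, :anonymous_call, :other]
```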

Everything except what’s in that list (atom, integer, float, string, list, 2-tuple) is a three-element tuple, not just calls. For example, map literals and tuples of any size other than two are neither calls nor variables, but they are also represented as triples. IIRC, the reason 2-tuples are not represented as triples is that it would make parsing keyword-list options passed into a macro a nightmare.

E.g.

  quote do {1, 2, 3} end
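
To make the point above concrete, here is a short sketch comparing the quoted forms. Metadata is matched with `_` since it may not always be empty.

```elixir
# Tuples of size other than two, and maps, become three-element nodes.
{:{}, _, [1, 2, 3]} = quote(do: {1, 2, 3})
{:%{}, _, [a: 1]} = quote(do: %{a: 1})

# Two-element tuples stay literal...
two_tuple = quote(do: {1, 2})
IO.inspect(two_tuple)
# prints: {1, 2}

# ...so a keyword list passed to a macro arrives as a plain keyword list.
opts = quote(do: [a: 1, b: 2])
IO.inspect(Keyword.keyword?(opts))
# prints: true
```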

According to the docs, three-element tuples are “representing calls or variables”, so for instance {:{}, [], [1, 2, 3]} can be thought of as a call to the tuple constructor {}/1 (which is a special form, like %{}/1, __MODULE__/0 or with/1).
My interest, however, was in the cases where the first element in that 3-tuple is another 3-tuple. I was writing a little code style checker like Credo as an exercise, stumbled upon the “weird” case of the dot alias AST, and thought “what else am I missing?”.

Yes, I remember reading that is the reason but I can’t remember exactly where I read that.

Is there a way to generate an AST that preserves comments? How does the Elixir formatter do it?

Yes, in any situation where you have a call on the left-hand side of the () (call) operator. There aren’t many of these, only 2 to be exact:

  1. . operator, like mod.func()
  2. another call, like foo()()

Of course, the 2nd option compiles in only one situation, unquote(atom)(), and fails to compile in any other case; however, such a construct can still appear inside a quote.

quote do: foo()()
# => {{:foo, [], []}, [], []}

To see how it works when unquote/1 is used, you need to disable unquoting first:

quote [unquote: false], do: unquote(:foo)()
# => {{:unquote, [], [:foo]}, [], []}
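
For contrast, with unquoting left enabled the outer call collapses into an ordinary local call, which is what makes `unquote(atom)()` usable for generating function names. `Greeter` and its function names are made up for this sketch.

```elixir
# With unquoting enabled, unquote(:foo)() becomes a plain call to foo/0.
{:foo, _meta, []} = quote(do: unquote(:foo)())

# The same construct in a definition generates functions from atoms.
defmodule Greeter do
  for name <- [:hello, :goodbye] do
    def unquote(name)(), do: unquote(Atom.to_string(name))
  end
end

IO.inspect(Greeter.hello())
# prints: "hello"
```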

Looking at the code, the formatter does not put comments in an AST.
Rather it seems to pass a callback to the tokenizer to process comments (see here) and then it generates an algebra document from the comments and the AST.

The formatting is then done on the algebra document, not on the AST.
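
A rough sketch of that flow, assuming Elixir 1.13 or later, where `Code.string_to_quoted_with_comments!/2` and `Code.quoted_to_algebra/2` are public (they are newer than the v1.11.4 docs this thread started from). The parser options are the ones the `Code.quoted_to_algebra/2` docs suggest for formatter-like output; the source string and the width of 98 are arbitrary choices for the example.

```elixir
source = """
# adds one
x + 1
"""

# Comments come back as a separate list, not as part of the AST.
{quoted, comments} =
  Code.string_to_quoted_with_comments!(source,
    literal_encoder: &{:ok, {:__block__, &2, [&1]}},
    token_metadata: true,
    unescape: false
  )

comment_texts = Enum.map(comments, & &1.text)
IO.inspect(comment_texts)
# prints: ["# adds one"]

# The AST and the comments are merged again in the algebra document.
formatted =
  quoted
  |> Code.quoted_to_algebra(comments: comments)
  |> Inspect.Algebra.format(98)
  |> IO.iodata_to_binary()

IO.puts(formatted)
```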

This is correct, the formatter works with an algebra document, since quite a lot of data is lost when parsing to the AST.
With https://ast.ninja/ you can play around and see both the AST and the algebra document for any given Elixir code (click Add and then Code.Formatter.to_algebra/2).

If you’re still interested, I wrote a blog post with everything I learned so far about the AST:

I’ll try to make a PR to the Elixir docs to fill in the missing parts of the syntax reference.

Was just about to post the link to my AST Ninja but I see somebody already beat me to it :)

It’s a really helpful tool, thanks for it :)

As I just posted in José announces Livebook - a web application for writing interactive and collaborative code notebooks, I think that Livebook could be a great way to generate this sort of reference documentation (e.g., with executable examples).

-r

I’ve seen a couple of people already using Livebook to try stuff or document code from existing codebases, and it seems to be great at that. I’m not sure it would be a good idea for public documentation like ex_doc (because of remote code execution), but for internal documentation it’s great.

If making it safe to use for public facing documentation is desired, I guess that Livebook having pluggable “interpreters” for the snippets could enable that. A “safe” interpreter could do some ast whitelisting, something like what Sand or tryelixir do.
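
As a very rough idea of what such AST whitelisting could look like, the sketch below walks a parsed snippet and accepts it only if every node is a number or one of four arithmetic operators. `AstAllowlist` is a hypothetical module written for this post; it is not how Sand or tryelixir work, and it is far from a real sandbox.

```elixir
defmodule AstAllowlist do
  # Hypothetical allowlist: only numbers and basic arithmetic are accepted.
  @allowed_ops [:+, :-, :*, :/]

  def safe?(ast) do
    # Visit every node and remember whether all of them were allowed.
    {_ast, ok?} =
      Macro.prewalk(ast, true, fn node, ok? -> {node, ok? and allowed?(node)} end)

    ok?
  end

  defp allowed?(node) when is_number(node), do: true
  defp allowed?({op, _meta, args}) when op in @allowed_ops and is_list(args), do: true
  defp allowed?(_other), do: false
end

IO.inspect(AstAllowlist.safe?(Code.string_to_quoted!("1 + 2 * 3")))
# prints: true

IO.inspect(AstAllowlist.safe?(Code.string_to_quoted!("System.halt()")))
# prints: false
```

Note that the remote call is rejected precisely because its first element is a dot tuple and not one of the allowed operator atoms.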