Is the majority of Elixir a kind of API of its core functions?

I’ve been trying to understand Elixir macros and structs.

Along the way I noticed how heavily atoms (an ostensibly potent and novel type for language extension) are used to grow the language out from its kernel.

I read that around 91% of Elixir is written in Elixir, 8.1% is written in Erlang and 0.5% is written in other languages. Am I correct in my understanding that Elixir’s core functions are written in Erlang, and that this core functionality was designed to let experienced users of the language (the core team) build operators, macros and structs, which in turn allowed the language to grow so much and remain extensible?

In other words, does every written module/function, operator and variable, which are ultimately tuples of some sort, go back to the core functions to be processed appropriately?

I’m asking this because if my understanding is correct then I am somewhat more confused by macros. Are defmacros also tuples? Where do they go to be processed appropriately? What is meant by the statement “macros write elixir code”? Are they used to extend the core functionality in the kernel?

If my writings sound like ramblings it’s because I’m down a rabbit hole.

Well, yes and no. The Elixir parser is written in Erlang, and Kernel.SpecialForms are mostly implemented in Erlang because the parser sometimes needs to understand them directly (like for or when). Some core functions also have temporary “stubs” that are used while the Kernel module is being loaded. Beyond that, all other functions and macros are implemented in Elixir. That includes operators (e.g. +), “keywords” (e.g. def) and other similar constructs.

Every element of the language (and it doesn’t really matter much which language) is parsed to an AST. In Elixir, the AST is represented as a term of the form {ast(), metadata(), [ast(), ...]} | {ast(), ast()} | [ast()] | atom() | number() | binary().
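To make those shapes concrete, here is a small sketch that quotes a few expressions and matches on the result. It assumes it is run as a plain script outside of any module; the metadata is ignored because its exact contents vary.

```elixir
# quote/2 returns the AST of an expression instead of evaluating it.

# A call: {name, metadata, arguments}
{:+, _, [1, 2]} = quote(do: 1 + 2)

# A variable: {name, metadata, context}, where the context is an atom
{:x, _, context} = quote(do: x)
true = is_atom(context)

# A remote call: the first element is itself an AST node
{{:., _, [{:__aliases__, _, [:String]}, :upcase]}, _, ["a"]} =
  quote(do: String.upcase("a"))

# Macro.to_string/1 turns an AST back into source text
IO.puts(Macro.to_string(quote(do: 1 + 2)))
# prints: 1 + 2
```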

Macros return an AST (most often a tuple), which is then expanded and compiled like any other code.

Yes, see def, mentioned earlier. Most of the code you write in Elixir sits inside some kind of macro (defmodule itself is a macro as well).
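As a rough illustration of “macros write Elixir code”, here is a minimal sketch. The module names MyMacros and Demo and the macro double are made up for this example. The macro receives the AST of its argument and returns a new AST, which the compiler then puts in place of the call.

```elixir
defmodule MyMacros do
  # Hypothetical macro: receives the AST of `expr` and returns
  # the AST of `expr * 2`. Nothing is evaluated here.
  defmacro double(expr) do
    quote do
      unquote(expr) * 2
    end
  end
end

defmodule Demo do
  # Macros must be required before they can be used.
  require MyMacros

  def run do
    # At compile time this call is replaced by the AST of (3 + 4) * 2.
    MyMacros.double(3 + 4)
  end
end

IO.inspect(Demo.run())
# prints: 14
```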

These numbers are a bit skewed because the Elixir GitHub repository contains not only Elixir-the-language but also Mix (the project management tool), ExUnit (the testing framework), IEx (the REPL) and a couple of other applications/libraries that ship with Elixir. Still, it definitely is the case that nearly all of the parts of the language you interface with during normal daily usage are themselves written in Elixir.

The great thing about this approach is that nearly all aspects of Elixir can be customized from within Elixir which allows for the language to grow and adapt gradually (see the marvellous talk ‘Growing a Language’ by Guy Steele as to why this is useful.)

Furthermore, because compiled Elixir code runs on a VM, and because Elixir applications always contain the compiler, we’re able to run arbitrary code (from other modules we have already compiled) at compile-time and dynamically (re)compile modules at run-time.
(I gave a 20-minute talk at FOSDEM earlier this year that goes into a bit more detail of this, which you might find interesting.)
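Both halves of that can be sketched in a few lines. The module names CompileTime and RuntimeGreeter are invented for this example; Module.create/3 and Macro.Env.location/1 are standard library functions.

```elixir
defmodule CompileTime do
  # This comprehension runs while the module is being compiled;
  # only its result ends up stored in the compiled module.
  @squares for n <- 1..5, do: n * n

  def squares, do: @squares
end

# Build a module body as AST and compile it while the script is running.
body =
  quote do
    def greet(name), do: "Hello, " <> name
  end

Module.create(RuntimeGreeter, body, Macro.Env.location(__ENV__))

IO.inspect(CompileTime.squares())
# prints: [1, 4, 9, 16, 25]
IO.puts(RuntimeGreeter.greet("world"))
# prints: Hello, world
```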

Thanks for your response hauleth.

I see, so you’re saying that this…

{:+, [context: Elixir, import: Kernel], [1, 2]}

…isn’t actually going off to an Erlang function, it’s the AST representation of that expression?

When I read about quoting and unquoting, this statement…

Even a map is represented as a call to %{} :

iex> quote do: %{1 => 2}
{:%{}, [], [{1, 2}]}

…gave me the impression that these were being sent to some lower level of the architecture. Is it being sent to the parser like that or does it undergo any transformations before then?

As defmacro is a macro, does Elixir have elixir functions that work directly with the AST representations, or are those just sent to the parser?

Since a macro can’t be required to make a macro and since applying quote do to an atom just outputs the atom value I assume one can make macros directly using AST format. Can one make macros without having to use the defmacro macro?

Wouldn’t that mean that even when the code is written in the AST format it still gets processed before being parsed? Does one use the AST format directly to have more control when writing macros? Not that enough control isn’t already provided, I’m just trying to understand the point of us having access to the AST format of these data structures.

Finally, one more thing that is confusing me: when quote do is performed on an atom, an atom is returned; when performed on an empty list, an empty list is returned; but when performed on a tuple, empty or populated, the tuple is treated like a map or an operator?

The tuple is one of the basic building blocks, so why is its AST representation more complex than that of the atom and empty list?

Thank you for this post, I did not know that elixir could do such things at runtime, that’s very relevant to some outcomes I’ve been trying to achieve in another functional language.

Thanks for the link to the talks.

It will call the Kernel.+/2 function, which is defined as:

def left + right do
  :erlang.+(left, right)
end

Due to the Kernel compilation flags, it will be inlined everywhere it is called, so in the resulting code it will be the :erlang.+/2 BIF, but “conceptually” it isn’t (and you can always override it with your own implementation).
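As a sketch of what overriding it can look like: the module name StringPlus and its behaviour are invented for illustration. Inside the module, Kernel’s + is excluded from the imports and a local definition takes its place.

```elixir
defmodule StringPlus do
  # Hide Kernel's +/2 inside this module so we can define our own.
  import Kernel, except: [+: 2]

  # Hypothetical override: + concatenates when both sides are binaries...
  def left + right when is_binary(left) and is_binary(right), do: left <> right
  # ...and falls back to the usual arithmetic otherwise.
  def left + right, do: Kernel.+(left, right)

  # Both uses of + below resolve to the local definition above.
  def demo, do: {"foo" + "bar", 1 + 2}
end

IO.inspect(StringPlus.demo())
# prints: {"foobar", 3}
```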

It will “call” Kernel.SpecialForms.%{}(args), but it is handled by the “interpreter” of Elixir. The code is already parsed, so there are no more parsing steps involved.

It has, but these functions are private and shouldn’t be called directly by the user. Also, as said before, this does not involve the parser anymore, as the code is already parsed.

Yes, you can use the AST directly; however, working through quote/2 and unquote/1 is much easier in most cases. There are situations, though, where working with the raw AST is easier (for example, pattern matching on the AST).
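A small sketch of the pattern-matching case, with made-up names (Swap, SwapDemo, swap_minus). The macro matches the AST of a subtraction in its function head and builds the result tuple by hand, without quote or unquote.

```elixir
defmodule Swap do
  # Hypothetical macro: matches the AST of `a - b` directly and
  # returns the AST of `b - a`, built as a plain tuple.
  defmacro swap_minus({:-, meta, [left, right]}) do
    {:-, meta, [right, left]}
  end
end

defmodule SwapDemo do
  require Swap

  # Compiles as if `3 - 10` had been written here.
  def run, do: Swap.swap_minus(10 - 3)
end

IO.inspect(SwapDemo.run())
# prints: -7
```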

Yes.

To remove ambiguity in situations like quote do: {:foo, [], []}. If this returned the tuple as is, the generated AST would be identical to that of quote do: foo(), which would be wrong. By treating :{} as an “operator”, the ambiguity in the AST is removed.
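The collision is easy to see by quoting both forms side by side (a sketch meant to be run as a plain script; metadata is ignored in the matches):

```elixir
# A zero-argument call quotes to a three-element tuple...
{:foo, _, []} = quote(do: foo())

# ...so a three-element tuple literal cannot quote to itself.
# It is wrapped in a :{} node with its elements as the arguments.
{:{}, _, [:foo, [], []]} = quote(do: {:foo, [], []})

# The empty tuple goes through :{} as well.
{:{}, _, []} = quote(do: {})
```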

Wow, thanks a lot, this has cleared up a lot.

How do you have such a deep understanding?

I don’t usually feel like diving into the guts of languages but Elixir and Julia seem like they are fun enough to give it a real go!

Years of experience in different languages, parsers, and compilers. Also I have written some macros, so I know some stuff about them.

Don’t worry, I think that with time the experience will come, and you will “instinctively” know that stuff as well.

I very much recommend it! This (diving into the guts of Elixir’s source code as well as the guts of libraries that I wanted to understand) is how I learned a lot about how Elixir’s macro-expansion works.

If I recall correctly, the set of things which are identical when quoted is: binary literals, atoms, numbers, two-tuples, and lists (the last two only as long as their elements are themselves such literals).
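A quick way to check that list, as a sketch to run as a plain script:

```elixir
# Each of these literals quotes to itself.
:atom = quote(do: :atom)
42 = quote(do: 42)
1.5 = quote(do: 1.5)
"binary" = quote(do: "binary")
[1, :two, "three"] = quote(do: [1, :two, "three"])
{:ok, 1} = quote(do: {:ok, 1})

# A list that contains a variable is still a list, but its element
# is now an AST node rather than a plain value.
[{:x, _, _}] = quote(do: [x])
```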

BTW I cut my teeth on Julia macros. While not the same, they are quite similar (I find Elixir’s AST is better, though I like how calling a Julia macro is sigilled), so if you learn one well you will be able to operate fluently on the other without having to unlearn very much.
