The BEAM doesn’t pass everything via the stack, it has registers as well. The BEAM book has a good introduction to the situation.
You can go down this rabbithole further with compiler options - add @compile :S
inside the Series
module definition and then run elixirc
on the file. You’ll get an error message:
== Compilation error in file foo2.ex ==
** (CompileError) foo2.ex: could not compile module Series. We expected the compiler to return a .beam binary but got something else. This usually happens because ERL_COMPILER_OPTIONS or @compile was set to change the compilation outcome in a way that is incompatible with Elixir
but you’ll ALSO get a file with .S
on the end of the name with a BEAM assembly dump in it!
Here’s what Series.generate
compiled to with Elixir 1.13 / OTP 24:
{function, generate, 3, 12}.
{label,11}.
{line,[{location,"foo2.ex",4}]}.
{func_info,{atom,'Elixir.Series'},{atom,generate},3}.
{label,12}.
{test,is_lt,{f,13},[{integer,0},{x,2}]}.
{gc_bif,'*',{f,0},3,[{x,0},{x,1}],{x,3}}.
{gc_bif,'-',{f,0},4,[{x,2},{integer,1}],{x,2}}.
{allocate,1,4}.
{move,{x,0},{y,0}}.
{move,{x,3},{x,0}}.
{call,3,{f,12}}.
{'%',{var_info,{x,0},[{type,{t_list,number,nil}}]}}.
{test_heap,2,1}.
{put_list,{y,0},{x,0},{x,0}}.
{deallocate,1}.
return.
{label,13}.
{move,nil,{x,0}}.
return.
{label, 12}
is the main entry point.
It checks the when
clause on the third argument (arguments are passed in x
registers starting with {x, 0}
) and bails out to label 13 for the base case: put []
(Erlang spells it nil
) into register {x, 0}
(the register used for the return value) and return.
The next two instructions do the actual computation, putting x * s
into register {x,3}
and count - 1
into {x,2}
. Note that the compiler has determined that the original value of count
that was passed in via {x,2}
is no longer “live” (visible to code) so it can reuse the register to store count - 1
.
allocate
creates space on the stack for the return address and one saved register - the stack is read via the y
registers, again starting with {y,0}
.
The next two instructions shuffle registers around: {x,0}
is saved in {y,0}
(preserving the original value of x
) and then x * s
is copied from {x,3}
into {x,0}
.
That lines everything up for a new call to generate
: the three needed arguments are in {x,0}
through {x,2}
(s
is unchanged from one iteration to the next).
test_heap2
verifies there’s enough space on the heap for the new cons cell we’re about to construct (and triggers GC if needed).
put_list
combines x
(from {y,0}
) and the return value from generate
(in {x,0}
) into a new cons cell and puts a pointer to it into {x,0}
deallocate
cleans up the two stack slots, and then generate
returns.
Every time generate
recurses, it grows the stack by two slots (the return address and {y,0}
).
Every time it returns from a recursion, it shrinks the stack by two slots and creates a cons cell which takes two slots on the heap.
Overall, the body-recursive function allocates a maximum of 2 * count + 2
slots - first all from the stack until the recursion hits bottom, and then trading stack for heap until completion.
What about the tail-recursive version?
This is a fairly standard body → tail recursion conversion:
def generate2(x, s, count), do: do_generate2(x, s, count, [])
defp do_generate2(_, _, 0, acc), do: Enum.reverse(acc)
defp do_generate2(x, s, count, acc) do
do_generate2(x * s, s, count - 1, [x | acc])
end
and the assembly shows the tail-recursion we wanted to see:
{function, generate2, 3, 15}.
{label,14}.
{line,[{location,"foo2.ex",7}]}.
{func_info,{atom,'Elixir.Series'},{atom,generate2},3}.
{label,15}.
{move,nil,{x,3}}.
{call_only,4,{f,9}}.
{function, do_generate2, 4, 9}.
{label,8}.
{line,[{location,"foo2.ex",9}]}.
{func_info,{atom,'Elixir.Series'},{atom,do_generate2},4}.
{label,9}.
{'%',{var_info,{x,3},[{type,{t_list,number,nil}}]}}.
{test,is_eq_exact,{f,10},[{x,2},{integer,0}]}.
{move,{x,3},{x,0}}.
{call_ext_only,1,{extfunc,'Elixir.Enum',reverse,1}}.
{label,10}.
{line,[{location,"foo2.ex",11}]}.
{gc_bif,'*',{f,0},4,[{x,0},{x,1}],{x,4}}.
{gc_bif,'-',{f,0},5,[{x,2},{integer,1}],{x,2}}.
{test_heap,2,5}.
{put_list,{x,0},{x,3},{x,3}}.
{move,{x,4},{x,0}}.
{call_only,4,{f,9}}.
call_only
is the indication that the compiler has successfully detected tail-recursion.
Some other important differences:
- no
allocate
instructions at all, so no stack usage - same amount of heap space checked for as in the body-recursive version (2 slots)
No stack usage, that’s good, right? There’s a tradeoff: in principle, constructing the reversed list at the end doesn’t need to take 2 * count
memory (all that “reverse a linked list in-place” drilling is actually useful for once!) but it does take time.