Math problem. How to code `Xn+1 = Xn + Xn * S`

al2o3cr · June 17, 2022, 8:59pm

The BEAM doesn’t pass everything via the stack, it has registers as well. The BEAM book has a good introduction to the situation.

You can go down this rabbithole further with compiler options - add @compile :S inside the Series module definition and then run elixirc on the file. You’ll get an error message:

== Compilation error in file foo2.ex ==
** (CompileError) foo2.ex: could not compile module Series. We expected the compiler to return a .beam binary but got something else. This usually happens because ERL_COMPILER_OPTIONS or @compile was set to change the compilation outcome in a way that is incompatible with Elixir

but you’ll ALSO get a file with .S on the end of the name with a BEAM assembly dump in it!

Here’s what Series.generate compiled to with Elixir 1.13 / OTP 24:

{function, generate, 3, 12}.
  {label,11}.
    {line,[{location,"foo2.ex",4}]}.
    {func_info,{atom,'Elixir.Series'},{atom,generate},3}.
  {label,12}.
    {test,is_lt,{f,13},[{integer,0},{x,2}]}.
    {gc_bif,'*',{f,0},3,[{x,0},{x,1}],{x,3}}.
    {gc_bif,'-',{f,0},4,[{x,2},{integer,1}],{x,2}}.
    {allocate,1,4}.
    {move,{x,0},{y,0}}.
    {move,{x,3},{x,0}}.
    {call,3,{f,12}}.
    {'%',{var_info,{x,0},[{type,{t_list,number,nil}}]}}.
    {test_heap,2,1}.
    {put_list,{y,0},{x,0},{x,0}}.
    {deallocate,1}.
    return.
  {label,13}.
    {move,nil,{x,0}}.
    return.

{label, 12} is the main entry point.

It checks the when clause on the third argument (arguments are passed in x registers starting with {x, 0}) and bails out to label 13 for the base case: put [] (Erlang spells it nil) into register {x, 0} (the register used for the return value) and return.

The next two instructions do the actual computation, putting x * s into register {x,3} and count - 1 into {x,2}. Note that the compiler has determined that the original value of count that was passed in via {x,2} is no longer “live” (visible to code) so it can reuse the register to store count - 1.

allocate creates space on the stack for the return address and one saved register - the stack is read via the y registers, again starting with {y,0}.

The next two instructions shuffle registers around: {x,0} is saved in {y,0} (preserving the original value of x) and then x * s is copied from {x,3} into {x,0}.

That lines everything up for a new call to generate: the three needed arguments are in {x,0} through {x,2} (s is unchanged from one iteration to the next).

test_heap2 verifies there’s enough space on the heap for the new cons cell we’re about to construct (and triggers GC if needed).

put_list combines x (from {y,0}) and the return value from generate (in {x,0}) into a new cons cell and puts a pointer to it into {x,0}

deallocate cleans up the two stack slots, and then generate returns.

Every time generate recurses, it grows the stack by two slots (the return address and {y,0}).

Every time it returns from a recursion, it shrinks the stack by two slots and creates a cons cell which takes two slots on the heap.

Overall, the body-recursive function allocates a maximum of 2 * count + 2 slots - first all from the stack until the recursion hits bottom, and then trading stack for heap until completion.

What about the tail-recursive version?

This is a fairly standard body → tail recursion conversion:

  def generate2(x, s, count), do: do_generate2(x, s, count, [])

  defp do_generate2(_, _, 0, acc), do: Enum.reverse(acc)
  defp do_generate2(x, s, count, acc) do
    do_generate2(x * s, s, count - 1, [x | acc])
  end

and the assembly shows the tail-recursion we wanted to see:

{function, generate2, 3, 15}.
  {label,14}.
    {line,[{location,"foo2.ex",7}]}.
    {func_info,{atom,'Elixir.Series'},{atom,generate2},3}.
  {label,15}.
    {move,nil,{x,3}}.
    {call_only,4,{f,9}}.

{function, do_generate2, 4, 9}.
  {label,8}.
    {line,[{location,"foo2.ex",9}]}.
    {func_info,{atom,'Elixir.Series'},{atom,do_generate2},4}.
  {label,9}.
    {'%',{var_info,{x,3},[{type,{t_list,number,nil}}]}}.
    {test,is_eq_exact,{f,10},[{x,2},{integer,0}]}.
    {move,{x,3},{x,0}}.
    {call_ext_only,1,{extfunc,'Elixir.Enum',reverse,1}}.
  {label,10}.
    {line,[{location,"foo2.ex",11}]}.
    {gc_bif,'*',{f,0},4,[{x,0},{x,1}],{x,4}}.
    {gc_bif,'-',{f,0},5,[{x,2},{integer,1}],{x,2}}.
    {test_heap,2,5}.
    {put_list,{x,0},{x,3},{x,3}}.
    {move,{x,4},{x,0}}.
    {call_only,4,{f,9}}.

call_only is the indication that the compiler has successfully detected tail-recursion.

Some other important differences:

no allocate instructions at all, so no stack usage
same amount of heap space checked for as in the body-recursive version (2 slots)

No stack usage, that’s good, right? There’s a tradeoff: in principle, constructing the reversed list at the end doesn’t need to take 2 * count memory (all that “reverse a linked list in-place” drilling is actually useful for once!) but it does take time.