Hmm… Now I am not sure any more if any of the conclusions of Erlang’s Tail Recursion is Not a Silver Bullet are actually true.
I haven’t been able to find patch notes for Erlang 12B so far that support or reject this claim. (the section on the Erlang Myths page is not very clear; It is possible that it refers to a different kind of optimization.)
EDIT: I did find this interesting Erlang forum thread, which pointed me to the List Handling part of the Erlang doc’s efficiency section, which, however, states more or less the same as what is included in the Myths section, albeit with a clearer example.
The last post in that forum thread states:
Consider
map(F, [X|Xs]) -> [F(X) | map(F, Xs)];
map(_, [] ) -> [].
By the classical definition, this is not tail recursive.
The Prolog equivalent, however, is.
The reason the Prolog version is tail recursive is that it
can build a data structure with a hole in it [F(X) | Hole]
and pass the hole:
map(F, [X|Xs], &Result) =>
*Result := [F(X) | Hole],
map(F, Xs, &Hole);
map(F, [], &Result) =>
*Result := [].
And this would be safe even in Erlang because nobody (other
than the garbage collector) can ever see the hole.
This extended version of “tail recursion” applies when there
the last expression that the call occurs in is a data structure
construction.
The existing Erlang system doesn’t support extended tail
recursion, largely for the sake of the garbage collector.
But it could, without any change to the source language
or its semantics.
So this seems to refute the idea that the optimization made in R12B is tail-recursion modulo cons.
However, a few posts before, it is stated:
Body recursion:
For every message, build it on the heap, push the message
handle on the stack (1 word), call yourself meaning push the
return address (1 word). If we run out of stack + heap that are
in the same memory block (in the current implementation) do a
garbage collect that will increase the free space.
When the loop is ended we pop one stack frame at the time
(1+1 words) and build one list cons cell (2 words). The result
is built while we return and requires no garbage collections
since the pops from the stack match what’s built on the heap.
Tail recursion:
For every message, build it on the heap, build one list cons
cell (2 words) on the heap and loop to yourself. Garbage
collect just like above. Practically equivalent to the
body recursive variant but build all on the heap and
nothing on the stack.
When the loop is ended, do the lists:reverse/1 to build the
result. A new list is built on the heap requiring the same
size as the old list, and the reversed list becomes garbage.
So, tail recursion should require a top memory consumption of
2 * size(MsgList) words more than body recursion. This will
cause extra garbage collection. Also the lists:reverse/1 is
extra work that the body recursive variant avoids.
In this case I speculate body recursion wins.
In some other implementation where heap and stack are not in
the same memory block the situation changes.
So, because the heap and the stack are in a shared memory space, it seems that the difference is not that huge, and rather in favour of the body-recursive variant.
I am thorougly confused by what all these different sources claim.