Optimized Tail Recursion

Hello guys, I’ve been wondering lately how I can actually know that my recursive functions are in fact optimized and that each recursive call is not generating a new stack frame. Is the arity of the function enough to know that a new stack frame has not been generated?

Thanks in advance.

You know this by writing your functions in a way that is optimisable.

The much bigger problem for some seems to be seeing which function is actually called last…

Though I’m at a loss for words to explain it right now.


AppSignal’s article explains it quite well.


The tl;dr is: A function is tail recursive if the recursive call is the last thing the function does. If it first calls itself recursively and after that still does something else with the result, then it is not tail recursive and cannot be optimized away.
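A minimal sketch of the difference, using hypothetical len_body/1 and len_tail/1 helpers (my names, not from the article):

def len_body([]), do: 0
def len_body([_h | t]), do: 1 + len_body(t)            # not tail recursive: the + runs after the call returns

def len_tail(list), do: len_tail(list, 0)
defp len_tail([], acc), do: acc
defp len_tail([_h | t], acc), do: len_tail(t, acc + 1)  # tail recursive: the call is the last thing done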

Also interesting is Tail Recursion is not a silver bullet, which talks about Elixir/Erlang’s memory model (stack and heap growing towards each other), which for many situations makes tail recursion slightly less important than in some other languages.
Situations in which it remains very important are:

  • When reducing streams (whose length you don’t know; they might very well be more than can fit in memory).
  • When writing handler functions for processes (if these are not tail-recursive, the process will hog more and more memory as time goes on; see the sketch after this list). Note that when using GenServer-based processes, this is already managed for you behind the scenes.
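A minimal sketch of such a handler loop as a hand-rolled process (not a GenServer; the Counter name and its messages are made up for illustration):

defmodule Counter do
  # Start the loop in its own process with an initial count of 0.
  def start, do: spawn(fn -> loop(0) end)

  # Each branch ends with the call to loop/1, so the loop is
  # tail-recursive and the process runs in constant stack space no
  # matter how many messages it handles.
  defp loop(count) do
    receive do
      :increment ->
        loop(count + 1)

      {:get, caller} ->
        send(caller, {:count, count})
        loop(count)
    end
  end
end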

One of the simplest pitfalls someone might intuitively write, and thereby break TCO, is to include an operator outside of the final recursive call, even though it is still on the final line, such as:

def recursive(0), do: 0

def recursive(n) when n > 0, do: 1 + recursive(n - 1)  # the + runs after the call returns, so this is not a tail call

I’ve done it myself and seen it in others’ code I’ve reviewed.
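For contrast, a tail-recursive rewrite of the same function using an accumulator (one common fix; the two-argument clauses are my addition):

def recursive(n), do: recursive(n, 0)

defp recursive(0, acc), do: acc
defp recursive(n, acc) when n > 0, do: recursive(n - 1, acc + 1)  # the recursive call is the last thing done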

As a fun aside, if you’re on OSX, you can use time -l ... when running any shell command to observe small details about a limited set of syscalls as well as memory allocations during the execution (it has to be the /usr/bin/time binary; the shell’s built-in time doesn’t accept -l). You can see that the above style uses more and more memory the larger the argument you give it, whereas a properly TCO’d function’s memory usage grows much more slowly with the argument size, though still non-zero.


It doesn’t translate perfectly because our runtimes have such different semantics, but Haskell’s wiki article about their versions of fold-from-left and fold-from-right has some interesting insights here too, and I think the “visualization” they offer by expanding the math functions captures what I’m trying to express above pretty well:

https://wiki.haskell.org/Foldr_Foldl_Foldl'

foldr (+) 0 [1..1000000] -->
1 + (foldr (+) 0 [2..1000000]) -->
1 + (2 + (foldr (+) 0 [3..1000000])) -->
1 + (2 + (3 + (foldr (+) 0 [4..1000000]))) -->
1 + (2 + (3 + (4 + (foldr (+) 0 [5..1000000])))) -->
-- ...
-- ...  My stack overflows when there's a chain of around 500000 (+)'s !!!
-- ...  But the following would happen if you got a large enough stack:
-- ...
1 + (2 + (3 + (4 + (... + (999999 + (foldr (+) 0 [1000000]))...)))) -->
1 + (2 + (3 + (4 + (... + (999999 + (1000000 + ((foldr (+) 0 []))))...)))) -->

1 + (2 + (3 + (4 + (... + (999999 + (1000000 + 0))...)))) -->
1 + (2 + (3 + (4 + (... + (999999 + 1000000)...)))) -->
1 + (2 + (3 + (4 + (... + 1999999 ...)))) -->

1 + (2 + (3 + (4 + 500000499990))) -->
1 + (2 + (3 + 500000499994)) -->
1 + (2 + 500000499997) -->
1 + 500000499999 -->
500000500000
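The same expansion happens with the non-tail-recursive Elixir function from earlier in the thread; a sketch of the deferred additions the BEAM keeps on the stack:

recursive(4) -->
1 + recursive(3) -->
1 + (1 + recursive(2)) -->
1 + (1 + (1 + recursive(1))) -->
1 + (1 + (1 + (1 + recursive(0)))) -->
1 + (1 + (1 + (1 + 0))) -->
4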

Just to point out that this optimisation is not just done for tail-recursive calls; it is done for every tail call. It is TCO, Tail Call Optimisation.
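For example, here is a hedged sketch of mutual recursion (the even?/odd? names are mine): neither function calls itself, yet every call is in tail position, so the whole thing runs in constant stack space:

defmodule Parity do
  # Each call to the other function is the last thing its clause
  # does, so the BEAM reuses the current stack frame.
  def even?(0), do: true
  def even?(n) when n > 0, do: odd?(n - 1)

  def odd?(0), do: false
  def odd?(n) when n > 0, do: even?(n - 1)
end

Parity.even?(10_000_000) completes without growing the stack, even though even?/1 never calls even?/1.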


As others have mentioned, often it doesn’t matter if TCO is applied or not. When it does, staring at the code and convincing oneself should usually suffice. That said, you can obtain stronger guarantees by taking a look at Erlang’s assembly. Here’s a small demo I wrote up a few years ago. It seems to still work on the latest Elixir/Erlang.
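A hedged sketch of that kind of check (my own, not the linked demo): compile a module to disk, then disassemble the .beam with OTP’s :beam_disasm and look at the call instructions. Tail calls are emitted as call_only or call_last, while a body-recursive call shows up as a plain call with further instructions after it.

# recursive.ex (compile with: elixirc recursive.ex)
defmodule Recursive do
  def sum(n), do: sum(n, 0)
  defp sum(0, acc), do: acc
  defp sum(n, acc) when n > 0, do: sum(n - 1, acc + n)
end

Then, from iex in the same directory:

{:beam_file, _module, _exports, _attrs, _info, code} =
  :beam_disasm.file(~c"Elixir.Recursive.beam")

Scanning code for sum/2 should show its recursive call emitted as call_only (a jump that reuses the current frame) rather than a plain call.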