Why is [1, 2, 3 | 4] a valid list if lists are linked lists?

spencer.christensen · June 3, 2024, 6:59pm

I did not formally study Computer Science, so please bear with me if I am missing something obvious. I am studying the inner workings of functional programming concepts and wrote this simplistic map function:

defmodule EnumMap do
  def map(proc, [head | []]) do
    proc.(head)
  end

  def map(proc, [head | tail]) do
    [proc.(head)] ++ map(proc, tail)
  end
end

EnumMap.map(fn x -> x + 1 end, [1, 2, 3, 4])

This produced: [2, 3, 4 | 5].

I thought “that’s weird, why is 5 being explicitly called with a list constructor?”

Aha, then I caught my mistake:

defmodule EnumMap do
  def map(proc, [head | []]) do
    [proc.(head)] # I needed to wrap this result in a list
  end

  def map(proc, [head | tail]) do
    [proc.(head)] ++ map(proc, tail)
  end
end

EnumMap.map(fn x -> x + 1 end, [1, 2, 3, 4])

Producing: [2, 3, 4, 5]

Hooray! However, that then made me wonder!

If Elixir lists are linked lists, why is [2, 3, 4 | 5] allowed? Isn’t that explicitly saying “this is a linked list, except for the final element which is just an integer.”? Is there a benefit to having linked lists where the final element is not in fact a list? Are there certain data structures or algorithms that use these sort of odd linked lists?

hubertlepicki · June 3, 2024, 7:03pm

This is allowed but this is actually invalid list.

derek-zhou · June 3, 2024, 7:04pm

Because you have a subtle bug in your code, as you find out later.

[2, 3, 4 | 5] is an improper list. You should avoid it, but there is nothing in the language to forbid it.

If you are interested in improper lists, you can read: Making sense of Elixir (improper) lists

spencer.christensen · June 3, 2024, 9:36pm

Thank you for sharing that article. Super interesting read! Also made me realize my use of lists could be optimized:

My original implementation used appending to create the list:

[proc.(head)] ++ map(proc, tail)

However I could instead use the cons operator to actually form a linked list as it is intended:

[proc.(head) | map(proc, tail)]

It also was redundant in its application of [proc.(head)] and could be simplified for just matching against an empty array.

Better version:

defmodule EnumMap do
  def map(_proc, []), do: []
  def map(proc, [head | tail]) do
    [proc.(head) | map(proc, tail)]
  end
end

EnumMap.map(fn x -> x + 1 end, [1, 2, 3, 4])

D4no0 · June 3, 2024, 10:17pm

You can also refactor that recursion to use tail call optimization.

thiagomajesk · June 3, 2024, 10:21pm

I have read this article countless times. It has the best explanation of improper/io lists I’ve read since I first started working with Elixir. I’m constantly recommending this to other people; it’s a great article!

sabiwara · June 3, 2024, 10:51pm

It is a very good suggestion as an exercise, but for this particular case which is building a list it might not be the best approach, actually :lists.map/2 (used under the hood by Enum.map/2) is implemented with body recursion:

github.com

erlang/otp/blob/OTP-27.0/lib/stdlib/src/lists.erl#L2075-L2084


      
          map(F, List) when is_function(F, 1) ->
              case List of
                  [Hd | Tail] -> [F(Hd) | map_1(F, Tail)];
                  [] -> []
              end.
          
          map_1(F, [Hd | Tail]) ->
              [F(Hd) | map_1(F, Tail)];
          map_1(_F, []) ->
              [].

Also see: Erlang -- The Seven Myths of Erlang Performance.

spencer.christensen · June 3, 2024, 10:51pm

As somebody new to functional programming, can I ask whether this means simply using an accumulator? Or is there some more idiomatic-Elixir method for this? E.g.

defmodule EnumMapTail do
  def map(proc, list) do
    run_map(proc, list, [])
  end
  defp run_map(_proc, [], acc), do: Enum.reverse(acc)
  defp run_map(proc, [head | tail], acc) do
    run_map(proc, tail, [proc.(head) | acc])
  end
end

sabiwara · June 3, 2024, 11:03pm

The cons operator is indeed much more idiomatic in this case, but this is mostly a style preference, there is no performance or behavior implication.
Both will generate the same bytecode when prepending a known number of elements with ++.

sabiwara · June 3, 2024, 11:09pm

Indeed, this is exactly what tail recursion is!
Your code is exactly how you would write it.