IO lists with a comma or a bar

I’ve been reading a blog post [0] about IO lists and noticed that the authors build them with commas:

names = ["A", "B"]    # => ["A", "B"]
names = [names, "C"]  # => [["A", "B"], "C"]

I usually see the | operator used to build lists, though. A list built with a comma and a list built with a bar are not the same to the VM:

iex(14)> names = ["A", "B"]
["A", "B"]
iex(15)> [names | "C"] == [names, "C"]
false

I wonder what the difference between them actually is.

[0] https://www.bignerdranch.com/blog/elixir-and-io-lists-part-1-building-output-efficiently/

The output from IO.puts is the same for both [names | "C"] and [names, "C"] though.
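
One way to check this without printing is to flatten both shapes with IO.iodata_to_binary/1. A minimal sketch, reusing names from the question (the variable names proper and improper are only for illustration):

```elixir
names = ["A", "B"]

proper = [names, "C"]     # [["A", "B"], "C"]
improper = [names | "C"]  # [["A", "B"] | "C"]

# Both are valid iodata, so they flatten to the same binary.
proper_bin = IO.iodata_to_binary(proper)
improper_bin = IO.iodata_to_binary(improper)

IO.inspect(proper_bin == improper_bin)  # => true
IO.inspect(proper == improper)          # => false
```

So the two terms differ structurally, but anything that consumes them as iodata should see the same bytes.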

Given your example

names = ["A", "B"]
  • [names | "C"] # => [["A", "B"] | "C"]. This is an improper list.
  • [names, "C"] # => [["A", "B"], "C"]. This is a list where the first element is a list (["A", "B"]), and the second element is a string ("C").

To build a proper list with |, the right-hand side of the bar (the tail) must itself be a proper list:

["C" | names] # => ["C", "A", "B"]
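
To make the proper/improper distinction concrete, here is a small sketch that assumes Elixir 1.8 or later, where List.improper?/1 is available. The variable names are hypothetical:

```elixir
names = ["A", "B"]

prepended = ["C" | names]   # tail is a proper list
appended = [names | ["C"]]  # tail is a proper list, head is a nested list
bar_string = [names | "C"]  # tail is a binary, so the list is improper

IO.inspect(prepended == ["C", "A", "B"])  # => true
IO.inspect(appended == [names, "C"])      # => true
IO.inspect(List.improper?(bar_string))    # => true
IO.inspect(List.improper?(appended))      # => false
```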

Are there any performance implications of using one vs the other?

Depends on your definition of performance. If you mean strictly speed and memory, I do not think so. That said, some functions do not accept improper lists:

iex(1)> length [1,2,3]
3
iex(2)> length [1,2,3 | 4]
** (ArgumentError) argument error
    :erlang.length([1, 2, 3 | 4])

I do not know how many of the standard library functions actually work with improper lists.
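
As a rough illustration rather than a full survey, the sketch below tries a few functions on an improper list. Functions that only look at the first cell work, iodata functions accept a binary tail, and length/1 fails as shown above:

```elixir
improper = [["A", "B"] | "C"]

# Functions that only inspect the first cons cell are fine.
IO.inspect(hd(improper))  # => ["A", "B"]
IO.inspect(tl(improper))  # => "C"

# iodata functions accept a binary as the final tail.
IO.inspect(IO.iodata_length(improper))  # => 3

# length/1 walks the list and expects [] as the final tail.
length_result =
  try do
    length(improper)
  rescue
    e in ArgumentError -> {:error, e.__struct__}
  end

IO.inspect(length_result)  # => {:error, ArgumentError}
```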

Well, performance regarding IO operations, like writing to a socket, as in the blog post that I’ve linked to …

I’ve made a simple test with benchfella:

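defmodule IOListBench do
  use Benchfella

  # get_list() runs as per-bench setup, so only building the outer list is timed.
  bench "proper", [list: get_list()] do
    # Two cons cells: [list | ["C" | []]]
    [list, "C"]
  end

  bench "improper", [list: get_list()] do
    # One cons cell with a binary tail
    [list | "C"]
  end

  # 100_000 copies of the same short binary
  def get_list do
    for _ <- 1..100_000, do: "asd"
  end
end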

and the results are the same.
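
A benchmark like this times only the construction of the outer list. As a rough sketch of timing the flattening step too, the snippet below uses :timer.tc/1 from Erlang's standard library instead of benchfella (my substitution, not part of the original test). Single-run timings are noisy, so it claims no particular numbers:

```elixir
list = for _ <- 1..100_000, do: "asd"

# :timer.tc/1 returns {elapsed_microseconds, result}.
{proper_us, proper_bin} = :timer.tc(fn -> IO.iodata_to_binary([list, "C"]) end)
{improper_us, improper_bin} = :timer.tc(fn -> IO.iodata_to_binary([list | "C"]) end)

IO.puts("proper:   #{proper_us} microseconds")
IO.puts("improper: #{improper_us} microseconds")
IO.inspect(proper_bin == improper_bin)  # => true
```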

IO operations are perfectly fine with improper lists. Using improper lists for concatenating iodata is slightly more efficient in terms of memory.

iex(1)> ab = ["A", "B"]
["A", "B"]
iex(2)> [ab | "C"]
[["A", "B"] | "C"]
iex(3)> [ab, "C"]
[["A", "B"], "C"]
iex(4)> [ab | ["C"]]
[["A", "B"], "C"]
iex(5)> :erts_debug.flat_size([ab, "C"])
26
iex(6)> :erts_debug.flat_size([ab | "C"])
24

When you do [ab, "C"] the “raw” representation would be [ab | ["C" | []]], whereas for [ab | "C"] that’s all there is - there’s one less list “cell” involved.
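
The same point as a small sketch: the comma form desugars to nested cons cells ending in [], and the bar form saves one cell (two words, matching the 26 vs 24 above). The reduce at the end is my own illustration of appending chunks this way, not something from the blog post:

```elixir
ab = ["A", "B"]

# The comma form is sugar for nested cons cells ending in [].
IO.inspect([ab, "C"] == [ab | ["C" | []]])  # => true

# One fewer cons cell means two fewer words.
diff = :erts_debug.flat_size([ab, "C"]) - :erts_debug.flat_size([ab | "C"])
IO.inspect(diff)  # => 2

# Appending chunks with the bar allocates a single cons cell per step.
iodata = Enum.reduce(["A", "B", "C"], [], fn chunk, acc -> [acc | chunk] end)
IO.inspect(iodata)                       # => [[[[] | "A"] | "B"] | "C"]
IO.inspect(IO.iodata_to_binary(iodata))  # => "ABC"
```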
