Weird SASL lists and can I rely on Kernel.is_list?

I’m implementing a custom log formatter and I wanted to parse the lists returned by SASL except they seem to be weird:

iex(1)> msg = ["Application ", "logger", " started at " | ":nonode@nohost"] ["Application ", "logger", " started at " | ":nonode@nohost"] iex(2)> is_list msg true iex(3)> is_binary msg false iex(4)> Enum.map(msg, &IO.inspect/1) "Application " "logger" " started at " ** (FunctionClauseError) no function clause matching in Enum."-map/2-lists^map/1-0-"/2 (elixir) lib/enum.ex:1088: Enum."-map/2-lists^map/1-0-"(&IO.inspect/1, ":nonode@nohost") (elixir) lib/enum.ex:1088: Enum."-map/2-lists^map/1-0-"/2 (elixir) lib/enum.ex:1088: Enum."-map/2-lists^map/1-0-"/2 iex(4)> List.flatten(msg) ** (FunctionClauseError) no function clause matching in :lists.do_flatten/2 (stdlib) lists.erl:626: :lists.do_flatten(":nonode@nohost", []) (stdlib) lists.erl:629: :lists.do_flatten/2 (stdlib) lists.erl:629: :lists.do_flatten/2 iex(5)> for x <- msg do ...(5)> IO.inspect x ...(5)> end "Application " "logger" " started at " ** (FunctionClauseError) no function clause matching in Enum."-reduce/3-lists^foldl/2-0-"/3 (elixir) lib/enum.ex:1473: Enum."-reduce/3-lists^foldl/2-0-"(#Function<12.54118792/2 in :erl_eval.expr/5>, [" started at ", "logger", "Application "], ":nonode@nohost")

I notice the following: The “list” returned here is not of the shape [ :item1 | [ :item2 | [] ]] in that its final element doesn’t seem to be a []. My personal guess would be that’s why functions from the Enum and List modules seem to fail on it.

But: is_list still returns true even though this is for most purposes a malformed list (in that it cannot be used in most List and Enum module applications).

Is this how things should be? How to best spot such a list and deal with it?

1 Like

What worked for me in the end:

  def cleanUpMessage([ head | tail ]) do
    [ removeNewlines(head) | cleanUpMessage(tail) ]
  end
  def cleanUpMessage([]), do: []
  def cleanUpMessage(any), do: removeNewlines(any)


  defp removeNewlines(element) when is_binary(element), do: String.replace(element, "\n", " | ")
  defp removeNewlines(element), do: element

What I originally tried to do:

[code]# def cleanUpMessage(msg) when is_list(msg) and not is_binary(msg), do: Enum.map(msg, &removeNewlines/1)

def cleanUpMessage(msg) when is_binary(msg), do: removeNewlines(msg)

def cleanUpMessage(msg), do: msg

[/code]

1 Like

is_list/1 does not check the full list, but only the structure of the beginning. Consider it beeing implemented like this:

def is_list([]), do: true
def is_list([_|_], do: true
def is_lsit(_), do: false

This is like this due to performace reasons and relying on the assumption, that one does not produce improper lists, since there is simply no reason to do so…

There is no way to distinguish proper from improper lists in constant time, but is_list/1 (as well as all other typecheckers) do run in O(1) and that is a good thing.

If you really have to work with improper lists, you might need to find a way to work with them in a different way, not using Enum or List module, since they assume to have proper lists again.

1 Like

These are improper lists.

2 Likes

There are reasons to use improper lists. The most common example, in Erlang/Elixir is iolists. These are desirable for various performance reasons.

1 Like

James, what reasons are there for using improper lists in iolists?
From your link, I read “Be careful not to create an improper list.” about half way down.

1 Like

It’s basically compensating for the fact that Erlang’s primary data structure, the linked list, sucks for this job. Unless you cheat!

We want to build up some output, which can be character lists or binaries. So we need to put all of that in something. We want to add to it dynamically, so a tuple is out. Appending to binaries is too much memory copying. It’s gotta be a list, but:

iex(1)> letters = ~w[a b c d e f]
["a", "b", "c", "d", "e", "f"]
iex(2)> Enum.reduce(letters, [""], fn letter, output -> output ++ [letter] end)
["", "a", "b", "c", "d", "e", "f"]

++ is O(n) every time it’s called, making my stupid example above something like O(n!). Ouch. Of course, lists can be O(1):

iex(3)> Enum.reduce(letters, [""], fn letter, output -> [letter | output] end)
["f", "e", "d", "c", "b", "a", ""]

Only that reverses our output. Adding a call to a reverse() would slow I/O operations.

Good thing the Erlang team is clever! If you don’t care about nesting or proper lists, you can append as an O(1) action!!! Behold:

iex(4)> Enum.reduce(letters, [""], fn letter, output -> [output | letter] end)
[[[[[[[""] | "a"] | "b"] | "c"] | "d"] | "e"] | "f"]

It’s still just a linear (recursive) scan to write it out too!

It’s essentially an abuse of linked lists to switch them from a stack into a queue. Awesome, eh?

13 Likes

Yeah, that’s a weird line, especially since both articles show improper lists in action. There are good reasons to use them. You just need to be aware of what you are doing and that not everything will work with your special list.

2 Likes

I understand that an iolist is a list of lists with irregular nesting so you can have [list, list] and [list | list] but from what I understand an “improper” list is a list where the final cell is not itself a list, i.e. [list | << binary >>] (instead of [list | [<< binary >>]]).
If I’ve understood that correctly, the warning makes sense and iolists don’t in fact use “improper” lists. Perhaps they use lists improperly :slight_smile:

3 Likes

An iolist can be either or both. Functions that expect iolists handle the nesting and improper form just fine:

iex(1)> letters = ~w[a b c d e f]
["a", "b", "c", "d", "e", "f"]
iex(2)> improper = Enum.reduce(letters, [""], fn letter, output -> [output | letter] end)
[[[[[[[""] | "a"] | "b"] | "c"] | "d"] | "e"] | "f"]
iex(3)> :erlang.iolist_to_binary(improper)
"abcdef"
iex(4)> IO.puts improper
abcdef
:ok
4 Likes

It is awesome. :slight_smile: I guess finding out if a list is a “proper list” or an iolist is O(n), though?

2 Likes

Yeah, you would need to walk it checking for an improper tail at each step.

2 Likes

I wonder…

Would it be a good idea to extend the Enumerable protocol to also accept iolist cases? It seems like a minor addition, and I don’t think it would have a performance impact to do so, but maybe it is not considered acceptable?

1 Like

We’re getting out of the area where I feel qualified to answer, but I think it’s probably not a good idea. If I iterate over the list: [[65, 66, 67], [68, 69, 70]], I expect my function to be called twice with a list of three items each time. However, if we treat the data as an iolist, then it’s six different codepoints (nesting is ignored). Enum leaves it to us to decide how to handle nesting and handling iolists would take that flexibility away from us.

6 Likes

Ah, now I understand. :slight_smile: Thank you very much for taking the time to lay it out for me. :slight_smile:

1 Like