Data type question

Hi all,
I’m fairly new to Elixir, and really loving it. But, I’ve come across something I don’t really understand, and can’t seem to find the right query to Google for and answer.

I’m looking at some code that returns something in the form of ["The answer is" | 42]. What, exactly, is the type of this value? It kinda looks like a list, but List functions tend to fail:

List.last(["The answer is " | 42])
** (FunctionClauseError) no function clause matching in List.last/1
(elixir) lib/list.ex:238: List.last(42)

What am I dealing with here?

Thanks in advance…

1 Like

This kind of structure is known as an ‘improper list’. It means that you have a linked list for which the last element is not []:

[] is an empty list.
[1] is syntactic sugar for [1 | []] (a list, with 1 as head and an empty list as tail.)
[1, 2] is syntactic sugar for [1 | [2 | []]] and so on.

[x | 10] is something that can be written and is syntactically allowed. However, as most of the features/functions in Elixir/Erlang expect lists to always end with an [] (which usually is the base case in a recursive function), there are many functions that will error when given an improper list.

Improper list do not have many uses; I myself have previously asked about why this construct actually exists in Erlang, but I have not found a satisfying answer so far (Most people expect it was ‘just something that got carried over’ from Lisp; it is harder to restrict the syntax for lists than it is to simply allow this weird, not very useful construct).

2 Likes

More specifically the improper list is being used as an iolist. Look at the IO module formore information. It’s a way of concatenating binary data together so you don’t have to reallocate and copy.

1 Like

Ah! I did not know that.

Looking up in the source, you can indeed find some examples here (io.ex, line 467 and on)

Why improper lists are used here over proper lists is a mystery to me, though.
IO.iodata_to_binary([1,2, | <<3,4>>]) and IO.iodata_to_binary([1,2 | [<<3,4>>]) both return the same thing,
and what IO does, of course, is simply using lists ‘backwards’ by appending at the tail every time instead of at the head (Which means that to get the first element, you need to travel the recursion all the way down, taking O(n) but accessing the last element is O(1)).

Why are improper lists useful here?

1 Like

Thank you all for responding to this question. Extremely helpful.

2 Likes

I’d imagine that it gives better performance when gradually composing an iolist for output, than appending to a proper list…

With regular lists one might sometimes prepend items and eventually reverse it, but due to the nature of an iolist it would be very hard to do the reversal (sublists, binaries, etc).

1 Like

Exactly. The main advantage of an improper list is that it is really easy to build. So when you have lot of charlists to concatenate (or lots of binaries, or list of binaries) you can avoid concatenating. You just build an improper list (which is O(1) ). And when you will need to write it to a socket/file, you can just walk it.

This excellent blopost and the part 2 explain it better than i do.

4 Likes

Interesting!

So it seems that:

  • File.write/3, even with the :raw option, doesn’t (as of OTP 19.1.2 at least) optimize writes of iolists, as the underlying file:write_file/3 Erlang function had a bug…

  • Thus, to be on the safe side for a while yet, one should use:

       {:ok, file} = :file.open("some_file", [:write, :raw])
      :file.write(file, my_iolist)
    

Even when using an OTP release w/ the optimization in place, the above should be slightly faster as file:write_file/3 calls iolist_size/1 on the iolist before writing it, mainly for backwards compatibility of error messages it seems

Also of note, is the fact that only binaries are allowed as the “tail” of an improper list, eg: [my_iolist | "foo bar"]; a quick reading of iolist_size/1 seems to indicate anything else will throw an error:

Don’t worry about this unless you have a very performance sensitive use case. After all, it is already fixed on the latest Erlang. :slight_smile:

Heh… most of my work is developing ETL flows, where each of a couple of dozen jobs have “mini batches” of sometimes millions of records flowing through every time they run, so I guess I’m a bit sensitive to I/O performance :wink:

For low-to-medium throughput use cases, probably nothing much to worry about, no!

EDIT: Yes, I do sometimes wish some of it was in Elixir, but alas, it’s not… it would be easier to structure / re-use common patterns etc, if nothing else…

For such cases, nifsy may also be helpful. But it is advised to run it only on Erlang 19 with dirty schedulers enabled. :slight_smile:

1 Like

Hmm… it WOULD be interesting to try nifsy + GenStage.Flow + bulk loading final result into Vertica using the command line tools :slight_smile: Should maybe compare that to what we use now, for some reasonably complex case (where GenStage.Flow could shine, I’m sure).

In part one, improper lists are not mentioned.

In part two, it is says:

Many functions that expect lists will break if given an improper one, so it’s not a good idea to use them, generally speaking. But in this case, because the only thing we’ll be doing with these lists is wrapping them in other lists and ultimately writing them to a socket, making them improper removes the need to allocate quite so many lists.

So, the only reason to use [“a” | “b”] is to remove the need for this single extra pointer to [] at the end, that [“a”, “b”] would have?

what if you have a full list? Iolist accept improper list because when you concat them, you can do it in O(1) without dealing with traversal.

ie list1 ++ list2 vs [list1 | list2]`

@DianaOlympos I mean that you can combine normal lists in the same way as improper lists, i.e. by using cons (|) instead of concat (++).

Yes but then the list is improper.

If I have two IOlists (which might be binaries, charlists or themselves lists of IOlists), I can concatenate them as:

[a | b] creating an improper list, or as [a | [b]][a, b], creating a proper list.

So, to rephrase my question: Is writing [a, b] creating so much more of a memory overhead over [a | b] that it is useful to do the latter?

iex(1)> a = ""
""
iex(5)> :erts_debug.flat_size([a|a])
6
iex(6)> :erts_debug.flat_size([a,a])
8

When doing a lot of concatenations it can quickly add up. The cons version also leads to less nesting when dealing with other iolists - [a | iolist] simply prepends to the existing iolist instead of nesting the data as would happen in the [a, iolist] case. The flatter the iolist, the more performant it is when finally writing somewhere.

1 Like

In this simple example it would make very little difference. The question is whether you are appending the b or just making a deeper nested list. For example if your iolist contains [1,[2,3],4] and you want to add 99 to it do you get [[1,[2,3],4],99] (making a proper list as your suggestion) or would you flatten it to [1,[2,3],4,99]?

Flattening would be very expensive. Which is what iolists are all about.

Note, that basically the only thing you should do with an iolist is output it. If you need to work with the list then you should flatten it else otherwise the code starts getting a bit complex. If you are going to flatten it the it is better to build nested and improper and flatten at the end.

5 Likes

For anyone who ends up here later on… even though File.write/3 doesn’t optimize write of iolists (not even in OTP 19.2) File.stream!/3 will optimize properly when used eg. as below:

[["foo", "bar", "baaz"]] |> Enum.into(File.stream!("purgeme.txt"))

NOTE: The double-wrapped list is because Enum.into will process each item in the outer list separately, so that outer list in turn has to be a list of iolists if we want to try this out…

How do I know for sure, though? Well, I took some inspiration from the blog post mentioned above, and started strace before I executed the code above; it attaches to the beam.smp process started by iex, tracing calls to write + writev:

$ sudo strace -f -s 999 -e trace=write,writev -p 7880
strace: Process 7880 attached with 19 threads
...
[pid  8050] writev(12, [{"foo", 3}, {"bar", 3}, {"baaz", 4}], 3) = 10
...

There’s the writev call we’re looking for!

This works because File.stream!/3 returns a %File.Stream{} struct, which has an implementation of the Collectable protocol that uses :file.open/2 in combination with IO.binwrite/2 (a light wrapper over :file.write/2):

https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/file/stream.ex#L37-L67

1 Like