Hi all,
I’m fairly new to Elixir, and really loving it. But, I’ve come across something I don’t really understand, and can’t seem to find the right query to Google for and answer.
I’m looking at some code that returns something in the form of ["The answer is" | 42]. What, exactly, is the type of this value? It kinda looks like a list, but List functions tend to fail:
List.last(["The answer is " | 42])
** (FunctionClauseError) no function clause matching in List.last/1
(elixir) lib/list.ex:238: List.last(42)
This kind of structure is known as an ‘improper list’. It means that you have a linked list for which the last element is not []:
[] is an empty list. [1] is syntactic sugar for [1 | []] (a list, with 1 as head and an empty list as tail.) [1, 2] is syntactic sugar for [1 | [2 | []]] and so on.
[x | 10] is something that can be written and is syntactically allowed. However, as most of the features/functions in Elixir/Erlang expect lists to always end with an [] (which usually is the base case in a recursive function), there are many functions that will error when given an improper list.
Improper list do not have many uses; I myself have previously asked about why this construct actually exists in Erlang, but I have not found a satisfying answer so far (Most people expect it was ‘just something that got carried over’ from Lisp; it is harder to restrict the syntax for lists than it is to simply allow this weird, not very useful construct).
More specifically the improper list is being used as an iolist. Look at the IO module formore information. It’s a way of concatenating binary data together so you don’t have to reallocate and copy.
Why improper lists are used here over proper lists is a mystery to me, though. IO.iodata_to_binary([1,2, | <<3,4>>]) and IO.iodata_to_binary([1,2 | [<<3,4>>]) both return the same thing,
and what IO does, of course, is simply using lists ‘backwards’ by appending at the tail every time instead of at the head (Which means that to get the first element, you need to travel the recursion all the way down, taking O(n) but accessing the last element is O(1)).
I’d imagine that it gives better performance when gradually composing an iolist for output, than appending to a proper list…
With regular lists one might sometimes prepend items and eventually reverse it, but due to the nature of an iolist it would be very hard to do the reversal (sublists, binaries, etc).
Exactly. The main advantage of an improper list is that it is really easy to build. So when you have lot of charlists to concatenate (or lots of binaries, or list of binaries) you can avoid concatenating. You just build an improper list (which is O(1) ). And when you will need to write it to a socket/file, you can just walk it.
This excellent blopost and the part 2 explain it better than i do.
File.write/3, even with the :raw option, doesn’t (as of OTP 19.1.2 at least) optimize writes of iolists, as the underlying file:write_file/3 Erlang function had a bug…
Thus, to be on the safe side for a while yet, one should use:
Even when using an OTP release w/ the optimization in place, the above should be slightly faster as file:write_file/3 calls iolist_size/1 on the iolist before writing it, mainly for backwards compatibility of error messages it seems
Also of note, is the fact that only binaries are allowed as the “tail” of an improper list, eg: [my_iolist | "foo bar"]; a quick reading of iolist_size/1 seems to indicate anything else will throw an error:
Heh… most of my work is developing ETL flows, where each of a couple of dozen jobs have “mini batches” of sometimes millions of records flowing through every time they run, so I guess I’m a bit sensitive to I/O performance
For low-to-medium throughput use cases, probably nothing much to worry about, no!
EDIT: Yes, I do sometimes wish some of it was in Elixir, but alas, it’s not… it would be easier to structure / re-use common patterns etc, if nothing else…
Hmm… it WOULD be interesting to try nifsy + GenStage.Flow + bulk loading final result into Vertica using the command line tools Should maybe compare that to what we use now, for some reasonably complex case (where GenStage.Flow could shine, I’m sure).
Many functions that expect lists will break if given an improper one, so it’s not a good idea to use them, generally speaking. But in this case, because the only thing we’ll be doing with these lists is wrapping them in other lists and ultimately writing them to a socket, making them improper removes the need to allocate quite so many lists.
So, the only reason to use [“a” | “b”] is to remove the need for this single extra pointer to [] at the end, that [“a”, “b”] would have?
When doing a lot of concatenations it can quickly add up. The cons version also leads to less nesting when dealing with other iolists - [a | iolist] simply prepends to the existing iolist instead of nesting the data as would happen in the [a, iolist] case. The flatter the iolist, the more performant it is when finally writing somewhere.
In this simple example it would make very little difference. The question is whether you are appending the b or just making a deeper nested list. For example if your iolist contains [1,[2,3],4] and you want to add 99 to it do you get [[1,[2,3],4],99] (making a proper list as your suggestion) or would you flatten it to [1,[2,3],4,99]?
Flattening would be very expensive. Which is what iolists are all about.
Note, that basically the only thing you should do with an iolist is output it. If you need to work with the list then you should flatten it else otherwise the code starts getting a bit complex. If you are going to flatten it the it is better to build nested and improper and flatten at the end.
For anyone who ends up here later on… even though File.write/3doesn’t optimize write of iolists (not even in OTP 19.2) File.stream!/3will optimize properly when used eg. as below:
NOTE: The double-wrapped list is because Enum.into will process each item in the outer list separately, so that outer list in turn has to be a list of iolists if we want to try this out…
How do I know for sure, though? Well, I took some inspiration from the blog post mentioned above, and started strace before I executed the code above; it attaches to the beam.smp process started by iex, tracing calls to write + writev:
This works because File.stream!/3 returns a %File.Stream{} struct, which has an implementation of the Collectable protocol that uses :file.open/2 in combination with IO.binwrite/2 (a light wrapper over :file.write/2):