Efficiency of IO lists in elixir/erlang

I have read about the efficiency of IO lists. However I am still not able to wrap my head around it.

How exactly is nesting of IO list more efficient than concatenating strings ?
How exactly is IO list more efficient when there are a lot of repeating words ?

2 Likes

How exactly is nesting of IO list more efficient than concatenating strings ?

Let us say you are concatenating the following strings:

"Hello" + " World!"

Elixir/Erlang will create a new string and copy the bytes from the 1st string and the 2nd string to this new string. So now, we’ve copied 12 bytes. You can easily see how this effects large strings which show up while using stuff like templates. However, when using an iolist. If you want to create a new string which is a concatenation of these 2 strings. You would just create a single list with 2 elements pointing to these strings. Say you have two strings each 1000 bytes long, you would still need just a 2 element list.

How exactly is IO list more efficient when there are a lot of repeating words ?

If you have repeating words you would just create a list referring to the same word multiple times without duplicating memory.

3 Likes

Hi,

This talk covered most of my questions about this issue: https://www.youtube.com/watch?v=zZxBL-lV9uA

2 Likes

Thanks, that cleared a lot but if lists are actually linked list then what happens internally when we delete one of the duplicating words (considering they are referring to the same word) ?

2 Likes

Everything is immutable, so you cannot delete anything. You can make a new list with new data, but you cannot delete anything.

3 Likes

The idea is that when you concatenate the two lists a and b: result = a ++ b, this takes O(length(a)+1) steps, because only after traversing all of list A you know how to combine them. This linear amount of time is spent évery time we want to concatenate two strings. Usually it is common to add new parts of strings to the end of the already-existing string, meaning that the length of a will quickly grow.

But if you concatenate IOLists by doing result = [a, b], then this only takes O(2) steps to compute, since only the pointers to the heads of these lists (which the BEAM already has available; this is the only piece of data that is kept for a linked list) need to be read and copied over to a new list.

Yes, when consuming (‘flattening’) an IOList, it will take a little more steps to follow all the nested list-of-list cases to end up with a single linear sequence again, but this only needs to happen ónce, namely when you’re done with building the IOList.

And that’s why using IOLists is faster in many common cases.

(Using binaries instead of charlists moves this problem from normale lists consuming more time than IOLists to concatenated binaries consuming more memory and needing more copying than IOLists where only the pointers to the already-existing data need to be copied, but the example still holds.)

7 Likes