Implications of String concatenation vs. IOList for ANSI color codes

If you use IO.ANSI.format, you get back nested chardata (a possibly nested list of strings and escape sequences) that you can print directly.
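For reference, a minimal sketch of that call. The second argument (emit?) is forced to true here as an assumption, so the escape codes are produced even when the output is not a terminal:

```elixir
# IO.ANSI.format/2 takes chardata mixed with atoms naming ANSI sequences.
# Passing true as emit? forces the escape codes to be emitted.
formatted = IO.ANSI.format([:red, "red", :reset, :green, "green"], true)

# The result is chardata, so it can be handed to IO.puts/1 as-is.
IO.puts(formatted)

# Flatten it only when an actual string is needed.
flat = IO.chardata_to_string(formatted)
IO.inspect(flat)
```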

However, you can also do:

IO.puts(IO.ANSI.red() <> "red" <> IO.ANSI.reset() <> IO.ANSI.green() <> "green" <> IO.ANSI.reset())

One can easily see how the above could be converted to a helper function that appends the color code and reset before and after a given string. In fact, I’ve been thinking of making a simple wrapper over this in an application.

However, the fact that @josevalim and co. don’t do this in the standard library has me thinking: is it bad to do the string interpolation/concatenation I did? Would it perform poorly? Should I always stick to the IO lists/chardata thing?

Binary/string concatenation requires copying. When you do a <> b, a is allocated, b is allocated, and then a third binary c is allocated holding a copy of both operands.

When you do something like [a, b], a is allocated once, b is allocated once, and then you have a list pointing to both a and b without extra copying.
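A small sketch of the difference, using made-up values for a and b. Both forms print the same bytes, but only the first one builds a new binary up front:

```elixir
a = "hello, "
b = "world"

# Concatenation: a third binary is built containing copies of a and b.
concatenated = a <> b

# IO list: a two-element list that just points at the existing binaries.
as_list = [a, b]

# Both can be written out directly.
IO.puts(concatenated)
IO.puts(as_list)

# The list only becomes a single binary if you explicitly ask for one.
IO.inspect(IO.iodata_to_binary(as_list) == concatenated)
IO.inspect(IO.iodata_length(as_list))
```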

That’s why it is preferable to stick with IO lists. For high performance operations, it is certainly a must. It is, for instance, one of the reasons why Phoenix templates are so efficient. In your example, the list version is even more concise:

IO.puts([IO.ANSI.red(), "red", IO.ANSI.reset(), IO.ANSI.green(), "green", IO.ANSI.reset()])

If you need to build a string, often it is preferred to build a list and then call IO.iodata_to_binary to build the binary just once instead of concatenating along the way.
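As a sketch of that advice applied to the color wrapper discussed earlier: the Paint module and colorize/2 below are hypothetical names made up for illustration, not part of the standard library. The helper returns iodata, and the binary is built once at the very end, only because a string is wanted:

```elixir
defmodule Paint do
  # Hypothetical helper: wraps text in a color code and a reset,
  # returning iodata instead of a concatenated string.
  def colorize(text, color_code), do: [color_code, text, IO.ANSI.reset()]
end

# Pieces are composed by nesting lists; nothing is copied here.
line = [
  Paint.colorize("red", IO.ANSI.red()),
  " ",
  Paint.colorize("green", IO.ANSI.green())
]

# Printing works on the nested list directly.
IO.puts(line)

# If a string really is needed, build the binary once at the end.
string = IO.iodata_to_binary(line)
IO.inspect(string)
```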


You’re too fast José! Thought I could get a coffee before digesting your answer. Thanks a lot, I’ll keep this in mind while building my library.


José is of course perfectly right: if you are going to output the resulting string, leave it as an IO list and let the system do the concatenation. That is why we have IO lists.

There is, however, one time when you really should concatenate, and that is when you want to analyse the string: step over it and work out what is in it. While writing code that can step over any IO list is an “interesting” programming exercise [*], you don’t really want to do it in real code. :)

[*] As an exercise, write the “trivial” function that returns the first n characters from a general IO list. Trivial with a string, not with an IO list.
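One possible take on that exercise, as a rough sketch. CharTake is a hypothetical module name, and the code assumes the input is chardata (UTF-8 strings, integer codepoints, and nested, possibly improper lists) and that "character" means codepoint, so ANSI escape sequences count towards n too:

```elixir
defmodule CharTake do
  # Hypothetical helper: returns the first n codepoints of chardata as a string.
  def take(chardata, n) when is_integer(n) and n >= 0 do
    {taken, _remaining} = walk(chardata, n, [])
    IO.chardata_to_string(taken)
  end

  # Nothing left to take: stop descending.
  defp walk(_rest, 0, acc), do: {acc, 0}
  defp walk([], n, acc), do: {acc, n}

  # Lists may be nested and improper, so the tail can itself be a binary.
  defp walk([head | tail], n, acc) do
    {acc, n} = walk(head, n, acc)
    walk(tail, n, acc)
  end

  # A bare integer is a single codepoint.
  defp walk(codepoint, n, acc) when is_integer(codepoint), do: {[acc, codepoint], n - 1}

  # A binary contributes at most n codepoints.
  defp walk(binary, n, acc) when is_binary(binary) do
    taken = binary |> String.codepoints() |> Enum.take(n)
    {[acc, taken], n - length(taken)}
  end
end

IO.puts(CharTake.take(["he", [?l, "lo"], " wor" | "ld"], 7))
```

Compare with String.slice("hello world", 0, 7) on a plain string, which is the "trivial" version.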


Question then. Code like Macro.camelize https://github.com/elixir-lang/elixir/blob/master/lib/elixir/lib/macro.ex#L1249 does basically a body recursive walk with <> and is very fast. I tried building IOlists and other approaches with it before, and the approach it uses was best.

How is that able to avoid the copying issues?


In this particular case, the compiler performs a series of optimizations both when matching and when constructing the binary that are really worth reading about in the “Constructing and Matching Binaries” chapter of the Erlang Efficiency Guide.

You can check directly in Elixir source by running:

ERL_COMPILER_OPTIONS=bin_opt_info elixir lib/elixir/lib/macro.ex
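For a feel of the code shape involved, here is a minimal sketch. It is not taken from Macro.camelize (which is body recursive); BinSketch and dashes_to_underscores/1 are hypothetical names, and this variant uses an accumulator. It walks the input with binary pattern matching and appends to the end of the result binary, which is the append pattern the guide discusses. Whether a given function actually gets optimized is best confirmed with the bin_opt_info output above rather than assumed:

```elixir
defmodule BinSketch do
  # Hypothetical example: replaces every "-" with "_" byte by byte.
  def dashes_to_underscores(string) when is_binary(string) do
    do_replace(string, <<>>)
  end

  # Match one byte off the front and append one byte to the accumulator.
  defp do_replace(<<?-, rest::binary>>, acc), do: do_replace(rest, <<acc::binary, ?_>>)
  defp do_replace(<<char, rest::binary>>, acc), do: do_replace(rest, <<acc::binary, char>>)
  defp do_replace(<<>>, acc), do: acc
end

IO.puts(BinSketch.dashes_to_underscores("foo-bar-baz"))
```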