@gregvaughn is absolutely right that correctness is the first priority. After that if you’re not satisfied with performance you need to measure, measure, measure.
Benchmarking
Thanks to your question I took the opportunity to try the bmark benchmarking tool. Here’s a quick benchmark test I wrote based on the direction in the project’s README and your suggested expressions.
defmodule StringContenation do
  use Bmark

  alias IO.ANSI

  bmark :binary_concat_operator, runs: 1000 do
    ANSI.red <> "red" <> ANSI.reset <> ANSI.green <> "green" <> ANSI.reset
    |> IO.puts
  end

  bmark :erlang_list_to_binary, runs: 1000 do
    :erlang.list_to_binary([ANSI.red, "red", ANSI.reset, ANSI.green, "green", ANSI.reset])
    |> IO.puts
  end
end
I’m piping to IO.puts here so we can compare these approaches against IO lists. By default bmark does 20 runs per benchmark block, but I’ve increased that to 1000 to get more statistically significant results.
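One reason the comparison works at all: IO.puts/1 accepts iodata (possibly nested lists of binaries) as well as plain binaries. Here’s a quick sketch, separate from the benchmark module, illustrating that both forms render the same bytes:

```elixir
alias IO.ANSI

# IO.puts/1 accepts both a binary and an IO list (iodata).
binary = ANSI.red() <> "red" <> ANSI.reset()
iolist = [ANSI.red(), "red", ANSI.reset()]

IO.puts(binary)
IO.puts(iolist)

# Both forms flatten to the same bytes:
IO.iodata_to_binary(iolist) == binary
```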
Next I run mix bmark. As expected, I see a bunch of redgreen lines printed and colored appropriately.
Results
The benchmark timing data are in the $PROJ_ROOT/results/ directory. To compare the two implementations I run:
mix bmark.cmp results/stringcontenation.binary_concat_operator.results \
results/stringcontenation.erlang_list_to_binary.results
And the results!
results/stringcontenation.binary_concat_operator.results: results/stringcontenation.erlang_list_to_binary.results:
35 50
27 37
74 73
…
65 53
49 63
103 65
76.677 → 142.257 (+85.53%) with p < 0.025
t = 2.2981012392583193, 1998 degrees of freedom
In this particular example, :erlang.list_to_binary took, on average, 1.8553 times as long as <>. Don’t interpret this to mean you should always use <>: the inputs in these examples are rather small. The README for the bmark project has a bit more detail on how to interpret these results.
Moar data
Now let’s try benchmarking a couple more approaches:
bmark :io_list_all_known_data, runs: 1000 do
  [ANSI.red, "red", ANSI.reset, ANSI.green, "green", ANSI.reset]
  |> IO.puts
end

bmark :io_list_append_to_end, runs: 1000 do
  # Simulates converting an incoming stream of unknown data
  # into an IO list and printing it.
  [[[[[[ANSI.red], "red"], ANSI.reset], ANSI.green], "green"], ANSI.reset]
  |> IO.puts
end
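It’s worth noting that the two list shapes above encode exactly the same bytes; only the structure (and therefore the cost of building and walking it) differs. A quick check, not part of the benchmark module:

```elixir
alias IO.ANSI

flat   = [ANSI.red(), "red", ANSI.reset(), ANSI.green(), "green", ANSI.reset()]
nested = [[[[[[ANSI.red()], "red"], ANSI.reset()], ANSI.green()], "green"], ANSI.reset()]

# IO.iodata_to_binary/1 flattens any iodata into a single binary,
# so both shapes produce identical output.
IO.iodata_to_binary(flat) == IO.iodata_to_binary(nested)
```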
Let’s compare each of these new approaches against the <> results:
Left: :binary_concat_operator, Right: :io_list_all_known_data
65.195 → 68.618 (+5.25%) with p < 1
t = 0.48428845990199687, 1998 degrees of freedom
Interestingly, <> is still faster, although we have little confidence in that result because the p-value is as high as it can get!
Left: :binary_concat_operator, Right: :io_list_append_to_end
65.195 → 125.968 (+93.22%) with p < 0.025
t = 2.1281669445218134, 1998 degrees of freedom
<> crushes the deeply nested IO list, and we can be fairly confident about that for these particular inputs on my test machine. However, that may not hold for concatenating sequences of strings in general. To check, you’d need to run many more benchmarks with different kinds of inputs, ideally on several different kinds of machines.
Exercises for the reader
- Determine how long IO lists have to get before they beat <>, if at all.
- Does the size of the binaries within the lists or the nesting depth affect the results, and if so, how?
- Are any of these expressions reduced at compile-time? We’re benchmarking run-time speeds, so any work done by the compiler to optimize our inputs may explain surprising results.
- Do you suspect Elixir, Erlang, or any part of your cache hierarchy is giving you unrealistic results? How can you tell? Could you somehow reduce the influence of caching?
- How well do your benchmarks correspond to the way your code will be exercised in production? Do you have any data about production response times?
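For the first exercise, here’s one possible starting point. It’s a rough sketch, not a rigorous benchmark: :timer.tc/1 times a single run in microseconds, with no warm-up, repetition, or statistics, so treat any numbers it prints as illustrative only.

```elixir
# Rough sketch: time building an n-chunk result with <> versus
# accumulating an IO list and flattening it once at the end.
build_binary = fn n ->
  Enum.reduce(1..n, "", fn _, acc -> acc <> "chunk" end)
end

build_iolist = fn n ->
  Enum.reduce(1..n, [], fn _, acc -> [acc, "chunk"] end)
  |> IO.iodata_to_binary()
end

n = 10_000
{t_binary, _} = :timer.tc(fn -> build_binary.(n) end)
{t_iolist, _} = :timer.tc(fn -> build_iolist.(n) end)
IO.puts("<>: #{t_binary}µs, IO list: #{t_iolist}µs")
```

Both functions produce the same final binary, so varying n (and the chunk size) lets you probe where, if anywhere, the IO list approach starts to win.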