Erlang/Elixir string performance - can this be improved?

I think running an .exs file is going to be slower than running a compiled release in any case, so this benchmark won't accurately reflect the real-world case of a deployed app. Please correct me if I'm wrong, everyone.


.exs files are also compiled; it just happens automatically before the script runs. Obviously that means you pay the compilation cost on every run, but for a tiny script like this you're not going to notice it. Also, the measurements are of execution time, not startup.

Building a release does not apply any performance optimizations; it just compiles the code as MIX_ENV=prod mix compile would, AFAIK. There is no mention in the Distillery docs of releases being faster. It's not what releases are for.

To clarify, it depends on the .exs file, but for other reasons besides compilation.

Running elixir path/to/file.exs won't have any protocols consolidated. Running inside a Mix project with mix run path/to/file.exs (and also in a release) may be more performant depending on the idioms used. I don't think it would be in this case, though.
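If you want to see which situation you're in, Elixir exposes Protocol.consolidated?/1; this is just a quick sketch using Enumerable as an example protocol:

```elixir
# Reports whether the Enumerable protocol has been consolidated in the
# current runtime. Plain `elixir script.exs` typically prints false;
# running the same script inside a compiled Mix project or a release
# typically prints true.
IO.inspect(Protocol.consolidated?(Enumerable))
```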


Yeah, I can confirm that running mix run vs elixir doesn't appear to make a difference, at least not one that's measurable in this simplistic benchmark.

Here are three runs each of elixir and mix run:

➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null 
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null

The single outlier at 24s can pretty safely be ignored.

But again, none of this is a proper benchmark. All we've shown is that Elixir can be written to perform decently in a text-processing script. The original article's 2019 Ruby 2.6 version took 37s, meaning we're quite a lot faster than Ruby at this point.

For the serial code, it knocks about 6 seconds off for me (1m13 vs 1m19 on a 2GHz MacBook) if I use an unnamed ETS table, i.e.

table = :ets.new(:words, [])

and then use the reference directly instead of the name:

|> Enum.each(fn word -> :ets.update_counter(table, word, {2, 1}, {word, 0}) end)


This makes sense, since now ETS doesn't have to be concerned about concurrency at all, and there's no lookup of the table by name.

Of course, now you can’t take advantage of concurrency :wink:
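For anyone following along, here's a minimal self-contained sketch of the difference being discussed (the :counts atoms are arbitrary names I made up): a named table is addressed by its atom on every call, while an unnamed table hands back a reference you use directly.

```elixir
# Named table: every call goes through a name lookup on the atom.
:ets.new(:counts, [:named_table])
:ets.update_counter(:counts, "word", {2, 1}, {"word", 0})

# Unnamed table: :ets.new/2 returns a reference we use directly,
# skipping the name lookup.
table = :ets.new(:counts_ref, [])
:ets.update_counter(table, "word", {2, 1}, {"word", 0})
:ets.update_counter(table, "word", {2, 1}, {"word", 0})

IO.inspect(:ets.lookup(table, "word"))  # [{"word", 2}]
```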

Hi, a friend of mine sent this over, so I gave it a try:


$ curl\~serpent/words.txt > words.txt

$ md5 words.txt 
MD5 (words.txt) = 640ef9e082ef75ef07f0e9869e9d8ae2

$ du -h words.txt
217M	words.txt
$ elixir ./count.ex ./words.txt --sort > out-sorted.txt
Using 4 Workers
Processing: 4.748574s
Load Results: 0.820397s
Order Results: 1.99405s
Print Results: 3.136065s
Total Runtime: 10.710908s
$ elixir ./count.ex ./words.txt > out.txt
Using 4 Workers
Processing: 4.944752s
Load Results: 0.836416s
Print Results: 2.556994s
Total Runtime: 8.343633s
$ wc -l out*
  557745 out-sorted.txt
  557745 out.txt
 1115490 total
$ head -n 5 out-sorted.txt
1033175 0
103404 00
353 000
752 0000
19 00000

Please let me know your thoughts. Thanks again!


Nice! That’s a really cool solution, and probably scales well! It fails the first requirement though: the input should be piped in. Reading from a file is generally faster than reading from stdin (e.g. replace IO.binstream with! and it runs faster) and gives you much more control, as you show in your code.

You can speed up your version quite a bit if you switch to iolists, instead of printing line by line.
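To illustrate the iolist suggestion, here's a minimal sketch (the rows data is made up): build one nested list and hand it to IO.binwrite in a single call, instead of concatenating strings or doing one write per line.

```elixir
# Made-up sample data standing in for the word counts.
rows = [{"foo", 12}, {"bar", 3}]

# Build a single (nested) iolist instead of printing line by line.
iolist =, fn {word, count} ->
  [Integer.to_string(count), " ", word, "\n"]

# One write call; the VM walks the iolist without flattening it
# into an intermediate binary first.
IO.binwrite(:stdio, iolist)
```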

I actually ended up writing an article about this script, with my final single-threaded version (13 lines of code) running in 13 seconds.


Hi @jola, did you try getting rid of the :line option and processing chunk by chunk? What I’ve seen so far (more details here: Streaming lines from an enum of chunks) is that this:

!("numbers.txt", [], 2048) # chunks instead of lines
|> Stream.transform("", fn chunk, acc ->
  [last_line | lines] =
    (acc <> chunk)
    |> String.split("\n")
    |> Enum.reverse()

seems to be about 2x faster than using the :line option in!. I still have to try it in your code and see if there is any difference.
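A self-contained version of that transform, using an in-memory list of chunks in place of!/3 (the chunk contents are made up), shows how partial lines get carried across chunk boundaries:

```elixir
chunks = ["one\ntw", "o\nthree", "\nfour\n"]

lines =
  chunks
  |> Stream.transform("", fn chunk, acc ->
    # Glue the leftover partial line onto the new chunk, split on
    # newlines, emit the complete lines, and carry the trailing
    # (possibly partial) line forward as the accumulator.
    [last_line | complete] =
      (acc <> chunk)
      |> String.split("\n")
      |> Enum.reverse()

    {Enum.reverse(complete), last_line}
  |> Enum.to_list()

# Note one downside: if the input doesn't end with a newline, the final
# accumulator (the last partial line) is silently dropped at stream end.
IO.inspect(lines)  # ["one", "two", "three", "four"]
```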


I did do that in the article :slight_smile: and compared performance as well as some words on the downsides


thx @jola, do you mean this part?

IO.binstream(:stdio, 102400)
|> Enum.to_list()
|> :binary.list_to_bin()
|> String.split(pattern)

Does it load everything into memory and then split?

It does :slight_smile: like I said in the article, I didn’t spend too much time messing around with the chunk size, but that number gave decent performance improvement compared to reading line by line with that input on my machine.

Also, switching to reading the file directly would be faster, but the original article used stdin in all examples, so it would not be a fair comparison.

Take a look at @evadne’s solution for one that optimizes reading over 4 workers if you’re curious how fast it can be reading directly from file.


New version is up. It reads from STDIN and somehow manages to beat the file-based approach. :wink:

elixir ./count-stream.ex < words.txt > out.txt
Processing: 4.627264s
Reporting: 0.771289s
Total Runtime: 5.408457s
elixir ./count-stream.ex --sort < words.txt > out.txt
Processing: 4.761361s
Reporting: 1.936912s
Total Runtime: 6.707307s

I love this problem. I tried several things that I just knew would be faster, only to learn the hard way that they were not.

Here’s the fastest thing I was able to come up with:

defmodule WordCounter do
  def run(table, pattern, parent) do
    spawn(__MODULE__, :read_and_count, [table, pattern, parent])
  end

  def read_and_count(table, pattern, parent) do
    case IO.binread(:line) do
      :eof ->
        send(parent, {:done, self()})

      line ->
        line
        |> String.split(pattern)
        |> Enum.each(fn word ->
          :ets.update_counter(table, word, {2, 1}, {word, 0})
        end)

        read_and_count(table, pattern, parent)
    end
  end

  def wait_on(pid) do
    receive do
      {:done, ^pid} -> :ok
    end
  end
end

table = :ets.new(:words, [:public, write_concurrency: true])
pattern = :binary.compile_pattern([" ", "\n"])

Stream.repeatedly(fn -> WordCounter.run(table, pattern, self()) end)
|> Enum.take(System.schedulers_online())
|> Enum.each(fn pid -> WordCounter.wait_on(pid) end)

table
|> :ets.tab2list()
|> Enum.sort(fn {_, a}, {_, b} -> b < a end)
|> {word, count} ->
  [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"]
end)
|> IO.binwrite()

I did like that I was able to get some decent speed with a still pretty straightforward approach. (It just reads lines in multiple processes.)


What was the runtime of this implementation?


It runs in about 8.7 seconds on my laptop. I expect that it’s super dependent on how many cores a machine has.

That’s awesome! I think @jola got hers down to 7-ish seconds in her talk (if I’m remembering correctly)? But it would be neat to throw yours onto a 16-core machine and see how it does :smile:

I have 16 core here, what needs to be tested? ^.^

My “System Report” says I have 8 cores, but the BEAM starts 16 schedulers when I run it.

Another advantage of my approach though is that memory usage should stay pretty reasonable, since it’s still working line-by-line.


I ran @JEG2’s version locally and it came in at 9.5s. Here’s the comparison:

➜ time elixir wp_parallel.exs < ./words.txt > /dev/null
elixir wp_parallel.exs < ./words.txt > /dev/null 27.14s user 1.79s system 302% cpu 9.550 total

➜ time elixir wp_single.exs < ./words.txt > /dev/null
elixir wp_single.exs < ./words.txt > /dev/null 10.28s user 2.91s system 103% cpu 12.739 total

You can see that the parallel version uses much more CPU, but it brings the time down by roughly 25% (12.7s to 9.6s). It is a cool approach to read in parallel. I tried reading everything in at once and then spawning out the processes, which was slower than I expected.

I guess you have 8 physical cores and 16 virtual?

@darinwilson running with > /dev/null gets it down lower

➜  elixirconf time elixir lib/direction3/async_stream.ex < ../words.txt > /dev/null
elixir lib/direction3/async_stream.ex < ../words.txt > /dev/null  17.79s user 3.08s system 380% cpu 5.484 total

and then you can cheat a bit

➜  elixirconf time elixir --erl "+hms 500000000" lib/direction3/async_stream.ex < ../words.txt > /dev/null 
elixir --erl "+hms 500000000" lib/direction3/async_stream.ex < ../words.txt > /dev/null  16.26s user 2.82s system 420% cpu 4.543 total

For those keeping score at home, 4.5s is almost as fast as C :zap: (admittedly it uses 4x the CPU)

Here’s my execution time for @JEG2’s version

➜  elixirconf time elixir lib/extra/jeg2.exs < ../words.txt > /dev/null
elixir lib/extra/jeg2.exs < ../words.txt > /dev/null  27.02s user 1.77s system 276% cpu 10.417 total

Using the +hms cheat brings it down to like 7 seconds.

I really like this solution. You don’t have to stitch together prefix+suffix and memory usage is about 1GB on my machine. Speed is totally reasonable :+1: