Erlang/Elixir string performance - can this be improved?

serpent · March 23, 2019, 11:50pm

Following this small benchmark experiment (https://blog.fefe.de/?ts=a2689de5), I wrote a short Elixir script to calculate the counts of distinct words in a text read from stdin. Other language versions are available (http://ptrace.fefe.de/wp/) as well as instructions to create the input file and some performance results (http://ptrace.fefe.de/wp/timings2019.txt).

#!/usr/bin/env elixir

# setup input stream for reading and tokenisation
IO.stream(:stdio, :line)
  |> Stream.flat_map(&(String.split(&1)))

  # ingest stream, count words, sort result
  |> Enum.reduce(%{}, fn x, acc -> Map.update(acc, x, 1, &(&1 + 1)) end)
  |> Map.to_list
  |> Enum.sort(fn {_, a}, {_, b} -> b < a end)

  # print it beautifully
  |> Enum.each(fn {word, count} ->
    IO.puts(String.pad_leading(Integer.to_string(count), 8) <> " " <> word)
  end)

It performs pretty badly:

% time ./exwp.exs < words.txt > /dev/null
./exwp.exs < words.txt > /dev/null  137.20s user 8.48s system 110% cpu 2:11.49 total

Comparing with the Ruby version from above link:

% time ./wp.rb < words.txt > /dev/null
./wp.rb < words.txt > /dev/null  21.54s user 0.19s system 99% cpu 21.750 total

Sure, this is not a typical use case and BEAM is not exactly famous for its string performance – but it’s fun to play around, right? Now I’m wondering, is it just my script? Is there room for improvement?

kokolegorille · March 23, 2019, 11:52pm

Welcome to the forum,

There is a better implementation

PS: You might have a look at Flow as the example is about counting words in a textfile.

jola · March 24, 2019, 7:44am

Could you share a link to the input?

I tried generating it and

➜  llvm-8.0.0.src find . -type f | xargs cat | tr -sc 'a-zA-Z0-9_' ' ' | ../lines > words.txt
tr: Illegal byte sequence

josevalim · March 24, 2019, 9:22am

To be precise, what is slow in this case is not the string processing, but the fact that Map is a immutable data-structure. For example, using :ets.update_counter/4 should speed up things considerably. The documentation for Flow talks about this too.

serpent · March 24, 2019, 11:02am

For me

% find . -type f | xargs cat | LANG="en_US.ISO-8859-1" tr -sc 'a-zA-Z0-9_' ' ' | ../lines > words.txt

did work. I put it on my server as well (http://teralink.net/~serpent/words.txt).

jola · March 24, 2019, 3:11pm

For reference, your original code took 1:54 total on my machine.

Simply running

IO.stream(:stdio, :line)
|> Stream.flat_map(&(String.split(&1)))
|> Stream.run()

takes around 35s on my machine. So just reading from IO and splitting is already slower than the full ruby implementation.

Switching to IO.binstream speeds it up from 35 to 25s , so that’s one step. The difference is that we don’t have UTF8 support, which I assume most of the implementations don’t bother with anyway.

Next lets take a look at @josevalim’s suggestion of using ETS

# Create the ETS table
:ets.new(:words, [{:write_concurrency, true}, :named_table])

# setup input stream for reading and tokenisation
IO.binstream(:stdio, :line)
|> Stream.flat_map(&String.split(&1))
# Insert and update counters for each word in table
|> Enum.each(fn word -> :ets.update_counter(:words, word, {2, 1}, {word, 0}) end)

# Read all rows from the table
:ets.match_object(:words, {:"$0", :"$1"})
|> Enum.sort(fn {_, a}, {_, b} -> b < a end)
# print it beautifully
|> Enum.each(fn {word, count} ->
  IO.puts(String.pad_leading(Integer.to_string(count), 8) <> " " <> word)
end)

This runs in 42s on my machine, so we’ve made a big improvement from 114s. We can shave another 2s off by replacing the last line with building an iolist.

|> Enum.map(fn {word, count} ->
  [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"] end)
|> IO.puts

But we’re still not really using the concurrency available to Elixir. @kokolegorille suggested using Flow, which lets us easily run on more than one core. Using almost the exact example posted in the documentation for Flow I ran

IO.binstream(:stdio, :line)
|> Flow.from_enumerable()
|> Flow.flat_map(&String.split(&1))
|> Flow.partition()
|> Flow.reduce(fn -> %{} end, fn word, acc ->
  Map.update(acc, word, 1, & &1 + 1)
end)
|> Enum.sort(fn {_, a}, {_, b} -> b < a end)
|> Enum.map(fn {word, count} ->
  [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"] end)
|> IO.puts

which finishes in 21s. That means we’ve caught up with ruby. But I dropped ETS, so lets bring it back.

:ets.new(:words, [{:write_concurrency, true}, :named_table, :public])

IO.binstream(:stdio, :line)
|> Flow.from_enumerable()
|> Flow.flat_map(&String.split(&1))
|> Flow.partition()
|> Flow.each(fn word ->
  :ets.update_counter(:words, word, {2, 1}, {word, 0})
end)
|> Flow.run()

:ets.match_object(:words, {:"$0", :"$1"})
|> Enum.sort(fn {_, a}, {_, b} -> b < a end)
|> Enum.map(fn {word, count} ->
  [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"] end)
|> IO.puts

This doesn’t run any faster than the previous Flow version, still clocking in at 21s (saw some variance, maybe 20-22s, not surprising since I’m using around 7/8 virtual cores and I have lots of apps running). I’m not really sure why it’s not faster, considering we’re avoiding updating a Map. I’m curious if there’s a smarter way of reading the ETS table, maybe even in a sorted fashion?

Finally, the original post claims to allow VMs startup time before measuring, which your examples of using time does not. Timing the code from start to finish gives me an actual time of at least 0.6s less than what time reports. If you want to get more accurate than that you’ll need to use a proper testing harness like https://github.com/bencheeorg/benchee

ps fun note about ETS, without {:write_concurrency, true} the final version takes 120s instead of 21s

pps noticed I forgot an unnecessary .partition in the final version. Removing it yields a run time of ~17s.

dimitarvp · March 24, 2019, 4:18pm

Not sure if we can get sorting but using Erlang’s relatively new :counters module should be even faster than ETS in this case.

josevalim · March 24, 2019, 5:12pm

Great summary @jola!

Note that If you are using ETS, you don’t need partition. So the previous version was also correct. In fact, if you don’t need partition for this example, you likely don’t need flow either and IO.binstream+Task.async_stream will most likely be fine.

For reading ETS, you can also use :ets.tab2list to get the whole table as a list, but I am not sure if it makes a big difference.

counters require an index (integer) as a key, so you would need a way to convert words into integers which may end-up offsetting the benefits you would get from counterss.

joaoevangelista · March 24, 2019, 7:44pm

Forenote: I’m on Windows using Git Bash so my IO is garbage

I tried some of the code, first I used the words.txt file that @serpent provided and since I’m dumb and I don’t know how to feed from cat to elixir I used File.stream!/1 sorry!
But the points that I want to shared that at least occurred on my machine was :

String.split/3 on: |> Stream.flat_map(&String.split(&1, " ")) is faster than String.split/1 for splitting at spaces
Printing the values also adds a lot of overhead

I came to the loose conclusion that iterating and strings are not the weak point but the IO devices are?

Here is the code:

# file: wc.exs


# removes 20ms from creation
:ets.new(:words, [{:write_concurrency, true}, :named_table])

c = fn ->

  File.stream!("words.txt")
  |> Stream.flat_map(&String.split(&1, " ")) # using space decrease from approx half than using split/1
  |> Enum.each(fn word -> :ets.update_counter(:words, word, {2, 1}, {word, 0}) end)
  :ets.match_object(:words, {:"$0", :"$1"})
  |> Enum.sort(fn {_, a}, {_, b} -> b < a end)
  # print it beautifully
  |> Enum.each(fn {word, count} ->
    # print does add overhead from 187 to 2282
    IO.puts(String.pad_leading(Integer.to_string(count), 8) <> " " <> word)
  end)
end

# only time code execution, not vm start
{micro, :ok} = :timer.tc(c)
IO.puts("#{micro/1000}ms")

Run with: $ elixir wc.exs

josevalim · March 24, 2019, 7:52pm

You can do this to verify. Replace this:

|> Enum.each(fn {word, count} ->
  IO.puts(String.pad_leading(Integer.to_string(count), 8) <> " " <> word)
end)

by this:

|> Enum.map(fn {word, count} ->
  [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"]
end)
|> IO.write()

jola · March 24, 2019, 9:06pm

Eked out a another 3s improvement by compiling a pattern for String.split. Now down to 14s run time. Also, if I switch to File.stream! instead of IO.binstream I’m down to 12s, but I’ll stick with the original code since that was part of the challenge.

defmodule App do
  def run do
    :ets.new(:words, [{:write_concurrency, true}, :named_table, :public])
    space = :binary.compile_pattern([" ", "\n"])

    IO.binstream(:stdio, :line)
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, space))
    |> Flow.each(fn word ->
      :ets.update_counter(:words, word, {2, 1}, {word, 0})
    end)
    |> Flow.run()

    :ets.match_object(:words, {:"$0", :"$1"})
    |> Enum.sort(fn {_, a}, {_, b} -> b < a end)
    |> Enum.map(fn {word, count} ->
      [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"]
    end)
    |> IO.puts
  end
end

IO.puts(:stderr, elem(:timer.tc(&Aapp.run/0), 0))

in comparison the Flow-less synchronous version runs in 25s

defmodule App do
  def run do
    :ets.new(:words, [{:write_concurrency, true}, :named_table])
    space = :binary.compile_pattern([" ", "\n"])

    IO.binstream(:stdio, :line)
    |> Stream.flat_map(&String.split(&1, space))
    |> Enum.each(fn word -> :ets.update_counter(:words, word, {2, 1}, {word, 0}) end)

    :ets.match_object(:words, {:"$0", :"$1"})
    |> Enum.sort(fn {_, a}, {_, b} -> b < a end)
    |> Enum.map(fn {word, count} ->
      [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"] end)
    |> IO.puts
  end
end

IO.puts(:stderr, elem(:timer.tc(&App.run/0), 0))

jola · March 24, 2019, 9:21pm

The original post has examples of how to do it, eg: time ./exwp.exs < words.txt > /dev/null. So to convert your command: elixir wc.exs < words.txt.

joaoevangelista · March 24, 2019, 11:33pm

I think I’m doing something wrong, I went to powershell and piped to stdio and timed with :timer.tc/1 and I get 2.5s:

Here is the code

c = fn ->
 :ets.new(:words, [{:write_concurrency, true}, :named_table])

  IO.stream(:stdio, :line)
  |> Stream.flat_map(&String.split(&1, " ")) # using space decrease from approx half than using split/1
  |> Enum.each(fn word -> :ets.update_counter(:words, word, {2, 1}, {word, 0}) end)
  :ets.match_object(:words, {:"$0", :"$1"})
  |> Enum.sort(fn {_, a}, {_, b} -> b < a end)
  # print it beautifully
  |> Enum.map(fn {word, count} ->
    [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"]
  end)
  |> IO.write()
end

{micro, :ok} = :timer.tc(c)
IO.puts("#{micro/1000}ms")
IO.puts(":timer.tc/1 #{micro}")

2593.0ms
:timer.tc/1 2593000

If interested these are my versions:

Erlang/OTP 21 [erts-10.3] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1]
Elixir 1.8.1 (compiled with Erlang/OTP 20)
Windows 10 Pro 1809

jola · March 25, 2019, 6:27am

Are you sure your output is correct? I ran your code and you have lots of \n in your words, because you split only on " ". Each line ends with a \n. Remove your argument to split, so it says &String.split(&1) or use my solution of compiling a pattern. Also running your code took

37520.331ms
:timer.tc/1 37520331

Maybe you run it faster because your words.txt is too short? Here’s the number of lines in mine

 wc -l words.txt
   1819045 ../words.txt

You can also read the number of lines in your output. This is how many lines I get from your code (you get almost twice as many lines in your output because of all the new line characters, I believe you should get 557K if your original file has 1.8M lines)

elixir wc.exs < ../words.txt | wc -l
  935401

If you look at the original article the fastest C version took 3.7s, so there’s probably something wrong here

joaoevangelista · March 25, 2019, 2:31pm

Oooh It makes sense, I counted the words and it’s way below yours, I thought I got it right from the file but maybe it was still downloading when I saved.
Also I noticed that without " " on split it takes way longer! Thank you for the findings!

jeramyRR · March 25, 2019, 10:58pm

I think running an exs is going to be slower than running a compiled release in any case, so this benchmark won’t accurately reflect a real world case of a deployed app. Please correct me if I’m wrong everyone.

jola · March 26, 2019, 7:18am

.exs files are also compiled, it just automatically happens before it runs. Obviously that means you keep paying that cost, but for a tiny script like this, you’re not gonna notice it. Also measurements are of execution time, not startup.

Building a release does not apply any performance optimizations, it just compiles the code like you would otherwise with MIX_ENV=prod mix compile, AFAIK. There are no mentions in the Distillery docs of releases being faster. It’s not what releases are for.

josevalim · March 26, 2019, 8:08am

To clarify, it depends on the .exs file, but for other reasons besides compilation.

Running elixir path/to/exs won’t have any protocol consolidated. Running inside a Mix project with mix run path/to/exs (and also in a release) may be more performant depending on the idioms used. I don’t think it would in this case though.

jola · March 26, 2019, 9:00am

Yeah, I can confirm that running mix run vs elixir doesn’t appear to make a difference, at least enough that it’s measurable in this simplistic benchmark.

Here’s three runs each of elixir and mix run

➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
22379157
➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
24923662
➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
22455488
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null 
22172380
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null
22035877
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null
22763843

The single outlier at 24s can pretty safely be ignored.

But again, none of this is a proper benchmark. All we’ve shown is that Elixir can be written to perform decently in a text processing script. The original article 2019 ruby 2.6 version took 37s, meaning we’re quite a lot faster than ruby at this point.

ellispritchard · March 29, 2019, 10:44pm

For the serial code, it knocks about 6 seconds off for me (1m13 vs 1m19 - 2GHz MacBook) if I use a un-named ETS table, i.e.

table = :ets.new(:words, [])

and then use the reference directly instead of the name:

|> Enum.each(fn word -> :ets.update_counter(table, word, {2, 1}, {word, 0}) end)

etc.

This makes sense since now ETS doesn’t have to be concerned about concurrency at all, and there’s no lookup for the table by name.

Of course, now you can’t take advantage of concurrency