Erlang/Elixir string performance - can this be improved?

Great summary @jola!

Note that if you are using ETS, you don’t need to partition, so the previous version was also correct. In fact, if you don’t need partitioning for this example, you likely don’t need Flow either; IO.binstream + Task.async_stream will most likely be fine.

For reading the ETS table back, you can also use :ets.tab2list/1 to get the whole table as a list, but I am not sure it makes a big difference.
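
For illustration, here is a minimal sketch of that simpler IO.binstream + Task.async_stream pipeline (not code from the thread; it assumes the same words-on-stdin setup as the examples below, and the table must be :public since the tasks run in separate processes):

# hypothetical sketch: ETS + Task.async_stream instead of Flow
:ets.new(:words, [:named_table, :public, write_concurrency: true])

IO.binstream(:stdio, :line)
|> Task.async_stream(fn line ->
  for word <- String.split(line) do
    :ets.update_counter(:words, word, {2, 1}, {word, 0})
  end
end, ordered: false)
|> Stream.run()

# read the whole table back as a list, as suggested
results = :ets.tab2list(:words)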

:counters requires an integer index as a key, so you would need a way to convert words into integers, which may end up offsetting the benefits you would get from counters.
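
For illustration only (not from the thread), a sketch of the extra bookkeeping a :counters-based approach would need:

# hypothetical sketch: :counters needs integer indices, so a word -> index map
# has to be maintained alongside it, which adds back most of the bookkeeping cost
words = ["foo", "bar", "foo"]

index = words |> Enum.uniq() |> Enum.with_index(1) |> Map.new()
counters = :counters.new(map_size(index), [:write_concurrency])

Enum.each(words, fn word -> :counters.add(counters, index[word], 1) end)
:counters.get(counters, index["foo"]) #=> 2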

6 Likes

Forenote: I’m on Windows using Git Bash, so my IO is garbage.

I tried some of the code. First I used the words.txt file that @serpent provided, and since I’m dumb and don’t know how to feed from cat to elixir, I used File.stream!/1, sorry!
The points that I want to share, which at least occurred on my machine, were:

  • String.split/2 with an explicit separator, as in |> Stream.flat_map(&String.split(&1, " ")), is faster than String.split/1 for splitting at spaces
  • Printing the values also adds a lot of overhead

I came to the loose conclusion that iteration and strings are not the weak point, but the IO devices are?

Here is the code:

# file: wc.exs


# removes 20ms from creation
:ets.new(:words, [{:write_concurrency, true}, :named_table])

c = fn ->

  File.stream!("words.txt")
  |> Stream.flat_map(&String.split(&1, " ")) # splitting on " " takes roughly half the time of split/1
  |> Enum.each(fn word -> :ets.update_counter(:words, word, {2, 1}, {word, 0}) end)
  :ets.match_object(:words, {:"$0", :"$1"})
  |> Enum.sort(fn {_, a}, {_, b} -> b < a end)
  # print it beautifully
  |> Enum.each(fn {word, count} ->
    # printing adds a lot of overhead: roughly 187ms without vs 2282ms with
    IO.puts(String.pad_leading(Integer.to_string(count), 8) <> " " <> word)
  end)
end

# only time code execution, not vm start
{micro, :ok} = :timer.tc(c)
IO.puts("#{micro/1000}ms")


Run with: $ elixir wc.exs

You can do this to verify. Replace this:

|> Enum.each(fn {word, count} ->
  IO.puts(String.pad_leading(Integer.to_string(count), 8) <> " " <> word)
end)

by this:

|> Enum.map(fn {word, count} ->
  [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"]
end)
|> IO.write()
1 Like

Eked out another 3s improvement by compiling a pattern for String.split. Now down to 14s run time. Also, if I switch to File.stream! instead of IO.binstream I’m down to 12s, but I’ll stick with the original code since that was part of the challenge.

defmodule App do
  def run do
    :ets.new(:words, [{:write_concurrency, true}, :named_table, :public])
    space = :binary.compile_pattern([" ", "\n"])

    IO.binstream(:stdio, :line)
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split(&1, space))
    |> Flow.each(fn word ->
      :ets.update_counter(:words, word, {2, 1}, {word, 0})
    end)
    |> Flow.run()

    :ets.match_object(:words, {:"$0", :"$1"})
    |> Enum.sort(fn {_, a}, {_, b} -> b < a end)
    |> Enum.map(fn {word, count} ->
      [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"]
    end)
    |> IO.puts
  end
end

IO.puts(:stderr, elem(:timer.tc(&App.run/0), 0))

In comparison, the Flow-less synchronous version runs in 25s:

defmodule App do
  def run do
    :ets.new(:words, [{:write_concurrency, true}, :named_table])
    space = :binary.compile_pattern([" ", "\n"])

    IO.binstream(:stdio, :line)
    |> Stream.flat_map(&String.split(&1, space))
    |> Enum.each(fn word -> :ets.update_counter(:words, word, {2, 1}, {word, 0}) end)

    :ets.match_object(:words, {:"$0", :"$1"})
    |> Enum.sort(fn {_, a}, {_, b} -> b < a end)
    |> Enum.map(fn {word, count} ->
      [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"] end)
    |> IO.puts
  end
end

IO.puts(:stderr, elem(:timer.tc(&App.run/0), 0))
3 Likes

The original post has examples of how to do it, eg: time ./exwp.exs < words.txt > /dev/null. So to convert your command: elixir wc.exs < words.txt.

I think I’m doing something wrong. I went to PowerShell, piped to stdin, and timed with :timer.tc/1, and I get 2.5s:

Here is the code

c = fn ->
  :ets.new(:words, [{:write_concurrency, true}, :named_table])

  IO.stream(:stdio, :line)
  |> Stream.flat_map(&String.split(&1, " ")) # splitting on " " takes roughly half the time of split/1
  |> Enum.each(fn word -> :ets.update_counter(:words, word, {2, 1}, {word, 0}) end)
  :ets.match_object(:words, {:"$0", :"$1"})
  |> Enum.sort(fn {_, a}, {_, b} -> b < a end)
  # print it beautifully
  |> Enum.map(fn {word, count} ->
    [String.pad_leading(Integer.to_string(count), 8), " ", word, "\n"]
  end)
  |> IO.write()
end

{micro, :ok} = :timer.tc(c)
IO.puts("#{micro/1000}ms")
IO.puts(":timer.tc/1 #{micro}")

2593.0ms
:timer.tc/1 2593000

If interested these are my versions:

Erlang/OTP 21 [erts-10.3] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1]
Elixir 1.8.1 (compiled with Erlang/OTP 20)
Windows 10 Pro 1809

1 Like

Are you sure your output is correct? I ran your code, and you have lots of \n in your words because you split only on " " and each line ends with a \n. Remove the argument to split so it says &String.split(&1), or use my solution of compiling a pattern. Also, running your code took:

37520.331ms
:timer.tc/1 37520331
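
To see why splitting only on " " leaves the newlines attached, here is a quick iex illustration (not from the thread):

String.split("hello world\n", " ")
#=> ["hello", "world\n"]

String.split("hello world\n")
#=> ["hello", "world"]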

Maybe it runs faster for you because your words.txt is shorter? Here’s the number of lines in mine:

 wc -l words.txt
   1819045 ../words.txt

You can also check the number of lines in your output. This is how many lines I get from your code (you get almost twice as many lines in your output because of all the newline characters; I believe you should get ~557K if your original file has 1.8M lines):

elixir wc.exs < ../words.txt | wc -l
  935401

If you look at the original article, the fastest C version took 3.7s, so there’s probably something wrong here :slight_smile:

Oooh, it makes sense. I counted the words and it’s way below yours; I thought I got the file right, but maybe it was still downloading when I saved it.
Also, I noticed that without " " on split it takes way longer! Thank you for the findings!

I think running an .exs is going to be slower than running a compiled release in any case, so this benchmark won’t accurately reflect the real-world case of a deployed app. Please correct me if I’m wrong, everyone.

2 Likes

.exs files are also compiled; it just happens automatically before the script runs. Obviously that means you keep paying that cost, but for a tiny script like this, you’re not gonna notice it. Also, the measurements are of execution time, not startup.

Building a release does not apply any performance optimizations; it just compiles the code like you would otherwise with MIX_ENV=prod mix compile, AFAIK. There are no mentions in the Distillery docs of releases being faster. It’s not what releases are for.

To clarify, it depends on the .exs file, but for other reasons besides compilation.

Running elixir path/to/exs won’t have any protocols consolidated. Running inside a Mix project with mix run path/to/exs (and also in a release) may be more performant depending on the idioms used. I don’t think it would matter in this case, though.
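
If you want to check which situation you are in, one quick way (a sketch, with a made-up script name) is to ask whether a core protocol has been consolidated:

# check.exs - prints whether the Enumerable protocol is consolidated;
# typically false under `elixir check.exs`, true under `mix run check.exs`
# inside a compiled Mix project
IO.inspect(Protocol.consolidated?(Enumerable))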

2 Likes

Yeah, I can confirm that running mix run vs elixir doesn’t appear to make a difference, at least not one that’s measurable in this simplistic benchmark.

Here are three runs each of elixir and mix run (times in microseconds):

➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
22379157
➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
24923662
➜  app mix run lib/nodep.exs < ../words.txt >/dev/null
22455488
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null 
22172380
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null
22035877
➜  app elixir lib/nodep.exs < ../words.txt >/dev/null
22763843

The single outlier at 24s can pretty safely be ignored.

But again, none of this is a proper benchmark. All we’ve shown is that Elixir can be written to perform decently in a text processing script. The original article’s 2019 Ruby 2.6 version took 37s, meaning we’re quite a lot faster than Ruby at this point.

For the serial code, it knocks about 6 seconds off for me (1m13 vs 1m19 on a 2GHz MacBook) if I use an unnamed ETS table, i.e.

table = :ets.new(:words, [])

and then use the reference directly instead of the name:

|> Enum.each(fn word -> :ets.update_counter(table, word, {2, 1}, {word, 0}) end)

etc.

This makes sense since now ETS doesn’t have to be concerned about concurrency at all, and there’s no lookup for the table by name.

Of course, now you can’t take advantage of concurrency :wink:

Hi, a friend of mine sent this over, so I gave it a try: https://gist.github.com/evadne/43e79dd51c068dc19e42086cdbecc8a7

FWIW

$ curl http://teralink.net/\~serpent/words.txt > words.txt

$ md5 words.txt 
MD5 (words.txt) = 640ef9e082ef75ef07f0e9869e9d8ae2

$ du -h words.txt
217M	words.txt
Execution
$ elixir ./count.ex ./words.txt --sort > out-sorted.txt
Started
Using 4 Workers
Processing: 4.748574s
Load Results: 0.820397s
Order Results: 1.99405s
Print Results: 3.136065s
Total Runtime: 10.710908s
$ elixir ./count.ex ./words.txt > out.txt
Started
Using 4 Workers
Processing: 4.944752s
Load Results: 0.836416s
Print Results: 2.556994s
Total Runtime: 8.343633s
$ wc -l out*
  557745 out-sorted.txt
  557745 out.txt
 1115490 total
$ head -n 5 out-sorted.txt
1033175 0
103404 00
353 000
752 0000
19 00000

Please let me know your thoughts. Thanks again!

8 Likes

Nice! That’s a really cool solution, and it probably scales well! It fails the first requirement though: the input should be piped in. Reading from a file is generally faster than reading from stdin (e.g. replace IO.binstream with File.read and it runs faster) and gives you much more control, as you show in your code.

You can speed up your version quite a bit if you switch to iolists, instead of printing line by line.
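
For example (a generic sketch, not @evadne’s actual code; results stands in for a list of {word, count} tuples):

# hypothetical: build one deep iolist and write it once, instead of one IO.puts per entry
results
|> Enum.map(fn {word, count} -> [Integer.to_string(count), " ", word, "\n"] end)
|> IO.write()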

I actually ended up writing an article about this script, with my final single-threaded version (13 lines of code) running in 13 seconds: https://blog.jola.dev/elixir-string-processing-optimization

7 Likes

Hi @jola, did you try getting rid of the :line option and processing chunk by chunk? What I’ve seen so far (more details here: Streaming lines from an enum of chunks) is that this

File.stream!("numbers.txt",[],2048) #chunks instead of lines
|> Stream.transform("",fn chunk, acc ->
  [last_line | lines] = 
      acc <> chunk
      |> String.split("\n")
      |> Enum.reverse()
	{Enum.reverse(lines),last_line}
end)

seems to be 2x faster than using the :line option in File.stream!. I still have to try it in your code and see if there is any difference.

1 Like

I did do that in the article :slight_smile: and compared performance, along with some words on the downsides.

1 Like

thx @jola, do you mean this part?

IO.binstream(:stdio, 102400)
|> Enum.to_list()
|> :binary.list_to_bin()
|> String.split(pattern)

Does it load everything into memory and then split?

It does :slight_smile: Like I said in the article, I didn’t spend too much time messing around with the chunk size, but that number gave a decent performance improvement compared to reading line by line with that input on my machine.

Also, switching to reading the file directly would be faster, but the original article used stdin in all examples, so it would not be a fair comparison.

Take a look at @evadne’s solution for one that optimizes reading across 4 workers, if you’re curious how fast it can be when reading directly from a file.

1 Like

New version is up. It reads from STDIN and somehow manages to beat the file-based approach. :wink:

elixir ./count-stream.ex < words.txt > out.txt
Started
Processing: 4.627264s
Reporting: 0.771289s
Total Runtime: 5.408457s
elixir ./count-stream.ex --sort < words.txt > out.txt
Started
Processing: 4.761361s
Reporting: 1.936912s
Total Runtime: 6.707307s
9 Likes