My Elixir code is slower than my Java implementation. Am I doing it wrong?

Today I conducted a small exercise where I compared each word in a text file to a list of dirty words. I downloaded around 20 books from http://www.gutenberg.org/browse/scores/top and used the Flow example from Flow's documentation (in the "Avoiding single sources" section), just replacing the reduce function with a new one.

I did the exercise in both Elixir and Java just to see how much better it would be in Elixir. It turns out Elixir was 5x slower than the Java implementation.

This has me a bit worried that either Elixir isn’t as performant as I thought, or that Elixir is only good for very simple things, or that I’m just doing it wrong.

Below is the Elixir code I was testing:

def start3() do
  streams = for file <- File.ls!("test/resources") do
    File.stream!("test/resources/#{file}", read_ahead: 100_000)
  end

  streams
  |> Flow.from_enumerables()
  |> Flow.map(&String.replace(&1, "\n", ""))
  |> Flow.map(&String.downcase(&1))
  |> Flow.flat_map(&String.split(&1, " "))
  |> Flow.filter(&String.starts_with?(&1, @badwords))
  |> Flow.partition()
  |> Flow.reduce(fn -> %{} end, fn word, acc ->
    Map.update(acc, word, 1, & &1 + 1)
  end)
  |> Enum.to_list()
end

Here is the Java code:

public Map<String, Integer> test3() throws IOException {
    File dir = new File("/Users/jeramy/dev/elixir/veronica/test/resources");
    Stream<Path> filesStream = Files.list(dir.toPath());

    final Map<String, Integer> foundDirtyWords = new HashMap<>();
    filesStream.forEach(fileStream -> {
        try (Stream<String> stream = Files.lines(fileStream)) {
            stream.map(line -> line.replace("\n", ""))
                    .flatMap(line -> Arrays.stream(line.split(" ")))
                    .filter(word -> dirtyWords.stream().anyMatch(w -> word.toLowerCase().contains(w.toLowerCase())))
                    .forEach(word -> {
                        int val = foundDirtyWords.getOrDefault(word, 0);
                        foundDirtyWords.put(word, val + 1);
                    });
        } catch (IOException e) {
            // do nothing. this is just a test
        }
    });

    return foundDirtyWords;
}

Any thoughts or suggestions?


It might be useful to post the Java code you’re using for comparison.

To be able to properly analyze why X is slower than Y, we need to see both X and Y. So can you please post your Java code as well?

But first things that come to my mind as possible candidates here:

  1. &String.replace(&1, "\n", "") will probably result in a lot of copying of large strings!
  2. do you really want to convert "foo\nbar" into the single word "foobar"?
  3. &String.downcase(&1) will probably copy a lot of large chunks as well.
  4. &String.split(&1, " ") should be done earlier, to decrease the chunk size early and reduce copy sizes in the previous steps
  5. &String.starts_with?(&1, @badwords) iterates @badwords completely in the worst case; you should use an approach that avoids scanning the list, perhaps by generating a multi-headed function from @badwords as I'll draft out below

for badword <- @badwords do
  def prefixed_with_badword?(unquote(badword) <> _), do: true
end

def prefixed_with_badword?(_), do: false

Be aware that this might increase compilation time massively, but it will result in a much better runtime, because you no longer need to crawl the whole list for every single word in the text; instead you have a single function that does a much faster pattern match.


Elixir is not the speediest language when it comes to computationally heavy things, so the results are not that surprising. The sweetspot lies in IO-heavy, concurrent workloads.

As mentioned, there's still room for optimising the Elixir code. I would rather try an approach where you read in the whole text and handle each book in a single process. Taking a quick look, the documents are usually rather small - up to 100s of kB - so being eager should work out much better; Flow might just be too much for such small data sizes.

For String.starts_with?/2 you can also compile the pattern with :binary.compile_pattern - if the list of words is big, this should speed it up.
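A minimal sketch of that suggestion (the two-word list is just this thread's example): :binary.compile_pattern/1 accepts a list of binaries, and String.contains?/2 is documented to accept the compiled pattern directly. The same idea applies to the prefix check discussed above.

```elixir
# Compile the badword list once, outside the hot loop, and reuse it.
pattern = :binary.compile_pattern(["king", "queen"])

# String.contains?/2 accepts a compiled pattern, so nothing is recompiled per word.
String.contains?("the queen's court", pattern)  # true
String.contains?("castle", pattern)             # false
```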


That would introduce a one-time overhead for each invocation of compile_pattern/1, so if done wrong it might slow things down even more. So if done, the pattern should be compiled either at the start of start3/0 or at compile time.

It basically builds up a match structure similar to those used in a pattern match and gains its speed from there. I think it might be worth comparing my predicate with a compiled pattern in benchmarks.

Compiling the pattern is definitely not going to slow it down. The internal code that String.starts_with? eventually calls will compile the pattern before using it if it’s not compiled. The thing you warn against is already happening. And yes, I intended for the pattern to be compiled just once and then reused in all the calls.


If you can share all your Java code and your Elixir code, I can take a look as well.

Everyone has suggested many changes. If you incorporate the changes, rerun the test, and post your results, that would be helpful. Also, post the Java code.


A few points that have probably already been made:

  1. JVM is going to be much faster for long-running complex computations because it will eventually JIT-compile the hot path of the computation to machine code. The 5x difference you posted doesn’t seem surprising.

  2. Out of curiosity, have you tried without using Flow entirely? Flow is going to have some overhead anyway, but in your case that overhead might dominate. Also, are you running this on a multi-core machine or a single-core VM?

  3. These micro-benchmarks can really be misleading if you are not familiar with the strengths and weaknesses of the underlying runtime, and if you are not using the same data on the same machine that the final application will use.

  4. This is a fun exercise for people to do - it would be nice if you put your code and source in a GitHub repo so that we can experiment.

The first important point is to not generalize. That's like going outside, seeing a black Ferrari, and then concluding that all Ferraris are black.

In your particular code, you call Flow, Map and String functions, so at best you can conclude that some of the operations in Flow, Map and String are not as performant as you would expect, especially because this type of benchmark is about executing some operations millions of times.

It may also be that you are using the wrong tool to solve the problem, in this case Flow. There is a cost to coordinating across multiple cores, and you may not have reached the point where that cost becomes worth it.

There is also the fact that Erlang/Elixir will be slower than Java. If you need a language that is as fast as Java, then you should pick Java or maybe C. There are other mainstream languages that are slower than Elixir too and that is not a deterrent for many using those languages.

Everyone gave a lot of feedback on how the code can be improved, but that's ultimately hard to do without having the data. If I had to guess, I would say that the map operation is the one slowing everything down. Because maps are immutable, this kind of hot loop generates a lot of garbage. This is also covered in the performance section of the Flow documentation: https://hexdocs.pm/flow/Flow.html#module-performance-discussions

And before someone starts to worry about map performance, it is worth saying that I have been programming in Elixir for almost 6 years and the only time I had to replace a map with a mutable data structure because of performance was exactly when working on GenStage + Flow.
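For the curious, the kind of mutable alternative mentioned above can be sketched with an ETS table and atomic counters. This is illustrative only, not code from the thread:

```elixir
# Count words in a mutable ETS table instead of an immutable map.
# :ets.update_counter/4 increments position 2 of the row, inserting
# {word, 0} first if the key does not exist yet.
table = :ets.new(:word_counts, [:set, :public])

for word <- ["king", "queen", "king"] do
  :ets.update_counter(table, word, {2, 1}, {word, 0})
end

:ets.tab2list(table)
```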


Looks like you want to use sed to do the hard work for you :D.

In reality, the pure computation performance of Elixir and Erlang is not great. It's still a couple of orders of magnitude faster than, say, Ruby. Even so, in this very case I would not be surprised if doing something like this in Ruby weren't quicker, especially with the above Elixir code, which seems to be far from optimal. Generally, Java will be quicker in most benchmarks. Which is not the same as saying that Java programs or Java-powered web sites will generally be quicker.


Thanks everyone for the inputs. I will optimize with the suggestions and try again.

I should have put a little more info in my original post, but it was super late and I wanted to get the question out there while the other half of the world was awake.

My end goal is to be able to parse GBs worth of data for dirty words. Those dirty words will probably number in the thousands and have to be imported from a CSV dynamically. I can't use them to compile macros because the customer can update this CSV at any time while the app is running.

Now the funny thing about the Flow vs. Java code above is that for a single text file Flow is 3x faster than the Java code, but once I introduce multiple files Java becomes more performant.

I hope I haven't offended anyone with this thread. My favorite language is Elixir, even though I'm still a noob at it and don't get enough practice time. I'm trying really hard to sell Elixir at work because I believe it can benefit my program greatly, and I've already told my managers it's better than Java in every way (even though I know there are cases where it isn't). I was getting hopeful when I saw how much faster the single-file parsing was compared to Java, but then my hopes for selling it at work for this project started to dwindle when multiple files were introduced.

Updated the original with the Java code.


Can you put up some links to the text files you're using, and also, importantly, a sample list of "dirty words"? My guess is that the implementation strategy will need to be different for 100 vs. 10,000 dirty words.


I downloaded the top 20 books from this link: http://www.gutenberg.org/browse/scores/top

The only two badwords I’m using at the moment are: king and queen

Okay, so after going through y'all's input I've updated my code to the below:

def compile_badwords() do
  pattern =
    @badwords
    |> Enum.map_join("|^", & &1)

  # :binary.compile_pattern(pattern <> "/")
  Regex.compile(pattern, "i")
end

def start3() do
  streams =
    for file <- File.ls!("test/resources") do
      File.stream!("test/resources/#{file}")
    end

  empty_space = :binary.compile_pattern(" ")
  non_words = ~r/[\.\"\'@#$&%^*():;\\\/|<>\-\+_,!?]/
  new_line = :binary.compile_pattern("\n")
  {:ok, badwords_pattern} = compile_badwords()

  streams
  |> Flow.from_enumerables()
  |> Flow.map(&String.replace(&1, new_line, " "))
  |> Flow.map(&String.replace(&1, non_words, " "))
  |> Flow.flat_map(&String.split(&1, empty_space))
  |> Flow.filter(&String.match?(&1, badwords_pattern))
  |> Flow.partition()
  |> Flow.reduce(fn -> %{} end, fn word, acc ->
    Map.update(acc, String.downcase(word), 1, &(&1 + 1))
  end)
  |> Enum.to_list()
end

This significantly improves the performance. So much so that it’s only 1.5x slower than the Java equivalent, but I think there are things I can do to make it better.

I have, however, found another issue. My returned list has multiple entries for the same word, which shouldn't be possible. My guess is that they are different UTF representations or something. How do I make sure each word is represented by the same characters no matter what Unicode form it comes in?

Here is my return results:

[{"alexander—generals", 1}, {"general’s", 15}, {"kings—to", 1},
{"queen", 3}, {"queen’s", 8}, {"generality", 4}, {"generals—davout", 1},
{"general—of", 1}, {"kings’", 4}, {"“general", 3}, {"kingly", 1},
{"kingstead", 1}, {"king’s", 6}, {"queenless", 3}, {"king", 333},
{"kingdoms", 5}, {"generalizations", 1}, {"generally", 144}, {"generals", 4},
{"king", 241}, {"kingdom", 91}, {"kings", 28}, {"‘general", 1},
{"general", 157}, {"king", 6}, {"king’s", 92}, {"“general”", 1},
{"general", 2}, {"generalization", 2}, {"generalized", 2}, {"generals", 102},
{"generals—as", 1}, {"general—could", 1}, {"kings", 4}, {"queen", 109},
{"queens", 2}, {"general", 619}, {"generalizing", 1}, {"generally", 2},
{"kingdom", 4}, {"kings”", 1}, {"queen", 78}, {"queensberry", 1},
{"queen’s", 3}, {"“generals”", 1}]

You’ll notice the repeat of the badwords. Did I do the partition wrong perhaps?
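For what it's worth, if differing Unicode normal forms really were the cause, normalizing every word to the same form before counting would collapse the duplicates. A minimal illustration (not code from this thread):

```elixir
# "é" as one precomposed codepoint vs. "e" plus a combining acute accent:
# the binaries differ until both are normalized to the same form (NFC here).
single = "\u00E9"      # é, precomposed
combining = "e\u0301"  # e followed by U+0301 combining acute accent

single == combining                                                   # false
String.normalize(single, :nfc) == String.normalize(combining, :nfc)   # true
```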

I've been playing around with this problem as well, comparing a "naive" version against a "flow" version.

I feel like I’m missing something from the concepts of Flow - my immediate thinking for something like this is that we’ll just tally the results from the different calculations at a single process, but perhaps this adds a bottleneck (or won’t even work for the use cases Flow was designed for?).

Note again: even though you got a speedup from Flow, you have added great complexity and used all your cores to get close to the performance the JVM will give you using only a single core (my Java is very rusty, but I don't believe the code you posted uses more than one thread).

In this particular case, I would benchmark just launching a Task for each file.
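A sketch of what "a Task for each file" could look like, eager and without Flow. The module name, directory handling, and use of Enum.frequencies/1 (Elixir 1.10+) are my assumptions, not code from the thread:

```elixir
# One Task per file: each file is read and counted eagerly in its own
# process via Task.async_stream/3, then the per-file maps are merged.
defmodule EagerCount do
  def count_badwords(dir, badwords) do
    dir
    |> File.ls!()
    |> Task.async_stream(
      fn file ->
        Path.join(dir, file)
        |> File.read!()
        |> String.downcase()
        |> String.split()
        |> Enum.filter(&String.starts_with?(&1, badwords))
        |> Enum.frequencies()
      end,
      timeout: :infinity
    )
    |> Enum.reduce(%{}, fn {:ok, counts}, acc ->
      # Sum counts for words that appear in more than one file.
      Map.merge(acc, counts, fn _word, a, b -> a + b end)
    end)
  end
end
```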

I've put my experiments on GitHub if anyone is interested: https://github.com/orestis/badwords_test - it contains the top 20 books and a word list (only two words, as originally stated).

I’ve added a crude mix task so you can run this with mix bad words. You probably want to run with MIX_ENV=prod as well to get the production compiler settings.


I haven't found any duplicates when glancing over the list; can you please point them out separately?

Also, have you tried swapping the order of processing and splitting as I suggested, to reduce the copy sizes and even the number of copy events?

Another thing: instead of compile_pattern, I really think you should give my generated function a try. All your badwords are known at compile time, therefore you should create something that can match at compile time, as that can squeeze the last bit of performance out of it.

@Nobbz, I'm going to try your suggestions now, except the generated-function one. The bad words are known to me now just for my tests, but in the future they will be in a spreadsheet that the customer can change willy-nilly while the app is running.

Yeah, I didn't see the post where you explained that, but I read it just now. I'm also only seeing now that you only have 2 badwords, so that metaprogramming trick might simply not be worth it…

Still, the reordering of operations might be.

I only highlighted the duplicates of queen, although looking at it more closely it seems as if all of them are duplicated several times.