Elixir vs. Python performance benchmarking

I’ve been working on an Elixir project that has required a lot of scripting. I usually reach for Elixir because I like it more (and in this case, I could reuse code). However, I’ve noticed that the performance is sometimes poor. I tried doing the same task in Python, and in my initial tests, Python is much faster.

Here’s the repo (specifically the scripts/ directory):

To reproduce the behavior (after installing and running mix deps.get):

  1. mix run scripts/make_files.exs: preps the directory with sample files – takes maybe 30 seconds.
  2. mix run scripts/vet_files.exs: runs the Elixir version of parsing/vetting the files. Example output: Duration: 2424 ms
  3. python scripts/vet_files.py: runs the Python version for comparison. Example output: Duration: 608 ms

I haven’t spent a whole lot of time trying to refactor the Elixir (or the Python) code, but this setup is a fairly accurate recreation of one of the tasks we needed to figure out, and when you’re dealing with lots and lots of files, even little inefficiencies add up.

I’m wondering if the community here can share any insights or knowledge about Elixir’s performance for scripts such as this. Thanks in advance!

5 Likes

I’d be curious what would happen if you piped the file into File.read instead of streaming it. It looks like Python reads the file completely into memory before processing the lines.
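
Something like this (untested) is what I mean – read the whole index into memory up front instead of streaming it line by line:

lines =
  "tmp/files/index.txt"
  |> File.read!()
  |> String.split("\n", trim: true)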

Also, you might want to check out GenStage, which is optimized for creating data pipelines.

*disclaimer: I’m less experienced than the average user on this forum.

1 Like

Interesting, thanks for sharing. In my case, which is mostly list crunching, Elixir is many tens of times faster than Python. With some optimizations the advantage becomes many hundreds of times, even with 5 times as much data. For example, an imperative algorithm on a list of ~130k dicts in Python would take me about 20 minutes. Trying to rewrite it functionally and seeing the function just get stuck and never return is what made me switch to Elixir :grinning:. In Elixir, with the same kind of algorithm, I’ve been able to get through a list of ~500k structs in ~200 ms.

Your .exs script takes ~1000 ms for me, while the .py one takes ~500 ms. Elixir 1.14.3/OTP 25, Python 3.10.7.

2 Likes

Here is a PR that gets the Elixir performance closer to the Python performance, though it is still slower.

I saw from eprof that, after Jason, most time was spent in GenServer calls and cleaning up processes. This is due to the File module opening a new process for every opened file.

➜  ex_vs_py git:(main) ✗ mix profile.eprof scripts/vet_files.exs
Warmup...

Duration: 2009 ms
Duration: 1604 ms

Profile results of #PID<0.212.0>
#                                                          CALLS     %  TIME µS/CALL
Total                                                    3460446 100.0 43300    0.13

:gen_server.call/3                                         64901  2.21  9567    0.15
:erlang.monitor/2                                          64902  2.60 11244    0.17
:file.check_args/1                                        184703  2.70 11683    0.06
File.exists?/2                                             54901  2.91 12604    0.23
Enum."-each/2-lists^foreach/1-0-"/2                        64901  2.93 12698    0.20
:file.read_file_info/2                                     54901  3.31 14329    0.26
:erlang.demonitor/2                                        64902  4.34 18778    0.29
:file.call/2                                               64901  5.90 25540    0.39
:gen.do_call/4                                             64901 10.78 46673    0.72
Jason.Decoder.string/6                                   1322723 21.88 94739    0.07

My PR replaces a lot of these calls with :prim_file, which is a NIF and does not spawn a process for every file. That also allows the Task.async_stream to help performance; using async_stream with the File module just leads to the File GenServer being a bottleneck.
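
To illustrate the kind of swap involved (this is not the exact diff from the PR):

path = "tmp/files/index.txt"  # any path works here; using the index file from the repo

# before: File.exists?/1 goes through the file server (gen_server call + monitor/demonitor)
File.exists?(path)

# after: :prim_file.read_file_info/1 is a NIF call in the calling process
match?({:ok, _}, :prim_file.read_file_info(path))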

Next you might try improving the JSON performance, perhaps with Eljiffy.

3 Likes

I suspect the stream might be the culprit here. If your files are never that big, avoid using a stream, or read bigger chunks at once.
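
For the bigger-chunks option, File.stream! can emit fixed-size binary chunks instead of lines (untested; the 64 KB size is an arbitrary choice):

# stream the index in 64 KB binary chunks instead of one line at a time
File.stream!("tmp/files/index.txt", [], 64 * 1024)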

While it’s useful to look at the performance of the Erlang code, you’d probably also want to evaluate how much of the time is spent starting up the BEAM VM. I’m not sure which OTP applications are started by default with mix, but I recently read that they contribute a good chunk of the startup time for things running on the BEAM.

1 Like

Yeah, but in this benchmark all applications are started before the script is executed.

I’ve modified both scripts to return a list of results, and Elixir beats Python here.

start_time = :erlang.monotonic_time(:millisecond)
index_file = "tmp/files/index.txt"

results =
  index_file
  |> File.stream!()
  |> Task.async_stream(fn line ->
    path = String.trim(line)
    {:ok, contents} = :prim_file.read_file(path)
    %{"paths" => txt_paths} = :jiffy.decode(contents, [:return_maps])
  end, max_concurrency: 16, ordered: false)
  |> Stream.flat_map(fn {:ok, results} -> results end)
  |> Task.async_stream(fn file ->
    match?({:ok, _}, :prim_file.read_file_info(file))
  end, max_concurrency: 8, ordered: false)
  |> Enum.map(fn {:ok, result} -> result end)

end_time = :erlang.monotonic_time(:millisecond)
duration = end_time - start_time
IO.puts("Duration: #{duration} ms")
IO.inspect(Enum.all?(results))

vs

from os.path import exists
import json
from time import time

index_file = "tmp/files/index.txt"

existing = []

def vet_files():
    with open(index_file, 'r') as myfile:
        for line in myfile:
            open_json_file(line.rstrip())
            # dict_obj = json.loads(person_data)


def open_json_file(json_file):
    with open(json_file, 'r') as myfile:
        for line in myfile:
            data = json.loads(line)
            files_exist(data['paths'])

def files_exist(paths):
    for p in paths:
        existing.append(exists(p))

if __name__ == "__main__":
    start_time = int(time() * 1000)
    vet_files()
    end_time = int(time() * 1000)
    print(f'Duration: {end_time - start_time} ms')
    print(all(existing))

And the results are 291 ms Elixir vs 314 ms Python.

4 Likes

Some general remarks for guidance:

  1. Keep in mind the Erlang VM makes specific trade-offs in relation to high performance, such as process preemption. It is better to have a predictable system that goes slightly less fast than a fast one that is unpredictable (or crashes)

  2. Streams have lower memory usage at the cost of higher CPU usage. If your goal is to go as fast as you can, not using streams may be better (such as File.read! |> String.split("\n", trim: true))

  3. You should see benefits by adding Task.async_stream and similar so you can leverage multi-core

  4. I would assume that most of the time is taken by JSON parsing, so remember Jason is a pure Elixir package, while the JSON parsing in Python is most likely done in C. So you may have better results by using something like jiffy (and a more apples to apples comparison) – see the sketch below
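
Putting 2–4 together, a rough, untested sketch (it assumes the index file fits comfortably in memory and that jiffy is added as a dependency) could look like this:

"tmp/files/index.txt"
|> File.read!()
|> String.split("\n", trim: true)
|> Task.async_stream(
  fn path ->
    # jiffy decodes in C (point 4); Jason is pure Elixir
    %{"paths" => paths} = path |> File.read!() |> :jiffy.decode([:return_maps])
    Enum.each(paths, &File.exists?/1)
  end,
  ordered: false
)
|> Stream.run()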

20 Likes

Most of the time is spent on accessing files. I don’t know for sure, but I thought the Erlang team had switched to epoll for prim_file on Linux.

Default file IO in Erlang is fairly slow. I’d recommend using it with the [:raw] option – it bypasses several layers of abstraction that introduce a fair bit of overhead.

5 Likes

What do these layers of abstraction do?

raw has no effect on read; it only applies to file handles. In this case, reads still go to a server that serializes both reads and writes. I will try to start a discussion on why that’s the case. It feels unnatural to try to address a race condition that is natural to the file system itself and will happen anyway with other programs running on the same machine.

2 Likes

It looks like the overhead is spinning up a new process:

raw
Allows faster access to a file, as no Erlang process is needed to handle the file. However, a file opened in this way has the following limitations:

  • The functions in the io module cannot be used, as they can only talk to an Erlang process. Instead, use functions read/2, read_line/1, and write/2.
  • Especially if read_line/1 is to be used on a raw file, it is recommended to combine this option with option {read_ahead, Size} as line-oriented I/O is inefficient without buffering.
  • Only the Erlang process that opened the file can use it.
  • A remote Erlang file server cannot be used. The computer on which the Erlang node is running must have access to the file system (directly or through NFS).

When the mode isn’t raw, the iodevice is a pid.
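
For the index file, opening in raw mode would look roughly like this (untested; read_line/1 with {:read_ahead, size} as the docs above suggest, and the 64 KB buffer is an arbitrary choice):

# in raw mode no intermediate Erlang process owns the file,
# so only the process that opened it can read from it
{:ok, fd} = :file.open("tmp/files/index.txt", [:raw, :binary, {:read_ahead, 64 * 1024}])

paths =
  Stream.unfold(fd, fn fd ->
    case :file.read_line(fd) do
      {:ok, line} -> {String.trim(line), fd}
      :eof -> nil
    end
  end)
  |> Enum.to_list()

:ok = :file.close(fd)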

3 Likes

Thank you all for the continued input. This is interesting! I formalized my repo to use Benchee so I could continue trying out some variants. Here are the results (so far):

Name                        ips        average  deviation         median         99th %
python                     2.27         0.44 s    ±22.35%         0.45 s         0.64 s
:prim_file async           1.38         0.72 s    ±22.75%         0.63 s         1.04 s
Concurrent                 0.63         1.59 s     ±8.90%         1.59 s         1.78 s
Split file                 0.40         2.51 s    ±22.56%         2.35 s         3.30 s
Task.async_stream          0.32         3.16 s    ±21.99%         3.19 s         3.95 s
:prim_file                 0.31         3.26 s    ±41.38%         2.82 s         5.19 s
File                       0.31         3.27 s    ±24.74%         3.48 s         4.00 s
Jsonrs                     0.28         3.54 s    ±20.90%         3.56 s         4.27 s

Comparison:
python                     2.27
:prim_file async           1.38 - 1.64x slower +0.28 s
Concurrent                 0.63 - 3.59x slower +1.14 s
Split file                 0.40 - 5.68x slower +2.07 s
Task.async_stream          0.32 - 7.16x slower +2.72 s
:prim_file                 0.31 - 7.37x slower +2.81 s
File                       0.31 - 7.41x slower +2.83 s
Jsonrs                     0.28 - 8.02x slower +3.10 s

In short, Python is still the fastest. The fastest Elixir solution (so far) is the one that uses Task.async_stream and :prim_file:

    index_file
    |> File.stream!()
    |> Task.async_stream(fn line ->
      path = String.trim(line)
      {:ok, contents} = :prim_file.read_file(path)
      {:ok, %{"paths" => txt_paths}} = Jason.decode(contents)

      Enum.each(txt_paths, fn p ->
        :prim_file.read_file_info(p)
      end)
    end)
    |> Stream.run()

I tried variants that used EITHER Task.async_stream OR :prim_file, but they didn’t perform as well. Loading the file into memory instead of streaming it also didn’t perform as well. I haven’t been able to get jiffy working, so I gave jsonrs a try, but unfortunately, it performed the worst of these (!!).

What is challenging here is that the solutions have very different performance characteristics. In other words, it’s easy to fall into a hole here, so I’m hoping to identify patterns to avoid. I should probably try coming up with more simplified use-cases, because this one touches on a lot of things: streaming, checking the file system, and JSON decoding.

2 Likes

I’m interested in Elixir solutions that are not only faster, but also as clean and naive as the Python solution (from above). It doesn’t use any special Python libraries, and it doesn’t obviously drop into C code (like the best Elixir version uses Erlang functions and types?).

This is very clean code (literally, in Bob Martin’s Clean Code style.)

from os.path import exists
import json
from time import time

index_file = "tmp/files/index.txt"

existing = []

def vet_files():
    with open(index_file, 'r') as myfile:
        for line in myfile:
            open_json_file(line.rstrip())
            # dict_obj = json.loads(person_data)


def open_json_file(json_file):
    with open(json_file, 'r') as myfile:
        for line in myfile:
            data = json.loads(line)
            files_exist(data['paths'])

def files_exist(paths):
    for p in paths:
        existing.append(exists(p))

if __name__ == "__main__":
    start_time = int(time() * 1000)
    vet_files()
    end_time = int(time() * 1000)
    print(f'Duration: {end_time - start_time} ms')
    print(all(existing))
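
For comparison, a similarly naive Elixir version (an untested sketch: plain Enum, Jason, no concurrency or timing) would be roughly:

"tmp/files/index.txt"
|> File.read!()
|> String.split("\n", trim: true)
|> Enum.flat_map(fn path ->
  # each listed file is a JSON document with a "paths" key
  %{"paths" => paths} = path |> File.read!() |> Jason.decode!()
  paths
end)
|> Enum.map(&File.exists?/1)
|> Enum.all?()
|> IO.inspect()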

I’d be curious how it would do against raw file

{:ok, fd} = :file.open(index_file, [:raw, :read_ahead])
{:ok, data} = :file.read(fd, 1_000_000)
...

Is that so? Just because you are using a Python function to call the library function doesn’t mean there isn’t a native C implementation under the hood.
And what about the abomination that Python is at this moment in time? Nobody can tell at this point whether the language is interpreted or compiled anymore, because of how many optimizations are in place to make it fast.

like the best Elixir version uses Erlang functions and types

If you are just getting into Elixir, you might think that using an Erlang library is strange and that it is the same as calling C code; however, this is definitely not true, as Elixir gets compiled to Erlang, so no overhead is involved here.
Moreover, if you have access to two separate languages and ecosystems without any setup or overhead, why not use what’s best from both worlds?

This is very clean code (literally, in Bob Martin’s Clean Code style.)

Is that so? What about concurrency? The Elixir solution above uses either tasks or streams, while you are showing a solution that can only run in a blocking manner.

1 Like

Just for the record, that was my Python code and I made no attempts to optimize – I just poked at it for a few minutes until it worked. :grimacing:

3 Likes

Reading chunks of data (instead of lines) is awkward in this case because each line contains a complete value. When processing chunks, you have to manually split on newlines and reassemble any values that got split across chunk boundaries. (At least, I need more coffee before I can come up with a solution to that.) Also, :file.read/2 returns charlists unless the file is opened in :binary mode, and I’m not sure what kind of overhead converting those back into strings would introduce.
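
Actually, here is a rough attempt at the split-and-reassemble part, using Stream.transform/3 to carry the leftover bytes between chunks (untested; the 64 KB chunk size is arbitrary, and it assumes the index ends with a trailing newline so nothing meaningful is left in the final leftover):

paths =
  "tmp/files/index.txt"
  |> File.stream!([], 64 * 1024)
  |> Stream.transform("", fn chunk, leftover ->
    # stitch the previous leftover onto this chunk, emit the complete lines,
    # and carry the trailing partial line forward as the new leftover
    [partial | complete] =
      (leftover <> chunk)
      |> String.split("\n")
      |> Enum.reverse()

    {Enum.reverse(complete), partial}
  end)
  |> Enum.to_list()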

1 Like