Elixir vs. Python performance benchmarking

Thank you all for the continued input. This is interesting! I formalized my repo to use Benchee so I could continue trying out some variants. Here are the results (so far):

Name                        ips        average  deviation         median         99th %
python                     2.27         0.44 s    ±22.35%         0.45 s         0.64 s
:prim_file async           1.38         0.72 s    ±22.75%         0.63 s         1.04 s
Concurrent                 0.63         1.59 s     ±8.90%         1.59 s         1.78 s
Split file                 0.40         2.51 s    ±22.56%         2.35 s         3.30 s
Task.async_stream          0.32         3.16 s    ±21.99%         3.19 s         3.95 s
:prim_file                 0.31         3.26 s    ±41.38%         2.82 s         5.19 s
File                       0.31         3.27 s    ±24.74%         3.48 s         4.00 s
Jsonrs                     0.28         3.54 s    ±20.90%         3.56 s         4.27 s

Comparison:
python                     2.27
:prim_file async           1.38 - 1.64x slower +0.28 s
Concurrent                 0.63 - 3.59x slower +1.14 s
Split file                 0.40 - 5.68x slower +2.07 s
Task.async_stream          0.32 - 7.16x slower +2.72 s
:prim_file                 0.31 - 7.37x slower +2.81 s
File                       0.31 - 7.41x slower +2.83 s
Jsonrs                     0.28 - 8.02x slower +3.10 s

In short, Python is still the fastest. The fastest Elixir solution (so far) is the one that combines Task.async_stream with :prim_file (the ":prim_file async" row above):

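    index_file
    |> File.stream!()
    |> Task.async_stream(fn line ->
      # Each line of the index file is a path to a JSON file;
      # File.stream! keeps the trailing newline, so trim it off.
      path = String.trim(line)
      {:ok, contents} = :prim_file.read_file(path)
      {:ok, %{"paths" => txt_paths}} = Jason.decode(contents)

      # Stat every path listed in the JSON; the results are discarded.
      Enum.each(txt_paths, fn p ->
        :prim_file.read_file_info(p)
      end)
    end)
    # Force the lazy stream to run for its side effects.
    |> Stream.run()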
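For anyone who wants to vary this further: Task.async_stream accepts options such as :max_concurrency (defaults to System.schedulers_online()) and :ordered (defaults to true). Here is a minimal, self-contained sketch of passing them. The module name, the throwaway temp files, and the option values are made up for illustration, File.stat stands in for the :prim_file calls, and I have not measured whether these settings help.

```elixir
defmodule AsyncStreamSketch do
  # Hypothetical helper: stats every path concurrently and counts the successes.
  def count_existing(paths, opts \\ []) do
    paths
    |> Task.async_stream(&File.stat/1, opts)
    |> Enum.count(fn result -> match?({:ok, {:ok, _stat}}, result) end)
  end
end

# Illustration only: create a few throwaway files in the system temp directory.
dir = Path.join(System.tmp_dir!(), "async_stream_sketch_#{System.unique_integer([:positive])}")
File.mkdir_p!(dir)

paths =
  for i <- 1..20 do
    path = Path.join(dir, "#{i}.txt")
    File.write!(path, "x")
    path
  end

# :max_concurrency and :ordered are real Task.async_stream options;
# the values chosen here are arbitrary, not tuned.
opts = [max_concurrency: System.schedulers_online() * 2, ordered: false]
count = AsyncStreamSketch.count_existing(paths, opts)
IO.puts("stat succeeded for #{count} of #{length(paths)} files")

File.rm_rf!(dir)
```

With ordered: false the results come back in completion order, which should be fine here since the original code discards them anyway.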

I tried variants that used only one of the two (Task.async_stream without :prim_file, or :prim_file without Task.async_stream), but they didn’t perform as well: 3.16 s and 3.26 s on average, versus 0.72 s for the combination. Loading the file into memory instead of streaming it also didn’t perform as well. I haven’t been able to get jiffy working, so I gave jsonrs a try, but unfortunately it performed the worst of these at 3.54 s on average (!!).

What is challenging here is that the solutions have very different performance characteristics: the Elixir variants range from 0.72 s to 3.54 s on average for the same task. In other words, it’s easy to fall into a hole here, so I’m hoping to identify patterns to avoid. I should probably try coming up with simpler use cases, because this one touches on a lot of things at once: streaming, checking the file system, and JSON decoding.
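As a rough idea of what a simplified use case could look like, here is a sketch that times the file-system calls in isolation with :timer.tc. It uses only the standard library, so JSON decoding is left out. The StageTiming module, the temp file, and the run count are hypothetical, and this is a quick wall-clock average rather than a proper Benchee run.

```elixir
defmodule StageTiming do
  # Hypothetical helper: calls `fun` `runs` times and returns the average
  # wall-clock time per call in microseconds, measured with :timer.tc/1.
  def avg_us(fun, runs \\ 100) do
    {total_us, _} = :timer.tc(fn -> Enum.each(1..runs, fn _ -> fun.() end) end)
    total_us / runs
  end
end

# Illustration only: one throwaway file so that every call touches the same path.
path = Path.join(System.tmp_dir!(), "stage_timing_#{System.unique_integer([:positive])}.txt")
File.write!(path, String.duplicate("x", 10_000))

# Each stage is timed on its own, so a slow one is easy to spot.
stages = [
  {"File.read", fn -> File.read(path) end},
  {":prim_file.read_file", fn -> :prim_file.read_file(path) end},
  {"File.stat", fn -> File.stat(path) end},
  {":prim_file.read_file_info", fn -> :prim_file.read_file_info(path) end}
]

for {name, fun} <- stages do
  avg = Float.round(StageTiming.avg_us(fun), 1)
  IO.puts("#{String.pad_trailing(name, 26)} #{avg} us")
end

File.rm!(path)
```

Timing each call separately like this should make it easier to tell whether the reads, the stat calls, or something else entirely accounts for the gap.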
