Elixir vs. Python performance benchmarking

Ok so :file.read_line and go through all the lines?

I tried this (building off of the previous performant solution):

    index_file
    |> stream_file()
    |> Task.async_stream(
      fn line ->
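        # Lines arrive as charlists from :file.read_line/1; drop the trailing newline.
        path = :lists.droplast(line)
        {:ok, contents} = :prim_file.read_file(path)
        {:ok, %{"paths" => txt_paths}} = Jason.decode(contents)
        # Return the decoded paths so the next stage can check each one.
        txt_paths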
      end,
      max_concurrency: 10,
      ordered: false
    )
    |> Stream.flat_map(fn {:ok, results} -> results end)
    |> Task.async_stream(
      fn path ->
        match?({:ok, _}, :prim_file.read_file_info(path))
      end,
      max_concurrency: 10,
      ordered: false
    )
    |> Stream.run()

I streamed the file with this code (I think I’m probably reinventing wheels here, but it was educational):

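    def stream_file(input_file) do
      Stream.resource(
        # Start: open the file in raw mode with an 8 KB read-ahead buffer.
        fn ->
          {:ok, file} = :file.open(input_file, [:raw, :read, read_ahead: 8192])
          file
        end,
        # Next: emit one line per step until end of file.
        fn file ->
          case :file.read_line(file) do
            {:ok, line} ->
              {[line], file}

            :eof ->
              {:halt, file}
          end
        end,
        # After: close the file handle when the stream finishes or halts.
        fn file -> :file.close(file) end
      )
    end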

This performed more or less the same as the other solutions.
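
On the reinventing-wheels point: the closest built-in is probably `File.stream!/1`, which reads lazily line by line. A minimal sketch follows, assuming a throwaway index file (the file name and contents are made up for illustration). Unlike the raw-mode version above, it yields binaries rather than charlists, so the newline is trimmed with `String.trim_trailing/2` instead of `:lists.droplast/1`.

```elixir
# Hypothetical demo file; the path and contents are made up for illustration.
path = Path.join(System.tmp_dir!(), "index_demo.txt")
File.write!(path, "a.json\nb.json\n")

lines =
  path
  # Lazily yields one binary per line, newline included.
  |> File.stream!()
  # Drop the trailing newline from each line.
  |> Stream.map(&String.trim_trailing(&1, "\n"))
  |> Enum.to_list()

IO.inspect(lines)
# prints ["a.json", "b.json"]

File.rm!(path)
```

Whether this is faster or slower than the raw `:file.read_line/1` loop is not something this sketch measures.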


Python has had a lot of speed optimization go into it over the years. I’m not too surprised that a first solution turned out to be pretty good.

Yeah, like Java. We did a PoC at work and “bare” Java (no Spring, Hibernate, or anything but the bare minimum for external libraries) blew away Elixir, Python, and Go, and came in second only to C.

For our use case Elixir was faster than Python. However, one interesting thing came to light in our PoC: Python’s concurrency story is still not great. Async/await code is super hard to reason about and debug. There is very little visibility into the event loop in Python async code, and async code tends to proliferate in your codebase since you can’t use await in a non-async function. So when you need to call a coroutine with await, the calling function must become async, which means its calling function will need to await it and become async too. There are ways around this, but they are non-obvious and make your code complex.

The Elixir/BEAM approach is so, so much better. Not that it is a silver bullet for complex systems, but if you are writing a large scale system that depends heavily on concurrency for performance, I would choose Elixir over Python.

That’s not even getting into the additional DX issues: introspectability, observability, multi-node deployments, the list goes on.
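
To make the contrast with async/await concrete, here is a minimal sketch (the `Demo` module and its function names are hypothetical, not from our PoC). An ordinary function fans work out to tasks and collects the results, and its caller stays a plain function with no special marking.

```elixir
defmodule Demo do
  # An ordinary function: nothing in its definition marks it as concurrent.
  def total_length(words) do
    words
    # Start one task (a separate BEAM process) per word.
    |> Enum.map(fn word -> Task.async(fn -> String.length(word) end) end)
    # Block this process until every task has replied.
    |> Task.await_many()
    |> Enum.sum()
  end

  # A plain caller; it needs no changes because total_length/1 uses tasks.
  def report(words), do: "total: #{total_length(words)}"
end

IO.puts(Demo.report(["beam", "otp"]))
# prints total: 7
```

Here the waiting happens inside `total_length/1`, so concurrency stays an implementation detail of that function instead of something every caller up the stack has to adopt.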
