fireproofsocks
Elixir vs. Python performance benchmarking
I’ve been working on an Elixir project that has required a lot of scripting. I usually reach for Elixir because I like it more (and in this case, I could reuse code). However, I’ve noticed that the performance is sometimes poor. I have tried doing the same task in Python, and in my initial tests, Python is much faster.
Here’s the repo (specifically the scripts/ directory):
To reproduce the behavior (after install and mix deps.get):
- mix run scripts/make_files.exs: this preps the directory with sample files – takes maybe 30 seconds.
- mix run scripts/vet_files.exs to run the Elixir version of parsing/vetting the files. Example output: Duration: 2424 ms
- Compare with python scripts/vet_files.py, with example output: Duration: 608 ms
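The scripts themselves are not reproduced in this thread, so how they compute the Duration figure is not shown. As a rough sketch of one way such a number could be measured in Elixir, assuming a hypothetical TimingSketch.measure/1 helper (not taken from the repo) that wraps the work in a zero-arity function:

```elixir
defmodule TimingSketch do
  # Hypothetical helper (an assumption, not the repo's code):
  # runs `fun` and returns {elapsed_milliseconds, result}.
  def measure(fun) when is_function(fun, 0) do
    # Monotonic time is not affected by wall-clock adjustments.
    start = System.monotonic_time(:millisecond)
    result = fun.()
    {System.monotonic_time(:millisecond) - start, result}
  end
end

# Stand-in workload; the real scripts parse and vet files instead.
{ms, _result} = TimingSketch.measure(fn -> Enum.sum(1..1_000_000) end)
IO.puts("Duration: #{ms} ms")
```

Timing only the work inside the script, as above, leaves VM startup out of the figure, which matters when comparing short runs across languages.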
I haven’t spent a whole lot of time trying to refactor the Elixir (or the Python) code, but this setup is a fairly accurate recreation of one of the tasks we needed to figure out, and when you’re dealing with lots and lots of files, even little inefficiencies add up.
I’m wondering if the community here can share any insights or knowledge about Elixir’s performance for scripts such as this. Thanks in advance!
Most Liked
josevalim
Some general remarks for guidance:
- Keep in mind the Erlang VM makes specific trade-offs in relation to high performance, such as process preemption. It is better to have a predictable system that goes slightly less fast than a fast one that is unpredictable (or crashes).
- Streams have lower memory usage at the cost of higher CPU usage. If your goal is to go as fast as you can, not using streams may be better (such as File.read! |> String.split("\n", trim: true)).
- You should see benefits by adding Task.async_stream and similar so you can leverage multi-core.
- I would assume that most of the time is taken by JSON parsing, so remember Jason is a pure Elixir package. I assume that the JSON parsing in Python is most likely done in C. So you may have better results by using something like jiffy (and a more apples-to-apples comparison).
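To make the stream and multi-core remarks concrete, here is a minimal sketch. The module name VetSketch and its functions are hypothetical and are not the repo's code; it only counts lines per file, where the real script would parse and vet each line.

```elixir
defmodule VetSketch do
  # Hypothetical module for illustration only.

  # Eager: read the whole file into memory, then split it into lines.
  def eager_lines(path) do
    path
    |> File.read!()
    |> String.split("\n", trim: true)
  end

  # Lazy: stream the file line by line (less memory, more per-line overhead).
  def streamed_lines(path) do
    path
    |> File.stream!()
    |> Stream.map(&String.trim_trailing(&1, "\n"))
    |> Stream.reject(&(&1 == ""))
    |> Enum.to_list()
  end

  # Spread the per-file work across cores, one task per file.
  # Results come back in the same order as `paths`.
  def count_lines_concurrently(paths) do
    paths
    |> Task.async_stream(
      fn path -> {path, length(eager_lines(path))} end,
      max_concurrency: System.schedulers_online(),
      timeout: :infinity
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Whether the eager version actually wins depends on file sizes and available memory, so it is worth timing both against the real data.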
cro
Yeah, like Java. We did a PoC at work and “bare” Java (no Spring, Hibernate, or anything but the bare minimum for external libraries) blew away Elixir, Python, and Go, and came in second only to C.
For our use case Elixir was faster than Python. However, one thing came to light in our PoC that was interesting–Python’s concurrency story is still not great. Async/await code is super hard to reason about and debug. There is very little visibility into the event loop in Python async code, and async code tends to proliferate in your codebase since you can’t call await from a non-async function. So when you need to call a coroutine with await, the calling function must become async…which means its calling function will need to await it and become async. There are ways around this but they are non-obvious and make your code complex.
The Elixir/BEAM approach is so, so much better. Not that it is a silver bullet for complex systems, but if you are writing a large scale system that depends heavily on concurrency for performance, I would choose Elixir over Python.
That’s not even getting into the additional DX issues–introspectability, observability, multi-node deployments, the list goes on.
michalmuskala
Default file IO in Erlang is fairly slow. I’d recommend using it with the [:raw] option - it bypasses several layers of abstraction that introduce quite a fair bit of overhead.
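A minimal sketch of what raw-mode reading might look like, calling Erlang's :file module directly. The RawRead module is hypothetical, and the 64 KB read_ahead buffer is an arbitrary assumption rather than a recommended value.

```elixir
defmodule RawRead do
  # Hypothetical helper: reads all lines of a file through a raw file
  # descriptor, which bypasses the intermediate file I/O process.
  def lines(path) do
    # :read_ahead buffers reads so that read_line stays efficient in raw mode.
    {:ok, fd} = :file.open(path, [:read, :raw, :binary, {:read_ahead, 64 * 1024}])

    try do
      read_all(fd, [])
    after
      :file.close(fd)
    end
  end

  defp read_all(fd, acc) do
    case :file.read_line(fd) do
      {:ok, line} -> read_all(fd, [String.trim_trailing(line, "\n") | acc])
      :eof -> Enum.reverse(acc)
      {:error, reason} -> raise "read failed: #{inspect(reason)}"
    end
  end
end
```

A raw descriptor can only be used from the process that opened it, so in a concurrent setup each task would need to open its own files.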







