I'm keenly interested in writing highly performant code in Elixir. Obviously, achieving the performance levels of Go, C++, or Rust isn't possible, and I understand that. However, I suspect there might still be some hidden tricks to make Elixir code more performant.
Could anyone point me towards any source material, books, talks, etc., that dive into the performance aspects of Elixir?
There are lots of possible improvements, for example:
It depends on the specific case: e.g. if you have two or more candidate algorithms, benchmark them with benchee and pick the one that best fits your needs.
NIFs and related interop features can not only speed up code, but also let you use many non-Elixir libraries.
Stream, Flow, and other modules instead of Enum: rather than holding the whole enumerable in memory, we can read and process elements lazily. Similarly, we can process multiple elements at the same time. There are lots of solutions depending on what you want to achieve.
Metaprogramming, e.g. improving runtime performance by generating pattern-matching clauses at compile time.
Try to limit piped Enum calls so you iterate a list as few times as possible; a [head | tail]-based recursive function or Enum.reduce/3 is often useful in such cases (see the sketch after this list).
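For illustration, here is a minimal benchee sketch (assuming `{:benchee, "~> 1.0", only: :dev}` is in your deps) comparing several piped Enum passes against a single-pass reduce producing the same result:

```elixir
# Hypothetical comparison: three Enum passes vs. a single Enum.reduce/3 pass.
list = Enum.to_list(1..100_000)

Benchee.run(%{
  "piped Enum (3 passes)" => fn ->
    list
    |> Enum.map(&(&1 * 2))
    |> Enum.filter(&(rem(&1, 3) == 0))
    |> Enum.sum()
  end,
  "single reduce (1 pass)" => fn ->
    Enum.reduce(list, 0, fn x, acc ->
      doubled = x * 2
      if rem(doubled, 3) == 0, do: acc + doubled, else: acc
    end)
  end
})
```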
Please remember that there is no single #1 optimisation. Sometimes you optimise resource usage, and sometimes you use all available resources to speed up the work. It's also worth mentioning that raw speed is not always welcome if it forces you to write very rigid code. Writing more generic code without every possible optimisation may save you a huge amount of time if you later rewrite or enhance it to support more structs and so on … Again, there is no single good solution here, as it's a very case-specific topic.
The most performance I managed to eke out of Elixir was:
When I parallelized the algorithm. The BEAM VM just absolutely excels at parallel programming (see the sketch after this list).
When I made sure not to copy and pass a lot of data around, i.e. if you need to periodically access stuff that's several kilobytes, it's probably best to put it in ETS and pass names / references to your workers – and not the data itself. That's also quite true for any programming language btw; you can make an otherwise quick Rust program crawl down to JS / Golang level if you just constantly copy / clone data. (BTW it really must be said that this very strongly depends on the data, the algorithm, the number of workers, etc.; I also had success with just directly passing the data to workers and was confused as to why using ETS didn't net me a performance win… until I realized that pulling data out of ETS copies it as well – so it's all copying in the end, and many other parameters in the equation can tilt the result one way or the other. Just measure.)
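As an illustration of the parallelization point, here is a minimal sketch using Task.async_stream/3; `inputs` and `expensive_transform/1` are hypothetical placeholders for your actual data and CPU-bound work:

```elixir
# Runs up to one task per scheduler; ordered: false lets results stream
# back as they complete instead of in input order.
results =
  inputs
  |> Task.async_stream(&expensive_transform/1,
    max_concurrency: System.schedulers_online(),
    ordered: false
  )
  |> Enum.map(fn {:ok, result} -> result end)
```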
There are many more tips that could be inferred from production experience, but these were my top 2 every time. I'd say you'll have more success if you just try your hand at something and, if you are not satisfied with the results, come back to the forum and we'll give you guidance on a per-case basis.
Without much more information about what you want your code to do, I can only offer more general advice to think about when writing code, as a supplement to Eiji’s great advice.
Do less.
It should not be surprising that code that does less usually finishes faster than code that does more. Applying this advice could look like:
Refactoring traversals of collections to traverse less, ideally once. (Echoed in Eiji’s post).
Performing fewer deep structure updates. While updates to immutable data are efficient from a VM perspective, they're not free. Certain updates to data structures are much cheaper than others, and you should try to use those whenever possible. For example, building iolists can be much faster than repeatedly constructing binaries (a rough sketch follows below).
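A rough sketch of the iolist point (the exact numbers depend on the data and on how the result is used, so measure for your case):

```elixir
# Repeated concatenation can copy the growing accumulator (the VM optimizes
# some append shapes, but not all of them):
_slow =
  Enum.reduce(1..10_000, "", fn i, acc ->
    acc <> Integer.to_string(i) <> "\n"
  end)

# Building an iolist defers the joining; IO.iodata_to_binary/1 (or writing
# the iolist straight to a socket/file) does the work in a single pass:
_fast =
  1..10_000
  |> Enum.map(fn i -> [Integer.to_string(i), "\n"] end)
  |> IO.iodata_to_binary()
```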
Using :ets, :atomics, and more.
Erlang’s standard library contains modules that can be a great help when writing code that needs high performance. :atomics and :counters are great for working with collections of integers that must be updated atomically. :persistent_term is very useful for accessing read-only data from many processes.
ETS is a more general-purpose tool, and learning to wield it well can dramatically improve performance in many situations.
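A minimal sketch of the :counters and :persistent_term APIs mentioned above (the `{MyApp, :config}` key is a hypothetical name):

```elixir
# :counters - a fixed-size array of integers with atomic updates.
ref = :counters.new(1, [:atomics])
:counters.add(ref, 1, 1)
:counters.get(ref, 1)
#=> 1

# :persistent_term - read-mostly data shared by all processes without copying
# on read; updates are expensive and trigger a global scan, so write rarely.
:persistent_term.put({MyApp, :config}, %{rate_limit: 100})
:persistent_term.get({MyApp, :config})
#=> %{rate_limit: 100}
```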
Reading, and more importantly, understanding the advice written in the Erlang Efficiency Guide should take you pretty far along your journey to writing high-performance code that runs on the BEAM. The guide does a great job of explaining why certain code runs more slowly, and communicates useful insights into the BEAM VM that you can keep in mind when you code.
Know when to stop
While it is very satisfying to write code that runs very quickly, certain optimizations and refactorings done in the name of performance can have a strong negative impact on the readability, testability, and maintainability of your code. To quote the late, great Joe Armstrong:
Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful!
– Joe Armstrong, Erlang & OTP in Action
Nine times out of ten, in my career, the biggest wins have come from finding better ways to model the data (MapSet, :ordset, and :digraph have helped with this in Elixir code) and figuring out how to skip steps with better algorithms. My new Livebook series, How to Train Your Scrappy Programmer, covers a lot of those techniques (using several practices mentioned in this thread and others). I would say that the two Livebooks most heavily covering this topic are Borrowing a Cup of Algorithms and Charting Our Course. There's also some of this thinking in the free download: Data and the Code That Loves It.
For those who are more experienced in writing performant code, I would love to see someone in action taking some piece of slow code or a library and making it run fast. I would pay good money for a series on just that!
I have heard of tools like fprof and benchee. I have also heard the advice not to trust benchmarks, and the advice to use a profiler to identify the bottlenecks before optimizing anything.
But it still feels like a bit of a dark art to me. How does one analyze the bottleneck and test out improvements? What mental models in Elixir do we need to know X is probably faster than Y?
Maybe I’m shouting into the void here, but if you are experienced at optimizing code, next time you have to optimize something that’s not proprietary, please hit record and publish it
My How to Train Your Scrappy Programmer series is a tour through the solutions to five days of Advent of Code problems from 2023. Since Advent of Code often involves the optimization of search problems, my series covers a lot of exactly what you asked for. It generally begins with inefficient solutions, examines why they are slow, and adds on optimizations to demonstrate improvements.
I hope you’ll consider checking it out and, if you do, please let me know if it did deliver what you are searching for.
I gave Erlang in Anger a skim, but it looks like it only gives the topic a surface-level treatment. It covers the tooling you can use, but does not go into techniques.
Perhaps I need to dive in and pick something in our app to optimize and learn as I go.
Is it right to think that benchmarking can be a waste of time vs. actually running a production workload and inspecting?
What does your workflow tend to be? You produce a flame graph, identify the hot spots, then try rewriting those to be faster?
If you are solving business problems with your product, I would say in most cases yeah.
I interviewed a few years back with a company that claimed to run benchmark tests on one of their telecom products written in Golang, claiming it made their clients happy. Then afterwards they mentioned that they usually have incidents and that you need to carry the work laptop with you. I guess it's all about setting the right priorities, and raw performance is never among the top ones for such products. I obviously declined that offer after hearing that, as that is a simple example of a dysfunctional product.
It's all about understanding and correctly measuring the performance of the parts of your system. The easiest way these days is to send telemetry from the parts of the application that interest you. The tricky thing is that there is no rule of thumb on how to do it optimally, so how you implement it will determine the quality of the insights you get.
I want to reinforce the same point as @dimitarvp above. If your processing is already as concurrent as it can be, trying to optimize single-threaded raw speed is a fool's errand in most cases.
If that is critical for the product, then the best option is to offload it to a lower-level language like C/Rust/Go; otherwise you might end up with a horrible codebase that nobody can understand or maintain, while providing only mediocre optimizations.
To be honest, we don't have a specific performance goal in mind.
What we learned is that, in our industry, nobody is explicitly asking for performance. Most products are so slow that there is a bit of learned helplessness: people just accept that products are slow.
But we spent a few days optimizing the frontend rendering performance of one of the main pages in our app, and we got a lot of very happy customers. Nobody asked for it, but we believe they didn't ask because they didn't think fast software was possible.
Now we are thinking of investing some percentage of our time in making the features they use heavily run faster.
It does seem, though, that the first step to any of this is improving the skill of diagnosing why something is slow. Otherwise, as you suggest, I risk going down a rabbit hole and complicating code for no reason, or optimizing the wrong thing entirely.
As always … it depends on the case. Most people don't count each one-second loss, even though if they counted all of the software "lags" they would see a huge amount of lost time. So it's usually about whether the product is ready for it (not you or your team).
Again, usually people want things that "just work", so it's not wrong to think that an MVP does not really need optimisations. That said, it also depends on the speed of the current software and how the project will scale. In the first case, slowness may be considered a bug, and in the second it may be just a matter of time before you run into scale problems.
In terms of learning I recommend this way of improving yourself (and therefore your software):
Train your instinct - a good one is worth much more than experience (experience certainly doesn't lose its value; it's just that instinct is worth that much).
Practice - no matter how small or big the projects are, you gain experience.
Feed your instinct with your experience - instinct is usually unclear, and to "decrypt" it properly you have to learn, improve your skills, and so on …
Go back to projects you worked on at least 6-12 months ago - you should be able to tell which parts could be improved just by looking at the code (i.e. without analysing it).
If you have completed this list, I have one piece of good news and one piece of bad news for you:
Good: Optimising code should be much easier, if not just easy.
Bad: People learn their whole lives; go back to point number 1.
When should you rewrite a project? Regardless of what other people say, things keep moving forward. If you don't update the code relatively often, you will eventually need to rewrite or drop it. It's good to stay up to date with the most popular hex packages like phoenix. Updating to the next version is usually not really painful.
That said, you improve over time, sometimes less and sometimes more, so no matter how up to date your code is, sometimes you have to rewrite at least one feature. I would suggest not doing this more often than once per year.
Regardless of how much you improve per month, it's still just a month. If you rewrote even one feature every month, it would make a huge impact on the overall project. On the other hand, there is always at least one time each year when you have more time or just feel better than usual. That "flow" can be used in various ways, and I believe a yearly code review done when you have the most strength is simply a pleasure.
I think the biggest reasons why Elixir apps are slow are the following:
No one bothered to capture the non-functional requirements from the client. Therefore there's no definition of acceptable performance until it's so bad that people complain. And then it's often too late. If you define these up front and actually create scripts to measure these values AS YOU DEVELOP THEM, you will identify and fix perf & scalability issues before they become a problem.
A corollary of #1: people often fail to capture the non-functional requirements because the "initial prototype" gets upgraded to be the production system. All smart dev groups plan on building it twice: first to (presumably quickly) understand the problem and identify a viable solution path, and second to do it right and make that viable solution sustainable. Alas, more often than not the second effort is never allowed, leading to the issues identified in #1. Of course, if you're doing a prototype you shouldn't focus on premature optimizations. But make sure you're not kidding yourself about getting that chance to "do it right".
In Elixir-land (inherited from Ruby/Rails land): making every user interaction hit a database. This is why I try to teach my team Erlang and the actor model before I let them jump into Elixir. Most user interactions in most applications have no business hitting the central point of failure and performance bottleneck that we call a database. Three-tiered architectures are so 1990s, yet they keep coming back over and over. Learn about the wonders of the BEAM concurrency model and OTP behaviors and use them! Databases should be small, specialized, and hidden behind the single actor that needs to manage that persistent state at scale. Everything else should live in in-memory actor processes, with state already structured to address your users' needs (a sketch follows this list).
Trying to do something that really requires full use of the CPU resources in an interpreted language. This is when you need to invoke NIFs and send out to code/apps written in languages like C++. Check and make sure you haven’t just created a naive algorithm that is easily fixed by a more appropriate design. Breaking the problem down into a distributed/parallel solution is pretty easy in Elixir/BEAM and can achieve near linear scalability across multiple CPUs if applicable. If you’ve exhausted this option then you may need to look outside of BEAM-based solutions.
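A hedged sketch of the "keep state in actor processes" point; the module, the Registry name, and the cart shape are all hypothetical:

```elixir
defmodule ShoppingCart do
  @moduledoc """
  One process per active cart: reads and writes hit process memory,
  not the database. Persistence (if needed at all) can happen lazily
  behind this single owner of the state.
  """
  use GenServer

  # Assumes a `Registry` named CartRegistry (keys: :unique) is started
  # under your supervision tree.
  def start_link(cart_id),
    do: GenServer.start_link(__MODULE__, cart_id, name: via(cart_id))

  def add_item(cart_id, item), do: GenServer.cast(via(cart_id), {:add, item})
  def items(cart_id), do: GenServer.call(via(cart_id), :items)

  defp via(cart_id), do: {:via, Registry, {CartRegistry, cart_id}}

  @impl true
  def init(cart_id), do: {:ok, %{id: cart_id, items: []}}

  @impl true
  def handle_cast({:add, item}, state),
    do: {:noreply, %{state | items: [item | state.items]}}

  @impl true
  def handle_call(:items, _from, state),
    do: {:reply, state.items, state}
end
```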
So measurement-wise, you're indeed correct, it is a dark art. We've developed lots of custom tools/scripts to help us do this. We do this at such scale that we have recurring clients who annually hire us to perform large-scale capacity and performance tests, in which we inevitably identify major issues and help them mitigate them. For web-based stuff we really like Tsung, but it's really hard to use against LiveView apps, so we're working on options for that. Otherwise I'd say most of our tooling and effort is custom to the apps and not very complex. We do a LOT of scripts in Erlang and Python and sometimes TypeScript, supported by other tools like Varnish and various event-reporting tools. It's definitely an art and, for any complex system we hit, the critical path is never the thing we thought it would be initially. Fun stuff, actually. Hard to make a living doing it, alas.
I would turn your ask for having a slow app being gradually reworked to a faster one on its head – IMO you should tell us which slow parts of the code you accelerated and how. Because there’s too much generic advice out there (here on this forum included) that would not be useful to you since it’s so outside your context.
F.ex. many people reach for DB caching, but that's often misguided and a band-aid. Avoiding N+1 queries was actually mind-blowing to some teams I consulted for years ago (a sketch follows below). Their "solution"? Russian doll caching and so many bugs that Salesforce could meet their quarterly profits target fixing only that project.
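A minimal sketch of the N+1 shape and the usual fix with Ecto; the Post and Comment schemas and the Repo module are hypothetical:

```elixir
import Ecto.Query

# N+1: one query for the posts, plus one query per post for its comments.
posts = Repo.all(Post)

Enum.map(posts, fn post ->
  Repo.all(from c in Comment, where: c.post_id == ^post.id)
end)

# Better: two queries total, regardless of how many posts there are.
posts = Repo.all(Post) |> Repo.preload(:comments)
```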
Another example: waiting, synchronously, for a 3rd party API network call to finish, inside a request-response Phoenix process. And then “Elixir is slow, let’s go back to Python!”.
3rd example: replacing a semi-complex transactional operation with a multi-step Oban job workflow. This is actually quite a good pattern very often, mind you, but in 3 separate projects I consulted for they misapplied it severely. All they had to do was use an Ecto.Multi chain, really (a minimal sketch follows below). I helped them do that, and their average workflow execution dropped from 40 seconds to 2-5. Oban is fantastic, but you don't reach for it when you need a single task done synchronously. That team's old code was scheduling Oban jobs and synchronously awaiting their completion. Yep, you read that right.
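For reference, a hedged sketch of what such an Ecto.Multi chain can look like; the changesets, the Receipts module, and Repo are made-up names:

```elixir
alias Ecto.Multi

Multi.new()
|> Multi.insert(:order, order_changeset)
|> Multi.update(:stock, stock_changeset)
|> Multi.run(:receipt, fn _repo, %{order: order} ->
  # Runs inside the same transaction; return {:ok, _} or {:error, _}
  # to commit or roll back the whole chain.
  Receipts.generate(order)
end)
|> Repo.transaction()
```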
Right tool for the job and all that.
To circle back to my main point: it would benefit the community if you helped us understand what was actually slow and how you fixed it. Then, as I said in my older comment, we can give you more concrete advice.
Exactly that. Without knowing what your hotspots are, you are flying blind. You may start optimising some part of the code that is indeed slow, but whose overall impact is negligible because, for example, it is called only once. On the other hand, there may be a small, fast function that can be made faster and that is called all the time, and that small improvement may have a tremendous overall impact. For example, in our case changing the metrics-gathering library to Peep gave us roughly an order-of-magnitude perf boost (I would need to check exactly how much it was, but the results were certainly mind-blowing).
Michal Muskala, who built Jason and the new Erlang :json module, gave a talk covering a lot of how he made it fast. It was given at Code BEAM Berlin last year and should show up online in January or February, I believe.
Man, I feel like your third point was written by me. I'm sick of seeing projects where every user interaction hits the database. All the projects I worked on in the last three years (I wasn't doing Elixir before) have been like this. Database queries while "iterating" are treated as completely normal. Ecto schemas bubbling up to the views - completely normal. Dependency inversion, what's that? Unit tests, not integration tests (because everything we do hits the database), what's that? Okay, I'm going off on a tangent and a rant here.
There are already some very valuable tips here, and I'm going to add only one: separate pure from impure code. Pure code receives data and processes it. One part of the returned value can be a representation of the impure actions that need to be performed. You have to perform a few DB inserts? You might be able to batch them. You need some set of actions to be "all or nothing"? Put them in a transaction. You have complete freedom to organise your impure actions however you like. As a side effect, pun intended, your pure code is super easy to test (a sketch follows below).
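A minimal sketch of that separation, with hypothetical module, schema, and function names:

```elixir
defmodule Billing.Planner do
  # Pure: takes data in, returns the impure actions to perform as plain data.
  def plan_charges(invoices) do
    invoices
    |> Enum.filter(&(&1.status == :due))
    |> Enum.map(&{:insert_charge, %{invoice_id: &1.id, amount: &1.total}})
  end
end

defmodule Billing.Executor do
  # Impure: interprets the plan, free to batch it or wrap it in a transaction.
  def run(actions) do
    Repo.transaction(fn ->
      Enum.each(actions, fn {:insert_charge, attrs} ->
        Repo.insert!(struct(Charge, attrs))
      end)
    end)
  end
end

# The pure part is trivially unit-testable without touching the database:
# Billing.Planner.plan_charges([%{id: 1, status: :due, total: 100}])
# #=> [insert_charge: %{invoice_id: 1, amount: 100}]
```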