Understanding Elixir/Phoenix performance

earth10 · December 2, 2018, 7:04pm

Hi, I’m just starting to build a side project with Elixir and Phoenix, and I’ve been doing some basic tests with Elixir alone.

What strikes me is that almost every task seems several times slower in Elixir than in Python or Perl. Examples are traversing a directory and reading file modification times, or reading a CSV file line by line and doing some basic processing on each line.

Despite this, Elixir and Phoenix perform excellently when compared to web frameworks written in other languages.

If I understand correctly, Elixir’s “worse” raw computational speed is more than offset by its superiority in concurrency. Oversimplifying: Elixir can be 10 times slower than language X, but if it’s 1000 times better at concurrency, it will shine for high-traffic websites.

But then Go comes into play: with excellent raw speed and excellent concurrency, it should outperform Elixir easily. According to what I’ve read, that doesn’t happen: Go may be faster, but not by the large margin I would expect.

Can somebody help me understand how this is possible? I’m not asking for low-level explanations, just some pointers for further reading. (Maybe I should just be happy with the end result, but I like to understand why things work the way they do.)

Thanks!

idi527 · December 2, 2018, 7:07pm


If possible, I’d first like to take a look at your Elixir code. Sometimes there are ways to improve it a bit.

frigidcode · December 2, 2018, 7:22pm

Can you link to your code please?

earth10 · December 2, 2018, 8:26pm

My Elixir code is surely awful, since I’m new to functional programming, and I understand it’s not fair to compare it with languages I have used for ages.

Two examples of Elixir code I found to perform worse than their Python and Perl equivalents are these (not mine):

But my question was more generic: I think we can assume that Elixir was not designed for raw speed, and that languages like Python and Perl will usually have an advantage for simple, non-concurrent tasks.
If that’s not the case, then my question is meaningless and can safely be deleted.

frigidcode · December 2, 2018, 8:57pm

I think your original statement that raw speed isn’t Elixir/Erlang’s main advantage is correct. In areas where raw computational power is required, you can use a language like Rust and write a NIF so you can use it from the BEAM.

I would agree with that, and would also add process isolation, supervisors, etc.

kokolegorille · December 2, 2018, 9:27pm

The Enum module is eager, meaning every intermediate result is fully loaded into memory; it would be better to use Stream.

Also, I don’t see why you wouldn’t use Path.wildcard/1:

iex> Path.wildcard("./**")
["README.md", "_build", "_build/dev", "_build/dev/lib",
 "_build/dev/lib/chess_db", "_build/dev/lib/chess_db/consolidated",
 "_build/dev/lib/chess_db/consolidated/Elixir.Collectable.beam",
...]

# lots of dirs and files

And then, if you only want files:

iex> Path.wildcard("./**") |> Enum.reject(&File.dir?/1)

# or

iex> Path.wildcard("./**") |> Stream.reject(&File.dir?/1) |> Enum.take(5)
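A minimal sketch of the Enum/Stream difference, using only the standard library: it counts how many times a mapping function actually runs when only the first three results are taken. The `LazyDemo` module and `count_calls/1` are hypothetical names for illustration. Note that `Path.wildcard/1` itself returns a complete list, so in the snippet above `Stream.reject/2` only saves the `File.dir?/1` calls on elements that are never taken.

```elixir
defmodule LazyDemo do
  # Runs `pipeline` with an instrumented doubling function and returns
  # {result, number_of_times_the_function_was_called}.
  def count_calls(pipeline) do
    {:ok, counter} = Agent.start_link(fn -> 0 end)

    double = fn x ->
      Agent.update(counter, &(&1 + 1))
      x * 2
    end

    result = pipeline.(double)
    calls = Agent.get(counter, & &1)
    Agent.stop(counter)
    {result, calls}
  end
end

# Eager: Enum.map/2 doubles all 1_000 elements before Enum.take/2 runs.
{eager, eager_calls} =
  LazyDemo.count_calls(fn f -> 1..1_000 |> Enum.map(f) |> Enum.take(3) end)

# Lazy: Stream.map/2 only doubles elements as Enum.take/2 pulls them.
{lazy, lazy_calls} =
  LazyDemo.count_calls(fn f -> 1..1_000 |> Stream.map(f) |> Enum.take(3) end)

IO.inspect({eager, eager_calls}) # => {[2, 4, 6], 1000}
IO.inspect({lazy, lazy_calls})   # => {[2, 4, 6], 3}
```

Both pipelines return the same list; the lazy one simply never touches the elements it does not need.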

cmkarlsson · December 2, 2018, 11:03pm

Yes, but please also note that it is generally faster than Perl/Python/Ruby, even for computational tasks. It is slow in comparison to C/Java/Go/Rust-type languages. The “Erlang/Elixir is slow” claim is repeated so often that people have started believing it is slow compared to any language, which is not true.

I don’t believe this is true. If you look at https://benchmarksgame-team.pages.debian.net/benchmarksgame/which-programs-are-fast.html, for example, Erlang comes in somewhere in the middle, ahead of Perl and Python. And these are tasks that are very unsuitable for Erlang/Elixir.

Can it be slower than Perl and Python for specific tasks? Of course. The task may be much easier to implement in a language with mutable data, it may rely on highly optimized underlying code, or it may actually be done in C.

On the other hand, if you have a problem domain that fits Erlang/Elixir, then it will be fast. That is also the reason Go doesn’t have more of an advantage. The computational strength of the language is not as important as its concurrency primitives and its handling of IO. And even if Go is generally faster at handling things concurrently, the gap narrows because the “speed” of goroutines vs processes and the underlying scheduling even things out.

michalmuskala · December 3, 2018, 1:52pm

Web servers are fast in Elixir because web servers don’t do anything most of the time - most of the time they are just waiting. Either for request data or for database, etc. Elixir/Erlang are excellent at finding things to do when one of the processes doesn’t do anything, which makes them generally fast at web servers.
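A small sketch of the "mostly waiting" point, using only `Task` from the standard library: 10,000 processes each wait 100 ms (a stand-in for a slow client or a database call), yet the whole batch finishes in roughly the time of one wait, because the schedulers run other processes while these are sleeping. The counts and the sleep duration are arbitrary illustration values, not a benchmark.

```elixir
# Each task simulates a request handler that mostly waits (e.g. on a DB).
handle = fn i ->
  Process.sleep(100)
  i
end

{micros, results} =
  :timer.tc(fn ->
    1..10_000
    |> Enum.map(fn i -> Task.async(fn -> handle.(i) end) end)
    |> Enum.map(&Task.await/1)
  end)

# Done one after another, this would take 10_000 * 100 ms = 1_000 s.
IO.puts("#{length(results)} waits finished in #{div(micros, 1000)} ms")
```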

There’s also a question of algorithms. If you use an algorithm designed with mutable data structures in mind, it will be unavoidably slower when used with immutable data structures - on the other hand, there are some algorithms designed for immutable data structures and different, more specialised structures that can shine in some cases.
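A minimal illustration of the algorithm point, assuming plain Elixir lists (immutable, singly linked): appending with `++` copies the whole accumulator on every step, which is the natural translation of a mutable "push to the end" loop, while prepending and reversing once is the idiom designed for immutable lists. The module name `Build` is hypothetical.

```elixir
defmodule Build do
  # Mutable-style: append at the end. `acc ++ [x]` copies all of `acc`,
  # so building n elements costs O(n^2).
  def append_each(n), do: Enum.reduce(1..n, [], fn x, acc -> acc ++ [x] end)

  # Immutable-friendly: prepend in O(1), then reverse once at the end: O(n).
  def prepend_then_reverse(n) do
    1..n
    |> Enum.reduce([], fn x, acc -> [x | acc] end)
    |> Enum.reverse()
  end
end

{slow_us, slow} = :timer.tc(fn -> Build.append_each(10_000) end)
{fast_us, fast} = :timer.tc(fn -> Build.prepend_then_reverse(10_000) end)

# Same result, very different cost.
IO.puts("append: #{slow_us} us, prepend+reverse: #{fast_us} us, equal: #{slow == fast}")
```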

Finally, there’s the matter of the VM. BEAM is just a very well-implemented and very efficient machine. The runtime system, responsible for IO interaction, scheduling and similar things, has been optimised over the years. Yes, it does not have a JIT, but the normal emulator is quite fast compared to other VMs. It’s also one of the few register-based VMs in wide use, and register VMs generally tend to be faster than the more popular and simpler stack-based VMs.

earth10 · December 3, 2018, 4:29pm

Thank you everybody!

It seems that while I was right in thinking that features like concurrency are much more important than raw-speed in typical Elixir use-cases, I vastly underestimated the importance of code optimization.

I will try, as an exercise, to rewrite some of the algorithms, like the directory traversal example above, to make them more efficient (even if in a real-world application this would probably be pointless).

sribe · December 3, 2018, 8:55pm

I suspect that quote is in reaction to all the “Elixir is lightning fast” quotes, which of course are only true when comparing to slow interpreted languages, Ruby in particular. It’s not fast when compared to compiled C++/Rust/Go and Java (well, mostly). But to have the expressiveness of Ruby plus some, at a performance level solidly between Ruby & C, is an awesome win.

earth10 · December 6, 2018, 3:34pm

OK, I tried your suggestions for code optimization and other approaches to directory traversal. The fastest way to recursively walk a directory and print file names (I gave up on printing file modification dates to keep things simple) was:

defmodule Walker do
  def walk(dir) do
    Enum.each(File.ls!(dir), fn file ->
      fname = "#{dir}/#{file}"
      IO.puts(fname)
      if File.dir?(fname), do: walk(fname)
    end)
  end
end

Walker.walk("/path/to/dir")

I compiled it into an executable with escript and redirected the output to avoid measuring terminal speed. On a directory tree with 62,000 files, it takes between 8.81 and 11.17 seconds.

The same task in Python took 1.10 to 1.20 seconds.

For comparison, a shell script with find and xargs took 0.73 to 0.76 seconds.

I’m wondering if I’m completely missing something obvious or if my platform, FreeBSD, is the problem. I tend to rule out the latter, since I’ve never heard about issues with the Erlang/FreeBSD combination.

earth10 · December 6, 2018, 5:32pm

Is this true even for “old style” web development without persistent connections from clients?

I see Elixir as a perfect fit for modern web sites with soft real-time features (real-time notifications, field autocompletion with server-side lookups, chat, etc.), since you have lots of clients with persistent connections, each of them requiring little work on the server.

But let’s consider “old style” sites where the client is just served a dynamic page, cached when possible to reduce database access. And we place the server behind a reverse proxy that buffers communication to and from clients (so our Elixir/Phoenix server only talks to the local proxy and is not affected by slow clients). Is this still a problem domain that fits Elixir, or should a more “raw-speed” approach be explored?

This is mostly out of curiosity and for better understanding: I’m moving to Elixir/Phoenix because I find them very well thought out, well documented and robust, not for performance (which of course is a nice plus but not that important to me).

OvermindDL1 · December 6, 2018, 7:56pm

Even for old-style sites it is still useful. It handles load very well where most systems crumble, and then there are its scaling capabilities.

garazdawi · December 7, 2018, 8:56am

Try replacing File.ls!(dir) with elem(:prim_file.list_dir(dir), 1) and see what difference that makes.

Edit: Looking at the code again, it is most likely the File.dir? and IO.puts calls that take the majority of the time. You can do the same trick with File.dir?, although it is a bit more convoluted as no equivalent function exists in Erlang.

josevalim · December 7, 2018, 10:06am

Doing file traversals is generally not going to be as efficient in Elixir/Erlang as in other languages. I will explain why.

When you call File.open/2 in Elixir, it doesn’t return a file handle. It returns a process (a lightweight thread of execution) that contains the file handle. And that file handle is not even a direct file handle, as you would get in C: it is an instance of a linked-in driver, which is a piece of code that runs isolated in the VM and in turn talks to the file handle.

You may be wondering: why all of this indirection then?

The reason File.open/2 returns a process is that we can then pass this process between nodes and do file writes across nodes. For example, I can open a file on node A, pass that reference to node B, and node B can read and write that file as if it were on node B, while everything actually happens on node A. So we do this because we favor distribution over raw performance.

What about the linked-in driver thing, though? There are two reasons. First, let’s remember that these kinds of operations need to be implemented in C or another low-level language to make syscalls. And while Erlang provides interoperability with C code, in earlier versions it was not possible to do an I/O-based operation from within that C code: if you did, you could mess up the Erlang schedulers that are responsible for concurrency. The second reason is that if there is a bug in that C code, it can cause a segmentation fault and bring the whole system down, and we prefer to keep our systems running. That led to this code being put in linked-in drivers.

Of course all of this adds overhead, but we are fine with it because, for our use cases, you are more likely to find yourself passing a file between nodes than traversing directories as fast as possible, so we focus on the former.

The situation has improved in the latest Erlang/OTP 21 release: the VM added the ability to run I/O-blocking C code through something called dirty NIFs, so the linked-in drivers for file operations were recently removed, which improved performance. Still, most calls in the File module go through processes and whatnot. You can bypass this process architecture, usually by invoking the :prim_file module or by passing the [:raw] option to File module operations, and that typically improves things.
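A minimal sketch of the two kinds of handles described above, assuming a writable temporary directory; the file name and the `FileHandles` module are hypothetical. The default `File.open/2` result is a pid, so another process (here a `Task`, standing in for a process on another node) can read through it. With the `:raw` option the result is not a pid, and according to the Erlang `file` documentation a raw file can only be used by the process that opened it.

```elixir
defmodule FileHandles do
  def demo(path) do
    File.write!(path, "first line\nsecond line\n")

    # Default mode: File.open/2 returns a pid (an I/O server process
    # that owns the actual file descriptor).
    {:ok, io} = File.open(path, [:read])
    # Any process that knows the pid can read through it.
    shared_line = Task.async(fn -> IO.binread(io, :line) end) |> Task.await()
    :ok = File.close(io)

    # Raw mode: no intermediary process; the handle is not a pid and
    # may only be used by the process that opened it.
    {:ok, raw} = File.open(path, [:read, :raw])
    raw_line = IO.binread(raw, :line)
    :ok = File.close(raw)

    {is_pid(io), shared_line, is_pid(raw), raw_line}
  end
end

path = Path.join(System.tmp_dir!(), "file_handles_demo.txt")
IO.inspect(FileHandles.demo(path)) # => {true, "first line\n", false, "first line\n"}
File.rm!(path)
```

The raw handle skips one process hop per operation, which is where the speedup comes from, at the cost of the cross-process (and cross-node) sharing shown in the first half.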

But in a nutshell, that’s why it won’t be as fast: there are many cases where we prefer to focus on features such as distribution and fault tolerance over raw performance.

Btw, regarding CSV processing, did you try the nimble_csv library?

josevalim · December 7, 2018, 10:21am

Just as an example, if I rewrite your code to avoid calling File.dir? multiple times and instead rely on pattern matching:

  def walk(dir) do
    with {:ok, dirs} <- File.ls(dir) do
      Enum.each(dirs, fn file ->
        IO.puts fname = "#{dir}/#{file}"
        walk(fname)
      end)
    end
  end

Then it is about 40% faster on my test sample. And if I use :prim_file instead of File so we skip the process and the atomicity guarantees:

  def walk(dir) do
    with {:ok, dirs} <- :prim_file.list_dir(dir) do
      Enum.each(dirs, fn file ->
        IO.puts fname = "#{dir}/#{file}"
        walk(fname)
      end)
    end
  end

then it is roughly twice as fast.

EDIT: Actually, I measured those times using the OS time utility, so they include the time to boot the VM, which is roughly 0.170s in my case. So the gains are more than 50% once we remove that constant factor.

earth10 · December 7, 2018, 3:45pm

Thank you all for your suggestions. I just tried them and each resulted in an improvement. In the end, the code snippet was three times faster than my first approach.

I’d say my main error was considering it an “easy task” in every language, without realizing that nothing is trivial when everything is designed to run across different nodes. So it was a comparison between something very simple in Python and something quite complex in Elixir. Definitely not comparable!

Yes, I used that library, but I was also doing other things which, as this discussion showed, weren’t as trivial as I thought (like listing files in a directory to choose which CSV to read, etc.). I’ll do more tests, but I’m pretty sure I was making the same mistakes as in the directory traversal example.

Thank you for the detailed explanation of the inner workings!

josevalim · December 8, 2018, 7:19am

The nice thing is that, if you are parsing multiple CSVs, that’s a problem you can adapt to leverage concurrency in a relatively straightforward fashion, so maybe we can even run faster than the other languages once that is taken into account.
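A sketch of that idea using `Task.async_stream/3` from the standard library: each file is handled in its own process, with concurrency capped at the number of schedulers. To keep it dependency-free, the "parsing" here is a naive `String.split/2` on commas standing in for a real CSV parser such as nimble_csv (it does not handle quoted fields). The `CsvBatch` module, the column layout and the file names are hypothetical.

```elixir
defmodule CsvBatch do
  # Naive stand-in for real CSV parsing: counts data rows (excluding the
  # header) and sums the second column. Does NOT handle quoted fields.
  def summarize(path) do
    [_header | rows] =
      path
      |> File.stream!()
      |> Enum.map(&(&1 |> String.trim_trailing() |> String.split(",")))

    total = rows |> Enum.map(fn [_, v | _] -> String.to_integer(v) end) |> Enum.sum()
    {Path.basename(path), length(rows), total}
  end

  # Processes each file in a separate process, at most one per scheduler.
  # Results come back in input order (async_stream is ordered by default).
  def summarize_all(paths) do
    paths
    |> Task.async_stream(&summarize/1, max_concurrency: System.schedulers_online())
    |> Enum.map(fn {:ok, summary} -> summary end)
  end
end

# Usage: write four small sample files, then summarize them concurrently.
dir = Path.join(System.tmp_dir!(), "csv_batch_demo")
File.mkdir_p!(dir)

paths =
  for i <- 1..4 do
    path = Path.join(dir, "data_#{i}.csv")
    File.write!(path, "name,value\na,#{i}\nb,#{i * 10}\n")
    path
  end

summaries = CsvBatch.summarize_all(paths)
IO.inspect(summaries)
```

Swapping the naive split for a real parser only changes `summarize/1`; the concurrency comes entirely from `summarize_all/1`.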

dch · December 8, 2018, 8:03pm

As an aside, FreeBSD and BEAM are a great combo; in particular, DTrace support is excellent. I’m happy to answer any questions there if you need help.

Also, your escript probably isn’t really a compiled task; try putting the code into a module, compiling that, and timing the execution of the module+function from a running VM. Not only is this a more typical scenario, you can also start comparing 1000 parallel runs against the same in Python. It will become very clear that a forked worker uses 100x the memory of the Elixir one, with worse response times.
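One way to follow that advice is a sketch like the one below, which times the traversal with `:timer.tc/1` inside an already running VM, so neither VM boot time nor terminal output is measured. It counts entries instead of printing them and, like José's pattern-matching version, relies on `File.ls/1` returning an error for anything that is not a directory. The `WalkBench` module and the temporary directory layout are hypothetical; on a real tree you would pass your own path.

```elixir
defmodule WalkBench do
  # Recursively counts entries under `dir`. File.ls/1 returns {:error, _}
  # for regular files, so no separate File.dir?/1 call is needed.
  def count(dir) do
    case File.ls(dir) do
      {:ok, names} ->
        Enum.reduce(names, 0, fn name, acc -> acc + 1 + count(Path.join(dir, name)) end)

      {:error, _reason} ->
        0
    end
  end
end

# Build a tiny sample tree: root/{a.txt, b.txt, sub/c.txt}
root = Path.join(System.tmp_dir!(), "walk_bench_demo")
File.rm_rf!(root)
File.mkdir_p!(Path.join(root, "sub"))
Enum.each(["a.txt", "b.txt", "sub/c.txt"], &File.write!(Path.join(root, &1), "x"))

{micros, entries} = :timer.tc(fn -> WalkBench.count(root) end)
IO.puts("#{entries} entries in #{micros} us")
```

Because the VM is already up, repeated calls to `:timer.tc/1` also show how much of the first run was warm-up (for example, filesystem caching) rather than traversal cost.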

An artificial benchmark may not give you practical comparisons vs real world running code. But trying to understand the difference can be very instructive.

Finally, you may not realise it, but this thread has the creator of the language, a core contributor to the VM, and people with a decade of production Erlang experience replying. Getting this level of expertise on a random topic is not unusual in the Erlang world. We are very lucky.

PS: post your escript and let’s see what we can do with it.

josevalim · December 8, 2018, 9:51pm

Elixir escripts are compiled, though. An escript is a zip file with .beam modules in it and a couple of other things.