Scalability and fault-tolerance - Elixir vs Java

ahmadferdous · November 19, 2020, 5:11am

Hi folks,

I’m an Elixir newbie trying to understand how Elixir (Phoenix) compares with Java in terms of scalability and fault-tolerance for a typical web application backend (business logic + DB).

Scalability:
Phoenix spawns a new lightweight BEAM process when it receives a new request. Because it uses lightweight processes, it can spawn a large number of them to handle a lot of requests. Once we hit the scalability limit of a single machine, we add more machines to scale horizontally.
Java also has lightweight threads and by adding more machines, we can scale horizontally. Then what additional advantages does Elixir (Phoenix) have over Java?

Fault tolerance:
Elixir uses supervisors to monitor child processes. If a child process crashes, its supervisor will restart it.
Java has exception handling to catch errors gracefully. The Java app can be restarted with operational support in place.
How is Elixir (Phoenix) superior to Java in terms of fault tolerance?

Please help me understand Elixir and Phoenix better. Thanks.

dominicletz · November 19, 2020, 9:10am

Having worked with a big Java system (Cassandra) for many years, I think there are pretty strong advantages on the Elixir/BEAM side of things, especially for web services that you want to always stay up:

Scalability:
1 Memory & GC)
Behind BEAM’s lightweight process model is also a memory model that scales better. Each BEAM process has its own small heap that is garbage collected independently, so you can easily scale to multi-gigabyte RAM machines without global garbage collection freezes. With Java, on the other hand, you will find yourself dealing with global garbage collection freezes at scale. If you scale your service and suddenly get 1, 10 or even 100 second pauses, you know you’ve got a problem. https://stackoverflow.com/questions/21992943/persistent-gc-issues-with-cassandra-long-app-pauses#22002767

For me, I decided at some point that I’ve spent enough hours of my life dealing with Java GC pauses to leave that tech behind for good. Also, don’t fall for the marketing: every Java release claims that the garbage collector has gotten so much better, but the problem is that the concept of one large memory heap just doesn’t scale. BEAM’s memory per process scales perfectly in comparison.
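To make the per-process memory point concrete, here is a minimal sketch, assuming a plain Elixir script run with `elixir`. The worker and the 100_000-element list are made up for illustration. It shows that memory is accounted per process and disappears with the process that owns it.

```elixir
parent = self()

# Hypothetical worker: builds a large list on its OWN heap, then waits.
pid =
  spawn(fn ->
    data = Enum.to_list(1..100_000)
    send(parent, :ready)

    receive do
      :stop -> length(data)
    end
  end)

# Wait until the worker has allocated its data.
receive do
  :ready -> :ok
end

# Memory is reported per process, in bytes.
{:memory, bytes} = Process.info(pid, :memory)
IO.puts("worker heap: #{bytes} bytes")

# Let the worker finish and wait for it to exit.
ref = Process.monitor(pid)
send(pid, :stop)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
end

# A dead process has no info: its whole heap was released at once.
after_exit = Process.info(pid, :memory)
IO.inspect(after_exit)
```

The worker's heap is freed when it exits, without pausing any other process, which is the property the paragraph above relies on.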

2 Team size)
Expect to need a larger team with Java. Your code size will grow faster, by a factor of 3x or more, mostly because of boilerplate and types. Don’t underestimate the impact of this: development progress will be much slower over time with Java.

3 Ops)
I can’t put into words how much time I’ve saved debugging Elixir services because of the remote shell. In Java land you’ve got JMX and JConsole, but they can’t compete at all with the power of IEx and Observer. Being able to not only inspect all running GenServers, Supervisors, etc. but also to send them commands gives you superpowers and will allow you to manage much larger deployments much quicker. Try to hunt down the infamous null pointer exception in a running deployed Java application and then compare that to the ease of doing the same in Elixir.
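As a rough illustration of the kind of inspection meant here, the following sketch uses an Agent registered under the made-up name `:demo_stats` as a stand-in for a process in a deployed app. The same calls work from a remote shell attached to a running node, for example with `iex --sname debug --remsh app@hostname` (node names here are hypothetical).

```elixir
# Stand-in for a long-running process in a deployed system (hypothetical).
{:ok, pid} = Agent.start_link(fn -> %{hits: 0} end, name: :demo_stats)

# Peek at the live state without any support from the process itself.
before_state = :sys.get_state(:demo_stats)
IO.inspect(before_state, label: "state")

# Check for a growing mailbox, a common sign of an overloaded process.
{:message_queue_len, queue_len} = Process.info(pid, :message_queue_len)
IO.inspect(queue_len, label: "queued messages")

# Patch the state in place while the process keeps running.
:sys.replace_state(:demo_stats, fn state -> %{state | hits: 42} end)

after_state = Agent.get(:demo_stats, fn state -> state end)
IO.inspect(after_state, label: "patched state")
```

`:sys.replace_state/2` is intended for debugging only, but it shows how far the tooling lets you reach into a live system.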

Fault Tolerance:
This is a clear win for Elixir, partly because of the system and partly because of the architecture it enforces. Having each request in its own worker means that when one worker crashes, everything it used is cleaned up and, more importantly, all other workers are unaffected.
Java, more like C, has the problem that an unexpected error can leave the whole system in an unknown state. This can range from just crashing the whole system to doing really bad things to all non-crashed components. Before you’ve found all the edge cases that need to be properly handled in a Java app you will turn gray. In Elixir, on the other hand, “letting it crash” is often totally reasonable and doesn’t affect your uptime. More importantly though, this is the default, without any extra work. In Java the default is a burning disaster until you’ve added so many exception handlers that the normal code flow becomes hard to read. At the end of the day, I’ve seen Java applications crash much more frequently than Elixir apps. https://speakerdeck.com/mkaszubowski/otp
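A minimal sketch of “let it crash” under a supervisor, assuming a plain Elixir script. `Demo.Counter`, `Demo.await_restart/2` and the names `:a` and `:b` are invented for this example. One worker is crashed on purpose; the supervisor restarts it with fresh state while the other worker keeps its state.

```elixir
defmodule Demo.Counter do
  # Hypothetical worker: a counter held in a GenServer.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, 0, name: name)
  def increment(name), do: GenServer.call(name, :increment)
  def crash(name), do: GenServer.cast(name, :crash)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, count + 1, count + 1}

  @impl true
  def handle_cast(:crash, _count), do: raise("boom")
end

defmodule Demo do
  # Polls until the supervisor has registered a replacement process.
  def await_restart(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_restart(name, old_pid)
    end
  end
end

children = [
  Supervisor.child_spec({Demo.Counter, :a}, id: :a),
  Supervisor.child_spec({Demo.Counter, :b}, id: :b)
]

# :one_for_one restarts only the child that crashed.
{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

Demo.Counter.increment(:a)
Demo.Counter.increment(:b)
Demo.Counter.increment(:b)

old_pid = Process.whereis(:a)
Demo.Counter.crash(:a)
new_pid = Demo.await_restart(:a, old_pid)

a_after = Demo.Counter.increment(:a)
b_after = Demo.Counter.increment(:b)

# prints {true, 1, 3}: :a came back as a new process with reset state,
# :b never noticed the crash.
IO.inspect({new_pid != old_pid, a_after, b_after})
```

The crash is logged, but no exception handler was written anywhere: recovery is the supervisor's job, not the worker's.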

Also worth mentioning here, I think, are live upgrades, even if they’re relatively niche. With Elixir you can build systems that never fail: TCP connections that are kept open for months, even across software updates. The way you can upgrade stateful Elixir processes through OTP releases is unique. And if you really need fault tolerance at that level, Elixir is probably your only option.
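The hook that makes stateful upgrades possible is the `code_change/3` callback. The sketch below is only an approximation: a real upgrade loads new code through release tooling, while here the callback is driven by hand with `:sys.change_code/4`. `Demo.Conn`, the version string `"1"` and the state shapes are made up for illustration.

```elixir
defmodule Demo.Conn do
  # Hypothetical long-lived process. The "old" state is a bare integer,
  # the "new" state is a map.
  use GenServer

  def start_link(count), do: GenServer.start_link(__MODULE__, count)

  @impl true
  def init(count), do: {:ok, count}

  # "New version" of the code: expects the map-shaped state.
  @impl true
  def handle_call(:count, _from, %{count: n} = state), do: {:reply, n, state}

  # Migrates old state to the new shape without stopping the process.
  @impl true
  def code_change(_old_vsn, count, _extra) when is_integer(count) do
    {:ok, %{count: count}}
  end
end

{:ok, pid} = Demo.Conn.start_link(7)

# An upgrade suspends the process, migrates its state, then resumes it.
:ok = :sys.suspend(pid)
:ok = :sys.change_code(pid, Demo.Conn, "1", [])
:ok = :sys.resume(pid)

# Same pid, same data, new state shape.
count = GenServer.call(pid, :count)
IO.inspect({Process.alive?(pid), count})
```

Because the process is never restarted, anything it owns, such as an open socket, survives the state migration.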

Best