Have you moved away from Elixir? If so, why?

My experience was incredibly similar: I started learning Elixir a handful of years ago, then got into some funemployment (with freelance side work) after moving to Asia for romantic reasons and took the opportunity to get really serious about Elixir. I also joined a startup over here that did not fare well, then joined a budding organization, made a small non-essential web app with Elixir and Phoenix, and now I lead a team building a more complex web system with Elixir/Phoenix.

It was easy to make the argument for doing a project in Elixir because the other teams here are primarily PHP shops (lotta PHP in this city…), and Elixir/Phoenix have really matured in the past couple of years, IMO.

I haven’t moved away from Elixir and likely won’t anytime soon; it is an amazing ecosystem for building resilient, flexible, fast APIs. I don’t miss Node/Ruby/PHP et al. at all.

5 Likes

During the last few days I was playing around with HTTP/2 and SSE. Implementing this with Raxx is very easy thanks to its streaming-oriented nature. I also implemented it with Phoenix/Plug, but it really felt kinda hackish. Phoenix does an excellent job as a batteries-included tool for building MVC-based websites. Raxx, on the other hand, is a nice solution when it comes to server streaming or microservices.
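For illustration, a minimal SSE endpoint with plain Plug might look something like this (a simplified sketch, not the code I actually ran; a real version would subscribe to something like Phoenix.PubSub instead of looping on a sleep):

```elixir
defmodule SSEPlug do
  @moduledoc "Illustrative SSE endpoint using plain Plug."
  @behaviour Plug
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    conn
    |> put_resp_header("content-type", "text/event-stream")
    |> put_resp_header("cache-control", "no-cache")
    |> send_chunked(200)
    |> stream_events()
  end

  defp stream_events(conn) do
    # A real implementation would block on PubSub messages here;
    # a periodic heartbeat keeps the example self-contained.
    case chunk(conn, "data: ping\n\n") do
      {:ok, conn} ->
        Process.sleep(1_000)
        stream_events(conn)

      {:error, :closed} ->
        conn
    end
  end
end
```

It works, but you end up holding the connection process hostage and hand-rolling the event loop, which is roughly what I meant by "hackish".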

6 Likes

Just wanted to add my thoughts to this topic, too. First, I would like to state that I really, really enjoy Elixir as a language. Having done software development for more than two decades (god, I feel old), I found that Elixir gave me several new ideas on how to tackle things - especially on the functional programming side. I also find it quite amazing what you can do with OTP and distributed Elixir/Erlang. If only I had known earlier … :slight_smile:

But as an employee I believe it will be very hard to sell Elixir/Erlang to our customers or even our manager. Here in Germany, Java/C# are very popular with traditional enterprises. Startups, on the other hand, seem to stick to Node.js or, more lately, Golang.

And with the ever-increasing popularity of the cloud, especially Kubernetes, Golang seems to become even more popular. Some even say that Golang will become the lingua franca for cloud computing. Low memory footprint, easy to learn, high concurrency and, well, Google.

Maybe it is a bit of a marketing issue, too. Arguments like OTP and fault tolerance don’t help when the CTO states that this is handled by Kubernetes already. So no need for OTP. And while I personally do like Phoenix, it’s hard to sell it in the ever-growing world of microservices. Every software architect I spoke to told me that without microservices you cannot scale your app. “That’s common knowledge”.

4 Likes

Yeah, the software industry in Germany is really frustrating and so far behind other countries.

5 Likes

Good idea. I guess this forum would be an excellent place for that. I’ll check if anyone has opened that thread already. (I might do this later)

3 Likes

Arguments like OTP and fault tolerance don’t help when the CTO states that this is handled by Kubernetes already.

I haven’t worked with Kubernetes a lot - but wouldn’t it just handle fault tolerance at the scale of an entire app crashing? Kubernetes + Golang (without external Golang packages) wouldn’t handle fault tolerance at the “agent” level, which is where the niceties of OTP and its standards come into play.
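As a tiny illustration of that "agent"-level handling (all names made up): one worker crashing is restarted by its supervisor while the other worker and the rest of the node keep serving, so Kubernetes never sees anything but a healthy pod.

```elixir
defmodule Demo.Worker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}
end

children = [
  Supervisor.child_spec({Demo.Worker, :worker_a}, id: :worker_a),
  Supervisor.child_spec({Demo.Worker, :worker_b}, id: :worker_b)
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

# Kill one worker: only it is restarted; :worker_b is untouched and the
# node as a whole never stops serving traffic.
Process.exit(Process.whereis(:worker_a), :kill)
```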

3 Likes

I also had a hard time convincing companies to do Elixir (as a freelancer in Denmark).
It got to the point where my customers would require me to buy them ice cream every time I mentioned Elixir.

I ended up starting my own company, becoming CTO, and choosing Elixir as the main language :smiley: Now we are 10 people (3 developers) and I managed to hire the first Elixir developer besides myself.

Perhaps not a solution for everybody, since it requires the right place, time, and people. But if the right place, time, and people are there, go for it! :muscle::smile:

39 Likes

Yep, asking for an explanation of which ideas outside the current libs actually offer value gets interpreted as claiming any such ideas are worthless. You’ve been crystal clear that you (we) are open to hearing about them.

While this is true, the trend is for the apps in K8s to be smaller and smaller. As in, it’s not uncommon to spec 1 CPU & 256 MB for Java apps. Of course not everything fits that model, but… K8s pods look more & more like OTP supervisors with a handful of child processes.

3 Likes

If somebody is only familiar with mainstream technology, they would never expect the notion of fault resilience (rather than bare-bones micro-level fault handling) to be part of a programming language.

Most high level programming languages are designed “language first” to perform computations. I suspect Erlang and the BEAM were developed in lockstep for computation and coordination which is why runtime concepts like process links and monitors are part of the core.
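As a trivial, self-contained illustration of my own (nothing specific to the book quoted below): monitoring another process and reacting to its death is a built-in runtime operation, not a library feature.

```elixir
# Spawn a process, monitor it, tell it to stop, and observe its exit reason.
pid =
  spawn(fn ->
    receive do
      :stop -> exit(:boom)
    end
  end)

ref = Process.monitor(pid)
send(pid, :stop)

receive do
  {:DOWN, ^ref, :process, ^pid, reason} ->
    IO.puts("process went down: #{inspect(reason)}")  # reason is :boom
end
```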

Designing for Scalability with Erlang/OTP p.175:

This forms part of the supervision strategy of a system and in some situations is put in place not by the developer, who focuses only on what particular workers have to do, but by the architect, who has an overall view and understanding of the system and how the different components interact with each other.

There also seems to be a drive to commoditize developer skills and designing for fault resilience seems to require a more advanced skill set. So the responsibility is pushed to the next level up, outside of the container. At this level of granularity the container becomes the “component”, k8s adopts the “let it crash” philosophy, creating design pressures to have the “component” no larger than what you can afford to lose.

At least that is my current perception …

10 Likes

This seems to be the crux of this divide we’re lately witnessing (K8s vs. other deployment strategies). Many CTOs and programmers in general want to treat fault tolerance as an implementation detail that can be offloaded onto the Operations / Sysadmins team. I certainly witnessed such efforts several times already.

I mean, OK, they are free to try, but I have the feeling that the IT field at large will yet again waste humongous amounts of time, spend a lot of money, generate a huge legacy somebody has to maintain, burn out, and seek alternatives… and then somebody will notice that Erlang had 99% of what they wanted all along. :003:

11 Likes

Maybe it is miscommunication. People hear “restart the process” and view it as a higher layer, similar to k8s. But restarting a crashed process isn’t new - Erlang itself even comes with heart - yet no one questions what the point of supervision is if you have heart.

I think this goes beyond simply explaining the performance/cost benefits of not churning k8s pods. It may need an explanation that breaks down how supervision helps structure your program, and why you would no more want to be without it than you would want to be without try/catch.

3 Likes

I usually turn this around by saying supervisors work as if you had k8s inside your code. For example, when you have 20 database connections in your app, it is not something that k8s can manage, restart and automate, but supervisors provide exactly that, so you can apply the same principles on the “small” and the “large”.

Of course there are many gaps in this explanation but it can be a starting point for someone who heard very little about Erlang/Elixir.
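As a rough sketch of that mental model (MyApp.DBConnection is a made-up stand-in for any connection-holding GenServer): twenty connection processes under one supervisor, each restartable individually, which is a granularity no external orchestrator can see.

```elixir
# MyApp.DBConnection is assumed to be a GenServer taking its opts in
# start_link/1 (so `use GenServer` gives it a child_spec/1).
children =
  for i <- 1..20 do
    Supervisor.child_spec({MyApp.DBConnection, []}, id: {:db_conn, i})
  end

# If one connection process dies, only that one is restarted; the other
# nineteen keep serving queries, and from the outside (k8s included)
# the node never looked unhealthy.
Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.ConnectionSup)
```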

21 Likes

I haven’t moved away from Elixir, but I have moved away from dev into more of an Ops/Arch/Managerial role… largely because I’m very passionate about protecting the development process from the business side of a company. As a developer, you can’t easily do that because there’s a lot more communication involved.

But there’s also the reality that, after diving headfirst into Elixir, I find myself having to avert my eyes from a lot of other languages. I find it’s much simpler for me to operate at a higher level and discuss the needs with devs of any language, so they can solve them in the way that makes the most sense to them… than it is for me to look at the code and constantly think about how much simpler it would be with Elixir.

6 Likes

This is where things like load-balancers - which are external to your application by definition - pose interesting questions for us.

K8s, PCF, or another high-level orchestrator can stop sending a node traffic when it displays signs of issues - latency spikes, CPU spikes, a drop-off in request processing rate, 5xx response errors - and then create a new node.

When an Elixir app is having problems, it can either continue to accept connections that it can’t currently handle in the hope of restoring normal service, or it can start to reject them, at which point some other component is going to have to handle this anyway.

The external orchestrator stops the bleeding as soon as a problem is detected. The internal handling of a supervisor doesn’t, as it attempts to recover from the problem. External orchestrators can detect a smaller range of errors than internal supervisor hierarchies, and they have a smaller range of responses. Higher error sensitivity is good; a small and deterministic range of responses is also probably good.

Combining the two is, probably, the worst of all situations; your app accepts traffic it can’t currently handle as it tries to recover, delaying the point where the external orchestrator gives up and kills it.

This is why I don’t use Elixir beyond personal projects, even though I’m very fond of it.

1 Like

Combining the two is the best of all worlds. Elixir and Erlang give you tools to provide signals to external load balancers and do much smarter load balancing and routing of traffic. You don’t forgo a load balancer just because you’re using Elixir.

At high scale you’re essentially running in degraded mode all the time, whether you know it or not. The goal is to make all of your applications degrade gracefully. Every language and runtime has to contend with this. Elixir and Erlang provide a useful set of tools to build that graceful degradation into your application, which is a really powerful concept. But it doesn’t preclude or obviate the need to also provide graceful degradation at the system level (like in your load balancer example). It complements it.
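For example, a minimal health-check plug (hypothetical names, just a sketch of the idea) can report the node as unhealthy whenever an essential process is down, letting the load balancer pull it from rotation:

```elixir
defmodule MyAppWeb.HealthPlug do
  import Plug.Conn

  def init(opts), do: opts

  # Report 503 on /health when an essential dependency process is not
  # running; the external load balancer reads this as "stop routing here".
  def call(%Plug.Conn{request_path: "/health"} = conn, _opts) do
    status = if Process.whereis(MyApp.Repo), do: 200, else: 503

    conn
    |> send_resp(status, "")
    |> halt()
  end

  def call(conn, _opts), do: conn
end
```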

11 Likes

A supervisor won’t retry things forever. Continuous errors will cascade and cause the whole node to go down permanently, which is when the orchestrator will detect it. The advantage of the supervisor is that if you have a small hiccup, you can quickly heal without going through an external system loop.

Furthermore, you usually write your supervision tree so the essential services are started first and the rest of the application won’t run unless those services are available. For instance, if you are detecting failures through a health check endpoint, it is straightforward to either disable the health check endpoint if any other service is down, or have the health check report that the system is non-functional.

So if your system is accepting requests when you know it can’t handle them, you have all of the control you need to stop accepting said requests. The goal of a supervisor is to help you think about failures and set up reasonable strategies. In the worst-case scenario, if your strategy is to never handle failures at that level, you might as well set max_restarts: 0.
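A minimal sketch of that ordering (module names are hypothetical): essential services first, the endpoint that serves the health check last, so that with :rest_for_one a dead essential service also takes the health check down and the orchestrator’s probe fails honestly.

```elixir
children = [
  MyApp.Repo,         # essential: database access
  MyApp.Cache,        # depends on the Repo
  MyAppWeb.Endpoint   # serves /health; started last
]

Supervisor.start_link(children,
  strategy: :rest_for_one,
  # With :rest_for_one, a crash in MyApp.Repo also restarts everything
  # started after it, including the endpoint, so the health check goes
  # dark while the system is not functional.
  max_restarts: 3,
  max_seconds: 5
  # Or, if the policy is "never handle failures at this level",
  # set max_restarts: 0 and let the node die for the orchestrator to see.
)
```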

14 Likes

Sometimes the better course of action is just to listen: “hey, I am sorry that you feel this way. I have not had the same experience. In any case, if you want to talk or suggest ways we can improve, we will be glad to hear”. That’s it.

I completely understand your perspective. It is frustrating to be called something you don’t agree with, but at the same time, this is a thread for people to express their opinions on why they moved away from Elixir. Of course, if you disagree with something, you can provide counterexamples, but this thread should mostly be an exercise in listening for most of us. Otherwise, the next time a similar thread pops up, nobody will say anything, and then we won’t learn anything either.

There are countless threads that say wonderful things about Elixir; we will be fine with a thread that brings some of the negative points to light so we can work on them.

50 Likes

Agreed. Apologies if what was said looked non-friendly.

12 Likes

I completely agree with your comments about systems at scale operating in some manner of perpetually degraded mode; continuous partial failure is a fact of life.

But those failures are, by their nature, likely to be surprising. When an application/node is having problems, you can’t rely on it to solve those problems, and you can’t rely on it to reliably inform other systems of those problems. The only way I know of to do that is by external observation of the behavior of a node. Then it doesn’t matter whether a node thinks it is healthy or thinks it has issues, or whether it has managed to communicate that state to an external system… if it isn’t observably behaving as expected, you can take action - in this example, take it out of load-balancer rotation.

In a homogeneous Beam environment I might want to try to overcome that, but I’ve never worked in a homogeneous environment at scale. I can externalize that control with PCF or K8s/Istio and my problems largely go away, no matter what the implementation technology is. At scale, being aggressive about killing nodes gives me better liveness. If 1 out of N nodes starts to look odd, immediately taking it out of rotation, creating a new one, and terminating that node when any connections have returned or timed out gives me a better return than watching and hoping that it will get better.

All this does presuppose ephemerality of nodes, something that has become increasingly true but is not universally so, and different trade-offs need to be made where it isn’t. But in applications where nodes are truly ephemeral, I get no benefit from nursing sick nodes back to health.

1 Like

A few thoughts:

At the time the Beam was invented, there were large machines running applications composed of many subsystems. Supervising those subsystems within the application, combined with very detailed work to stop failures propagating across processes in the VM, was a work of insight and genius.

Now, in the world of microservices and, for the first time ever, shrinking node sizes, we are composing those subsystems at the node-per-process level rather than as internal modules and processes. To get reliability, we need a supervision mechanism just like the one the Beam has had for all these years. That’s the external orchestrator, and letting individual subsystems/processes/nodes crash is the epitome of the Erlang model.

Now, the Beam has supervisor hierarchies, and you could suggest that we think of an external orchestrator as one level and the internal supervisor hierarchy as simply another, finer-grained level. And that is worth thinking about. I would, however, suggest that just as the Beam doesn’t really try to save processes - it lets them crash and creates new ones - containerized microservices are the process-level abstraction in this case, and we should let them crash and create new ones.

This isn’t a black and white issue, but as container management orchestrators get better and better, I can’t make a convincing case for why I’d try to save a single node/process.