Perhaps my question wasn’t clear enough, so I’ll try to be more precise. One million users are connected to the system. How many nodes would you run? Is it closer to one million, or is it closer to one?
That’s far too vague a question to possibly answer. The number of connections is largely meaningless by itself; what those connections are doing is what matters. The characteristics of a ping-server with 1 million connections are completely different from those of a system performing 1 million ultra-high-resolution ray-tracing operations simultaneously.
However, I can say this: I would favor having as many small nodes as was practical, not as few large nodes.
How many nodes would you have? Some scenarios to consider:
- If your load drops from 1 million connections Monday to Friday, 9 to 5, to 1 thousand for the other hours on those days, how many nodes would you have for the rest of those days?
- When your systems have no connections at the weekend, how many nodes?
- If, on one weekend a year, your load jumps - just for that weekend - to 100 million connections, how many nodes would you have for the rest of the year?
- If, for the first fifteen minutes of that weekend, starting at the stroke of midnight, your load jumps to 200 million connections - just for those 900 seconds out of a whole year - how many nodes would you have?
These are just bulk scenarios, and whilst all representative of reality, I’ve never had to deal with precise and consistent numbers of users; that number always fluctuates, often predictably, sometimes not.
With fine-grained nodes, I can provision exactly the nodes I need to handle the traffic I have at that point in time. The bigger my nodes, the less precise I can be and the more resources I have to pay for, but not use.
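To put rough numbers on the granularity argument, here is a minimal arithmetic sketch. The node sizes (1,000 and 250,000 connections per node) are hypothetical values picked purely for illustration, not figures from this thread, and `nodes_needed` and `idle_capacity` are made-up helper names.

```python
def nodes_needed(load: int, node_capacity: int) -> int:
    """Smallest number of nodes of the given capacity that covers the load."""
    return -(-load // node_capacity)  # integer ceiling division

def idle_capacity(load: int, node_capacity: int) -> int:
    """Capacity that is provisioned (and paid for) but not used."""
    return nodes_needed(load, node_capacity) * node_capacity - load

# Hypothetical node sizes, in connections per node.
SMALL, LARGE = 1_000, 250_000

for load in (1_000, 1_000_000, 1_001_000):
    print(load, idle_capacity(load, SMALL), idle_capacity(load, LARGE))
```

Under these assumed sizes, the off-peak load of 1 thousand connections leaves a large node almost entirely idle, while small nodes track the load closely. Real nodes also carry per-node overhead, which this sketch ignores.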
Here’s a more specific question. Let’s say you have a web service that accepts GETs, fetches data from a database and returns that data. You idle at 1k requests per second. Assume the queries are straightforward and optimized. For each of your service nodes, how many database connections do you establish? One, or more than one? If a network connection drops due to a transient failure, what do you do? Let’s say your traffic can spike without warning to between 10x and 30x within a 30-second window. How do you ensure availability of your service? Availability here means you continue to return data successfully in under 500 ms, which is your 95th percentile latency SLO.
I have a feeling we’re talking past each other. Given your constant explanations about elasticity, it seems that you think I’m not familiar with the idea, or I think the idea is bad. So for the record, I believe that it’s a good idea, though, as usual, there are always some fine points. But I’m not going into that right now.
The discussion I’m having here is based on your earlier statement:
In particular, you seem to think that approaches like k8s or FaaS can substitute for Erlang’s fault-tolerance model. Which brings me to this statement:
For this particular discussion the ratio of nodes to connections is important. If you manage more than one connection at the same time on any node, you’re dealing with multiple activities on that node. This is because my connection and your connection are typically unrelated, and failure, latency, or busyness of one should not affect the other.
If you’re aggressively killing such nodes, you might end up in a situation where a single rotten activity will cause the termination of many other, perfectly healthy activities. If the number of connections you’re managing per single node is large, for example because connections are I/O bound and you don’t need a lot of CPU, you might end up taking down a significant healthy part of your system due to just a few isolated problems.
Going beyond that, you could end up in endless restart loops. Suppose that somehow my input triggers a bug which blocks the OS thread, and thus the node becomes unresponsive. You take down the entire node, and a bunch of users get disconnected. Now, you spin up a new, healthy node, and the users all reconnect. As one of those users, I provide the same input, and we end up in the same situation, going into a vicious cycle of restarts. The system will basically not work. Just in case you think this is a theoretical discussion, rest assured that I’ve seen it in production.
To keep the system going and provide as much of the service as possible, in spite of such failures, you need to be able to separate individual activities (in this example, connections). Failure of one shouldn’t affect the outcome of the others. Latency of one shouldn’t affect the running time of the others. If some activity ends up in an infinite CPU-bound loop, other activities should still work properly and their durations should not be significantly affected.
BEAM and its languages give you these properties, because they provide their guarantees at a much more granular level. That’s why I believe that BEAM is still very relevant, despite the advances of elastic technologies. Elasticity is good, and desired in many systems, but it’s not a substitute for fault-tolerance. Elasticity is about using as many resources as we need given a load which varies over time. Fault-tolerance is about, well, tolerating faults. These are two different problems.
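The difference between restarting a whole node and isolating individual activities can be sketched in a few lines of Python. This is only an analogy under simplifying assumptions: `handle`, `serve_node_level`, and `serve_isolated` are hypothetical names, the "poison" input stands in for the bug-triggering input from the restart-loop scenario, and a `try`/`except` in a single thread models crash isolation only. It does not model the latency or infinite-loop isolation that preemptively scheduled BEAM processes provide.

```python
def handle(request: str) -> str:
    """Hypothetical per-connection handler; the input "poison" triggers a bug."""
    if request == "poison":
        raise RuntimeError("bug triggered by input")
    return request.upper()

def serve_node_level(requests, max_restarts=3):
    """Coarse-grained recovery: one failure takes the whole node down.

    After each restart every client reconnects and resends the same input,
    so the poison request crashes the fresh node again.
    """
    for _ in range(max_restarts):
        try:
            return [handle(r) for r in requests]
        except RuntimeError:
            continue  # "restart the node"; all in-flight work is lost
    return None  # nobody was served

def serve_isolated(requests):
    """Fine-grained recovery: a failure is contained to its own activity."""
    results = []
    for r in requests:
        try:
            results.append(handle(r))
        except RuntimeError:
            results.append(None)  # only this activity fails
    return results

requests = ["alice", "poison", "bob"]
print(serve_node_level(requests))  # None
print(serve_isolated(requests))    # ['ALICE', None, 'BOB']
```

In the node-level version, the two healthy requests never get an answer, no matter how many times the node is restarted. In the isolated version, only the poisoned one fails.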
Now, alternatively, you could use one node per activity, but then you basically have to manage each connection on a separate node (that’s why I asked about the number of nodes). In the theoretical example I mentioned, this means running at least 1,000,000 nodes for a million users (probably more, because a connection is not the only activity in the system). I haven’t been following prices so much, but last time I checked, this was actually much more expensive than constantly running some pre-reserved number of instances. You might be paying only for what you use, but that still might be more expensive if you’re paying a lot per unit (see this tweet and the accompanying blog for example).
So in summary, elasticity is not fault-tolerance. You can simulate fault-tolerance to some extent with elasticity, but in my view it’s too coarse-grained. If you want to build reliable, highly available systems/services, you need to be much more granular. BEAM languages are among the few languages which give you that kind of granularity, which is why I believe they are still very relevant, regardless of the existence of k8s and FaaS. That said, I also believe that elasticity is a very nice property, and I’m certainly not suggesting that BEAM is the solution for that problem.
I’ve had a client move from Elixir to Rails for a new project because it was hard to find people to work with Elixir.
It’s a great platform and language but hiring hurts.
I still use Elixir for all pet projects, but I have also explored Crystal because it looks like Ruby/Elixir but it’s significantly faster.
This will be my approach this year. I have the advantage that I am not on the dev team, so I report to a VP who has given me autonomy in technical stack and architecture choices. I’m working on an internal yet highly-visible side project—i.e. not crucial to the business but everyone will see it (realtime dashboards).
The current one is in Node (which is also non grata here) but is not as reliable. Not because of Node, but because the MIS architecture is somewhat hacky from what I understand. Like cron jobs running ETL scripts and that’s it. And who is going to have time to build an ad-hoc, informally-specified, bug-ridden, slow implementation of half of OTP in Node?
The concern over hiring is real, and we have already been burnt by going with Lisp and Clojure in the past. The problem there is managerial and cultural.
Yes except, ironically, hiring devs for well-paved-road experience is also an issue as it makes it much harder to identify the good ones. But sure, leadership is free to hire an army of mediocre Java developers instead of a couple of good esoteric-language developers.
Well said. We hired Clojure developers under a Java-experienced dev manager who ended up reporting to a .NET-experienced technical but non-dev VP. And then they blamed Clojure.
Me neither, but I’m fairly new. Case in point: was there any hostility in the Gleam thread? Quite the contrary, I would think.
Then again, I should probably rewrite those last two sentences as they do seem a bit hostile due to their terseness and the rhetorical question coming across as sarcastic. Better would be to say “For example, I did not see hostility or zealous defense in the Gleam thread—even though static typing on the BEAM can be controversial and has probably been discussed ad nauseam in the past.”
I think that might be the issue here. We might like how that sounds but we can’t expect everyone to approach their projects with the rigor of a Ph.D. dissertation where all prior art has been thoroughly researched. But as José has said, it can get tiresome if it comes up over and over again.
And in terms of monoculture, it is perhaps true that there is an overcorrection here against the Node world where Yet Another Frobnicator Library is frowned upon if it is not clear how it moves the ecosystem forward.
Which, now that I am catching up on the thread, has been further discussed.
yup, hiring an “Elixir developer” is difficult, that’s why I would hire an experienced one, to mentor the others.
This book might help guide you on how to introduce Elixir in a company: https://pragprog.com/book/tvmelixir/adopting-elixir (obviously, when it fits the use case).
It’s really good.
It covers a lot of points of view about Elixir (hiring, success cases, deployment, etc.).
As a side note, I remember how, more than 10 years ago, it was very hard to find Python developers (and Python jobs), and it still surprises me (in a positive way) that it has become such a popular language.
I began as a Python developer at that time, of course because I was interested in the language and I had the freedom to deliver one small project in a tech of my choice.
That’s how my journey with this programming language started.
It looks like Elixir is in a similar position to Python many years ago (small projects introduced in startups/companies), as other folks have also mentioned.
EDIT: I forgot to say that I have been developing professionally in Elixir for less than two years and I am really enjoying it.
An alternative is to hire an Erlang programmer as most find it quite easy to migrate. If they want to.
Though they might not be easier to find than Elixir programmers.
Hiring good programmers is always an alternative, though you have to be prepared they might take a longer time before they start being productive.
My experience of working with Elixir has been really mixed. While I have really enjoyed coding in the language and working with talented and knowledgeable developers, mainly on open-source projects, I have had a lot of negative experiences working in the industry and trying to get work.
The main problems I have had are putting up with rude and inconsiderate behavior, which unfortunately is all too common, and the fact that employers have no respect for open source work (when hiring, that is; they are quite happy to use it). Also, an Elixir-specific issue I have had is that so many jobs seem to be for Ruby developers who also know Elixir, so not knowing Ruby disqualifies me there.
When I was running Erlang in production we just hired for Ruby and cross-trained, as we found no devs otherwise. Though Sydney does have a dev shortage generally, so that’s not surprising.
I’ve not moved away so much as stepped back. I have a system I really want to migrate to Elixir, which is pre-ES6 Node.js and Meteor, but because I need to implement some fairly significant changes sooner rather than later I am going to modernise the existing code before I attempt to migrate it all to Elixir. I don’t feel that my Elixir knowledge is strong enough to confidently port it yet and I’d also prefer to see what happens with LiveView and deployment in the next few months.
I am hoping that my Elixir development plans can start properly later this year but until then I’ll still be tinkering and learning.
Fans of this forum thread might also like this Reddit thread: https://www.reddit.com/r/elixir/comments/avni73/talk_me_out_of_elixir/
A big reason I’ve slowed down my Elixir use is because of the library designs which try to reduce compilation on code changes. Personally, I like the opposite: to make my compiler and tools do more work, not less. E.g., I’d like the compiler/linker to tell me if I misspelled a Phoenix route, but it can’t. The framework was designed to avoid the implementation issue of compilation speed at the expense of safety. Plus, I’ve heard that this “late linking” is desirable when you want to hot-swap updated modules into a running system.
But that’s a feature I don’t need. All of my Elixir (and Phoenix) apps are traditional applications: I code and deploy. Then I repeat: make changes, then redeploy. In the case of a web app, my environment (like Heroku) handles smooth rollovers. In other words, I don’t need the extreme runtime flexibility of hot swapping code, yet I pay the “price” for it when using Elixir. I suspect that a lot of other general purpose developers are in the same boat. See, e.g., https://elixirforum.com/t/how-to-get-a-warning-on-undefined-module-use/16332
So I’m feeling like the systems aren’t a good cultural or technical fit for me.
This actually makes me sad! Because I appreciate so many other aspects of the technology and the community.
Here are my thoughts on Elixir, based on personal experience (in priority order).
1. Coding is exciting, really.
At first, Elixir syntax was a bit strange to learn. I really prefer functional programming over OOP or the JS style; it’s not too complicated and it’s more intuitive. Pattern matching: useful. Immutability: useful. Processes: handy. The syntax is easier than JS or Java. Nodes and clustering: don’t know.
- Backend all-in-one.
I have doubts about the current service style of cloud computing (IaaS). Services are getting more complex and segregated. Elixir, on the other hand, can make one packed backend. The trend in backends is thin, separated pieces, like Lego blocks. I don’t think Elixir is a good fit for that kind of structure. But I think they will be merged in the end, because separated and connected structures create a complexity problem of their own.
1. Poor IDE.
This is the frustrating weakness, I think. I’ve been using Atom on the client and a cloud IDE for Elixir. On the other hand, Android Studio makes me so happy to program in. Java (actually Kotlin, which is what I’m using) is unnecessarily complicated compared to Elixir, but the IDE makes it quite comfortable to use.
If Elixir could provide an advanced IDE, I’m pretty sure its popularity would increase.
2. Performance isn’t proven.
There’s a belief that Elixir is powerful because it runs on the BEAM VM from Erlang. I have the same feeling. The problem is that it’s not proven. I expect that Elixir is faster than Ruby and Python, but I’m not sure about Node and Go.
I’m curious about a throughput benchmark among backend JSON servers: standalone Elixir, Elixir on Docker/K8s, and standalone and containerized Node.js/Python/Ruby/Go.
3. Requires more investment; time.
“Should any other developers choose to learn Elixir?” I don’t think so. Python is easier to start with. Node is becoming a de facto language for the backend and is easy to get used to if one has JS experience. Go is growing rapidly and is a fit for micro devices, which are exploding in the IoT world.
Elixir is like raw food, but the field requires instant food. The field needs productivity, not curiosity.
4. Docs mingled with code are not good at all.
Docs should be separated from working code. Mixing them harms code maintainability and readability. I prefer to leave documentation and remarks, but Elixir isn’t kind enough about this.
Also, Elixir code is hard to refactor and style. This leads to poor maintainability. Common code styles and structures aren’t established yet, I think.
1. Python, Node, Go, Ruby aren’t Elixir’s competitors. Docker, K8s, AWS are.
I think Elixir is not like the backend languages (Python, Node, Go, Ruby). Elixir is more like a container and a PaaS. I don’t use Docker to deploy my pet project. Of course it was the preferred method, but I found that Elixir already has similar features. It was interesting.
2. Elixir has a great potential.
Elixir is unusual by some criteria. The process feature is useful, handy, and integrated seamlessly. Personally, I find FP more suitable than OOP for small and medium services. Elixir isn’t a good fit for general purposes, but it may be a good answer for some sophisticated requirements.
As we all know, scaling in cloud computing is crucial. But I have no idea about horizontal scaling: spawning multiple processes or nodes, and/or containerizing. Which is more efficient and handy?
I’ll keep practicing and build my pet projects in Elixir.
I just started to learn Elixir, but Elixir is NOT more performant than Go. Go, and probably NodeJS as well, will beat it in CPU-bound tasks, because Go is compiled to native machine code and NodeJS has a JIT (though it’s single-threaded). Also, functional languages like Elixir usually come with performance costs.
But where Elixir shines is fault tolerance, stable low latency, a better concurrency model, an easier-to-reason-about multi-threading model (no locks, because it uses message passing, and so none of the hard-to-find bugs that come with them), per-process GC, immutability, and being easier to split between servers because message passing is already used. I’ve probably forgotten some other awesome things.
If you really need something for high-performance CPU-bound tasks, I’ve read that Discord, for example, uses Rustler to write them in Rust. That’s similar to what Python does with TensorFlow for machine learning. But I bet most apps don’t need it.
2. Performance isn’t proven.
What kind of performance are you looking for, and for what task?
I agree with your comment.
I just presume raw performance is something like this: Go > Node/Elixir > Python/Kotlin > the rest… But I have no data.
I don’t think “fault tolerance” is a distinguishing feature. Containers and Node provide it too.
Rust/Rustler, TF: sorry, I have no idea.
I wonder what speed and concurrent throughput Elixir can provide compared with other languages, for example:
- JSON request -> (String conversion) -> JSON response
- HTML request -> (String conversion) -> HTML response
- WebSocket pingpong
No DB queries, since DB queries are outside the backend core language’s job.
Raw performance probably goes something like this: Assembler > C > C++ > Rust > C# (.NET Core) > Kotlin / Java (JVM) > Go > NodeJS > Elixir > Python > PHP, Ruby
But performance differs depending on the compiler and runtime used. Also, a lot of languages have most internal functions implemented in C or C++. So it’s not so easy to compare performance between languages.
And to add: NodeJS doesn’t provide any fault tolerance. Containers don’t provide fault tolerance directly, but other apps built on top of them do. Elixir/Erlang provides fault tolerance inside an app, so that a badly behaved part of the app can be restarted and not the whole app. I don’t know of any other platform that has fault tolerance support like it. So yes, it’s special.
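As a rough illustration of "restart the badly behaved part, not the whole app", here is a minimal Python sketch. It is an analogy only and not how Erlang/OTP supervisors are implemented: `Worker`, `Supervisor`, `dispatch`, and the "chat"/"upload" parts are hypothetical names invented for this example, and a real BEAM supervisor monitors separate processes rather than catching exceptions in the caller.

```python
class Worker:
    """Hypothetical stateful part of an app; crashes on inputs in fail_on."""

    def __init__(self, name, fail_on=()):
        self.name = name
        self.fail_on = set(fail_on)
        self.handled = 0  # in-memory state, lost when the worker is replaced

    def handle(self, msg):
        if msg in self.fail_on:
            raise RuntimeError(f"{self.name} crashed on {msg!r}")
        self.handled += 1
        return f"{self.name}:{msg}"


class Supervisor:
    """Keeps one worker per name and replaces only the one that crashed."""

    def __init__(self, factories):
        self.factories = factories  # name -> zero-argument constructor
        self.workers = {name: make() for name, make in factories.items()}
        self.restarts = {name: 0 for name in factories}

    def dispatch(self, name, msg):
        try:
            return self.workers[name].handle(msg)
        except RuntimeError:
            # Replace just this worker with a fresh one; siblings are untouched.
            self.workers[name] = self.factories[name]()
            self.restarts[name] += 1
            return None


sup = Supervisor({
    "chat": lambda: Worker("chat"),
    "upload": lambda: Worker("upload", fail_on={"bad.png"}),
})
sup.dispatch("chat", "hello")
sup.dispatch("upload", "bad.png")   # crashes and is restarted
print(sup.restarts)                 # {'chat': 0, 'upload': 1}
print(sup.workers["chat"].handled)  # 1
```

The "chat" worker keeps its state and keeps serving while "upload" is replaced, which is the property being described: the fault is handled inside the app, at the granularity of the part that failed.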