You may not need GenServers and supervision trees

This makes me wonder: who promised you a 100% saturated CPU with the BEAM? It’s common knowledge that Erlang / Elixir should not be used for heavy number-crunching. The overhead you speak of is basically orchestration and coordination, and every system that does the same – Kubernetes included – has it. I can’t really pinpoint your gripe with the BEAM; it never promised to be a C / C++ replacement. Unless I am misunderstanding you?

No, you don’t. Even Java, to this day, struggles to give you something as transparent as Task.async_stream. I’ve never seen a dynamic language that is actually able to do it (of course, I don’t know them all). In my 16.5 years of total experience, Elixir is the first language that gave me a mechanism to distribute work across all CPU cores as an integrated part of the code pipeline. Ruby, PHP, Python, JavaScript – they cannot get it right even today, despite the continuous and rather hilarious attempts. The best Java did was try to imitate OTP (the Akka framework), and it failed to provide half of OTP’s guarantees. I don’t imagine C# being in much better shape, but maybe you will correct me.
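
To make that concrete, here is a minimal sketch of the kind of pipeline I mean (heavy_transform/1 is a hypothetical stand-in for whatever per-item work you have):

# Fan the work out across all cores as part of an ordinary pipeline;
# max_concurrency defaults to the number of schedulers (one per core).
1..10_000
|> Task.async_stream(&heavy_transform/1,
  max_concurrency: System.schedulers_online(),
  timeout: :timer.seconds(30))
|> Enum.map(fn {:ok, result} -> result end)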

C# and Java are still stuck trying to provide POSIX-like semantics (mutexes, semaphores, condition variables), and most of their multicore PR hand-waves away the problem that giving programmers direct control of OS threads is never going to work – because it still hasn’t worked; deadlocks in C / C++ are the most normal thing in the world even today (in Go as well, to a lesser extent).

Strongly disagree. The fact that they protect you from a plethora of nasty synchronization bugs, which many other frameworks have, might be giving you the wrong impression that all other frameworks do the same. Sadly, that is not true.

The biggest app I participated in still has the problem of very occasionally disconnecting from the Postgres DB server; Ecto made this totally transparent: it simply reconnects and retries the SQL command, and we wouldn’t even know of the problem if we didn’t have paranoid logging. It never really caused a problem. The fault tolerance made this problem merely a curiosity and didn’t force anyone to go into firefighting mode.

Rails handles them just fine, yeah… and does so 85x-100x slower than Phoenix and Absinthe. That’s not a joke; I already rewrote 2 big Rails apps to Phoenix and Absinthe, and the average response times went from 310ms to 3.5ms. I watched the real-time graphs and was shaking my head for 10 good minutes back then. Also, the Rails apps had caching; the Elixir apps still don’t.

Examples? Those you gave are very generalized, and I cannot see how we could discuss them without more details.

Elixir is like any other language and tech: it’s a tool, and you always have to pick the right tool for the job. Its drawbacks are mostly that it isn’t suited for number crunching or for things like DB indices (namely, large mutable data structures that have to be modified in place very quickly). Outside of that, I can’t find a flaw in Elixir or the BEAM; I’ve written 3 commercial projects with it so far, have at least 5 smaller personal projects, and it has made me so much more productive than before.

Pardon the probably inaccurate observation, but you do seem like a person who judges Elixir for promises that it never made. Maybe your work simply isn’t well-suited for the BEAM languages? That’s quite okay; they never claimed to be the be-all and end-all. (I wouldn’t ever try writing real-time video streaming in Elixir, for example; that probably explains why most of Twitch’s infrastructure is in Go.)

This is a much bigger niche than you imply. I personally wrote tens of thousands of lines of code in C++, Java and Ruby trying to achieve fault-tolerant server-side apps and never succeeded – many others like myself failed as well. Sadly, @rvirding is right: most of us write half-done Erlang OTP variants whenever we work outside of the BEAM. It only took me 14 years to realize it, but what can you do.

To have a productive discussion, I believe you should give concrete examples of projects where you think the BEAM languages are a poor fit.

Apologies if I misunderstood you anywhere along the way.

9 Likes

This is plain false. It’s trivial to saturate all your cores with the BEAM. The overhead is minimal compared to CPU-intensive workloads, and there is no bookkeeping that will ever contend with an actual CPU-bound task. As @cmkarlsson said, you have a bottleneck somewhere in your system that is causing this. Even with thousands of processes being scheduled over several cores, you should be able to keep your processors pinned given an actual CPU-bound, consistent workload.
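
A quick way to check this for yourself from iex (a toy sketch; any pure CPU-bound function works in place of fib/1):

# One CPU-bound task per scheduler; watch every core pin in top/htop.
defmodule Burn do
  def fib(n) when n < 2, do: n
  def fib(n), do: fib(n - 1) + fib(n - 2)
end

1..System.schedulers_online()
|> Enum.map(fn _ -> Task.async(fn -> Burn.fib(40) end) end)
|> Task.await_many(:infinity)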

2 Likes

So you are using a mostly IO-bound workload to profile the CPU? Not maxing out the CPU may actually be a good sign. Because you are being vague on the details, I am free to interpret your data like this:

  • C# had to serve fewer requests because it maxed out the CPU
  • Elixir was able to serve more requests and still have spare CPU capacity

If that’s the case, I will pick the second, thank you.

That’s why, when talking about benchmarks, we need numbers and methodology. There are hundreds of things that could go wrong. And even when the measurement is right, we can draw the wrong conclusions. So unless you can provide applications, benchmark tools and methodology, there is nothing to conclude and nothing to discuss.

Can you please provide an actual example? Please let us know your OTP version and OS too. In literally years of benchmarking Elixir applications with tools like wrk, I have never brought one down. Even when opening 2 million connections - where we used 40 different client machines to benchmark a single server.

It is not about bad requests bringing down the server but about how you semantically react to those. I have literally seen frameworks and libraries rescue OutOfMemoryError and put systems in an unmanageable state because of that.

Still, focusing on bad requests is a gross misrepresentation of what fault tolerance means in Elixir. It also drastically undervalues the benefits of processes in designing those systems. Some examples:

  • Ecto being fault-tolerant means safer design around connection pools (and the leaking of connections)
  • Phoenix being fault-tolerant means we can easily multiplex multiple channels over the same websocket connection and save on system resources, while also scheduling CPU- and IO-bound work
  • Ecto being built on top of processes grants an excellent amount of visibility into the system: you can navigate process trees, inspect the pool memory, state, queue size and more (see the sketch after this list)
  • Phoenix being built on top of processes means no stop-the-world GC, only per-process garbage collection

And the list could go on and on.
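
To illustrate that visibility point, here is the sort of ad-hoc inspection you can do from a remote shell on a live system (a sketch; MyApp.Repo is a placeholder for your own repo module):

# Peek at a running process without stopping it.
pid = Process.whereis(MyApp.Repo)
Process.info(pid, [:memory, :message_queue_len, :current_function])
:sys.get_state(pid)  # inspect its state (use sparingly in production)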

You did not. You made vague statements. “It isn’t fast”. For what? Compared to what? You said it “doesn’t maximise the CPU” but you didn’t provide an example workload. It fails during benchmarking? How? What errors? Under which scenarios?

Yet we see companies using it for data processing with GenStage and Flow. Or for the web with Phoenix. Or for embedded devices with Nerves. I recommend folks look at videos from conferences such as Empex, ElixirConf, CodeBEAM and others to learn more about the variety of use cases the BEAM is deployed to.

26 Likes

I wrote this blog post about avoiding GenServer bottlenecks. The best architecture is generally to model the natural concurrency of your system, and Elixir makes it really easy and safe to handle concurrency. The language is also a pleasure to write in.

8 Likes

Sorry for a bit of thread resurrection, but I took the initial post, enhanced it a bit with the discussions here (quoting some peeps plus some other comments), and posted it as a blog post.

Thanks for your input once again everyone!

3 Likes

I am actually still finding myself nodding in agreement with the title and your general premise – including the newer blog post.

I came for OTP. I stayed for the functional programming.

Additionally, Ecto and Phoenix already make very good use of OTP. So truthfully, if you use either (or both) you already are reaping the benefits of OTP, as stated by @cmkarlsson and others here.

4 Likes

having just Phoenix and Ecto be fault tolerant buys you nothing. Any modern web framework (regardless of language) can handle bad requests without bringing down the server

Fault tolerance isn’t just “what happens if there’s an exception”, it’s also how you handle load. If you can’t accept connections or if you take 30 seconds to reply, you’re effectively down.

What follows is my theoretical understanding.

Many web frameworks require running one OS thread per web request. E.g., if you’ve got a Rails app running on Puma, you explicitly configure how many processes and threads per process to run. If you have 16 total threads and you get 17 simultaneous web requests, one of them is waiting in line. The 16th user is (hopefully) getting a nice response, and the 17th is hung entirely. Anyone making 20 requests at a time has got you with a denial-of-service attack.

Phoenix (via Cowboy) runs one BEAM process per request, and we know we can run millions of those. A million simultaneous web requests will probably hit some other bottleneck, like the database, of course, but Elixir itself will just keep adding processes as needed, each one slowing down the existing ones very slightly as they share scheduler time, giving a very smooth degradation under load, all without you having to think about it.
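
You can get a feel for how cheap those processes are from iex (a toy sketch, not a benchmark):

# Spawn 100,000 processes that each sit waiting for a message,
# then ask the VM how much memory all processes use combined.
pids =
  for _ <- 1..100_000 do
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

:erlang.memory(:processes)  # bytes used by all processes together
Enum.each(pids, &send(&1, :stop))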

5 Likes

@sync08: Did you ever figure out why you weren’t able to saturate your cores and why C# pinned them while doing IO?

Web frameworks have generally been moving away from that for quite a while now, basically since the C10K problem (a term coined in 1999) and the C10M problem became a thing. This has been sped up recently due to better OS support (epoll, IOCP) and even more so because of better language support (async/await in C#, for example).

Also, the 17th request can be handled by just starting a new thread (or OS process) in even the most primitive webserver. That may not be the most resource-efficient way to do it, but it certainly isn’t a denial of service. How do you think the current internet could even function if that were true?

See How MigratoryData solved the C10M problem: 10 Million Concurrent Connections on a Single Commodity Server – Scalable Realtime Messaging for an example of doing 10 million concurrent connections with Java, from 2015. And note how the article is more about OS and network config and not at all about how to write the code, because that is a solved problem.

1 Like

He’s not wrong. His example of Puma - 16 threads - is the default for that webserver: https://github.com/puma/puma#thread-pool

You can set it to 16 or 16,000; either way, when you run out of threads, new requests block.

1 Like

OS threads are a limited resource, and most OSes struggle under pressure when having to spawn thousands of them.

This is as DoS as it gets. Make a lot of requests that are costly for the OS and the hardware, and they buckle under pressure. Maybe we have different definitions of DoS in mind?

The current internet is barely limping, as is evident from the average load times of most websites and from the fact that the best way to kill a website is to post a link to it on a popular aggregator like HackerNews or Reddit.

Apart from the few smart folks that set up their main website hosted on services like Netlify / CloudFlare, most websites are a DoS waiting to happen.

It’s true that a good chunk of a well-oiled webserver’s magic lies in OS configuration; everybody knows that.

But to claim that how to write the code for it is a solved problem is simply false.

I don’t have any intimate knowledge of Puma (or Ruby, for that matter), but a quick search shows there are alternative servers for Ruby which do asynchronous IO and/or fibers. They might well exist because Puma is doing it wrong. A lot of older frameworks do work that way, and it is often hard to change existing code to work in a fundamentally different way.

However, most web frameworks will run multiple threads. It’s what happens inside those threads that matters. Are those threads unavailable until the request is fully served, or do they get put back into the thread pool when they are waiting for something external? That is the game changer. In the first case, your CPU may be sitting there idle because all threads are ‘busy’ waiting for something from the disk or the database. The latter case allows the CPU to work on other things (e.g. a new incoming request) in the meantime.
This is also basically the way the Beam VM works: Beam also starts an OS thread per core and then runs multiple Erlang processes on these threads in turn.
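
You can see that layout from any iex session:

System.schedulers_online()           # scheduler threads, one per core by default
:erlang.system_info(:process_count)  # lightweight processes multiplexed on top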

I’ll dig a bit deeper into (my view of) the basic principles behind this.
Firstly, it’s important to note that this is only relevant when there is IO being done on which the CPU has to wait before it can finish the work. Imagine a service which just calculates the square root of an input number, nothing else. In that case, running a thread per core and just processing the incoming requests in sequence is the most efficient way of doing it. There are two possible outcomes: either you keep up with the requests, or you max the CPU and still can’t keep up. In that last case you simply lack the computing power, and there is nothing you could have done differently which would have made it work. Worse still, anything you change will mean more overhead, reducing the number of requests you can process.

But most software isn’t like that; any webserver needs to deal with reading from and writing to the network at the least, and it’s likely also doing logging to disk, reading files, talking to databases or other services, etc. Because it’s a waste of CPU cycles to do nothing while waiting for those external things, you want to do multiple things concurrently: if one piece of code is waiting for something, you should run something else in that time. On the conceptual level that is all; it isn’t more complicated than that. At least when just looking at resource usage; more on that later.

On the technical level, the OS solves this for you: just run a process for each piece of work you’ve got, the OS will schedule those processes on the CPU, and you’re done. Conceptually there is nothing wrong with that. But a process has all sorts of stuff attached to it (users, permissions, open files, etc.), which means that each process needs quite a bit of memory just to exist, and on top of that it means that switching between processes is quite a bit of work. It works fine, it’s just inefficient.
To improve on that we have threads, which are just ‘smaller’ processes that all belong to the same OS process. It’s still the OS that is scheduling them, but there is less information attached to a thread, which means they use less memory and switch faster. But a thread still has stuff attached to it, and we still move in and out of the kernel when switching. This also works fine, but there is still a fair bit of overhead.

Now if we want to be more efficient than threads, there’s only one option left: we need to start scheduling the work ourselves. And because the VM running the program has more knowledge of what is going on in the program, it’s in a better position to keep track of pieces of work and do the scheduling with minimal overhead. Not zero overhead, just less overhead. This is what Beam has been doing from the start. But it is also basically what the ‘green threads’, ‘fibers’, ‘async’ and ‘non-blocking’ things boil down to.

Now there is a bit more to it than just getting work done as fast as possible. When there is one chunk of work which needs an hour of CPU time to complete, you likely don’t want everybody else to wait for that. So you start switching between different pieces of work in a way you deem fair, even though that creates extra overhead. How to do this is a whole science in itself, with loads of trade-offs. The OS does schedule threads in a way it deems sensible; you have some influence on that, but it’s limited. Doing your own scheduling therefore allows you to make certain choices yourself instead of relying on the OS: how you deal with high loads, whether you favor consistent response times over better average performance, how you deal with failing ‘threads’, etc. I’m guessing that for Beam those things were actually more important than the efficiency.

There are also downsides to doing scheduling yourself. Firstly, it’s not easy; secondly, you might need some knowledge of the hardware to get the most out of it. For example, when multiple CPUs/cores became a thing, the Beam VM needed to be changed to support them, while those using threads got it for free as soon as the OS supported it. Taking things like hyperthreading, CPU cache size, etc. into account can allow you to improve scheduling, but the OS is in a far better position to take those things into account, and with increasingly complex CPUs (core complexes, big.LITTLE architectures) this is more relevant than before.

While user-level scheduling is all the rage right now (for good reasons), there is already movement towards improving the OS to support more efficient threading. My guess is that this will cause a reverse movement in the future, because the effort of implementing scheduling yourself may at some point not be worth it anymore. The Beam VM might be the outlier there, as it has some design goals which differ from what mainstream software is doing.

Note that this is all about how the software eventually runs on a machine; how a programming language supports this (and hopefully makes it easy) is a whole different topic which often muddles the discussions about it. Even more so because languages are often tied to a specific VM and thus tied to a specific way of scheduling work. Not to mention the fact that the terminology is often confusing as well.

1 Like

I’ll try to keep this reply a bit shorter :wink:

PS C:\Windows\system32> (Get-Process|Select-Object -ExpandProperty Threads).Count
3262

That’s my laptop right now doing nothing special but running thousands of threads.

Hardware is the ultimate limited resource. Therefore you can DoS anything if you manage to create more work than the hardware can handle. It’s just easier if the system you’re trying to DoS isn’t making efficient use of the hardware. Adding Cloudflare is nothing but adding more hardware…

So what I’m saying here is that starting threads is not fundamentally wrong, just not the most efficient. There are two ways of dealing with that: don’t use lots of threads, or make threads more efficient. Both are valid options; neither will allow you to exceed the limits of the hardware.

I disagree, but I may have a different definition of ‘solved’.
Driving 400km/h in a car is a solved problem. The fact that most cars aren’t nearly as fast as that doesn’t change that. The knowledge required to engineer a car which can both reach and deal with those speeds exists. That makes it a solved problem.

People run servers dealing with millions of concurrent connections; it’s all researched, documented and even available in free existing tooling. So that too is a solved problem. That’s not the same as it being trivial, or cheap, or easy, or even common knowledge. And I’ll happily agree it’s not trivial just yet, and that it should become so easy we don’t even think about it anymore.

I am familiar with async I/O. It’s a big part of the reason I use Elixir: I get a sequential programming model while everything is event-driven below it, and this is true across the entire ecosystem. This is not true of other popular web frameworks such as Ruby on Rails (Ruby), Django (Python), Spring (Java) or Rocket (Rust), which rely on many synchronous I/O libraries. In those other languages you have to take care to use libraries compatible with an evented I/O model (kqueue/epoll), and the majority of their ecosystems are not compatible with that. So yes, Event Machine, Tornado, Akka/Play and Actix all exist and you can use them, but in most cases they are not the most popular choice, because it is easy to scale webservers horizontally and hard to replace the entire ecosystem of libraries.
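
A tiny sketch of what that sequential-over-evented model buys you (Process.sleep/1 standing in for any blocking-looking I/O call):

# 1,000 "requests" that each wait one second finish in roughly one second
# of wall time; each sleep suspends only its own lightweight process.
1..1_000
|> Task.async_stream(fn _ -> Process.sleep(1_000) end, max_concurrency: 1_000)
|> Stream.run()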

2 Likes

Adding to what Jeremy laid out, it’s not just hard to replace an entire ecosystem of libraries: if you miss just one call in one library that is sync instead of async, it’ll massively shoot your async I/O webserver’s performance in the foot, granted of course that the one call is used often enough.

For me that is one of the reasons Node.js got so good/popular: there was no legacy I/O, so all APIs could be built async-I/O-first, and all sync calls are clearly marked as such.

1 Like

For me, at least, the thing that makes me prefer the BEAM/Erlang programming model (and consequently Elixir’s) is that at a conceptual level it allows me to express programs in what I think is the most intuitive way (and this might just be a personal thing, like FP/OOP, etc).

Take the UNIX philosophy that every program should do one thing and be pipe-able to another: I find the Erlang process model to be that taken to a more extreme level (in this case a good thing). Each process running inside the VM can be as if a completely autonomous “program” itself, with its own lifecycle; it might be just a small “task” or live tied to the app lifecycle itself; it can have as many points of contact/dependencies on other running processes inside the VM as needed, or none at all; and it can communicate (or not) with external resources on its own, independently of anything else going on.

And the reason I find this intuitive is that even though one usually builds “one program” to be run, this “one” program domain can in fact be thought of as several “smaller” programs most of the time, and Erlang allows me to build things that way.

Then there’s the fact that these smaller programs (processes) need to pass messages between them to communicate. This characteristic makes it very intuitive to describe asynchronous behaviour (and differs from the UNIX “pipe” because it is asynchronous). And when you think about it, once you leave a single machine everything becomes that, even if you’re not using Erlang: you need to send an HTTP request, or connect to a socket, poll an instance, receive messages from a broker, etc. All of these are, at a conceptual level, “message passing”, and it’s the only way two machines can communicate. Even if you have a database as a central store, each machine has to talk to it in messages. This means that the model Erlang uses is itself reflective (conceptually) of how things work once you leave the boundaries of a single machine (or have multiple “independent” programs running at the OS level).

Now, like you said, we know that underneath there’s no concurrency, just the illusion of it provided by ever faster and smarter hardware and logic (except if you go to more than one machine), and so it is in Erlang. The way it schedules work means I can “forget” about the nitty-gritty details of laying the groundwork for that (which I wouldn’t be able to do even if I wanted to) and just focus on the “higher-level” things I’m doing. Again, this is not only a green-thread/fibers thing (from what I understand), because the BEAM is preemptive in its scheduling: even if you have a fully CPU-bound task/process going on, the BEAM will be able to manage it (of course, it can become degraded or fail at some point). But as you said, this is not usually what a “program” will do in real life, so if it happens it will usually be sporadic and not for an infinite amount of time, which means the system will, most of the time, be able to work as if it weren’t bottlenecked, with no or almost no degradation.
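
That preemption is easy to observe from iex (a toy sketch):

# An endless busy loop in one process does not freeze the shell; the
# scheduler preempts it once its reduction budget is spent.
pid = spawn(fn -> Stream.iterate(0, &(&1 + 1)) |> Stream.run() end)
Process.alive?(pid)      # true, yet the shell stays responsive
Process.exit(pid, :kill)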

Lastly, there’s the “fractal” semblance of an Erlang application: you have an “entry point” (the topmost level supervisor/application) and beneath that another group of processes and/or other “entry points”, which can go on as deep as needed, each part being completely independent but still retaining the ability to talk transparently to all the others, be started/stopped independently, etc. So a process that is started N levels deep can message any other process no matter where it is in the “tree” and, depending on how you write it, can effectively change/alter the running behaviour of the whole “program” (or just parts of it, or whatever). Very important also are the link/monitor semantics, which allow me to set those “connections” at runtime and model reactive programs in, again in my view, a very intuitive way.
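
For the link/monitor part, a minimal example of setting up such a connection at runtime:

# Monitor another process and receive its termination as a plain message.
pid = spawn(fn -> Process.sleep(100) end)
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, reason} ->
    IO.puts("worker exited: #{inspect(reason)}")
end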

Having said all this, there are of course problems and always room for improvement (and also costs, as you said: someone needs to work on the BEAM to keep it up to date, and at some point it might become infeasible for it to take advantage of other developments, as you mentioned), but personally I prefer this model to any other I’ve seen, because in my opinion it’s not only the processes/fibers/green threads but also the way those can be woven into a coherent program.

4 Likes

That is the very reason for me being here. In other solutions it’s always bolted on later, and therefore not as elegant as it could have been. And everything I’ve seen so far creates ‘two worlds’ in the code, one for synchronous stuff and one for the asynchronous things, generally with some hidden surprises when trying to bridge between those worlds. For example, async/await in C# creates some interesting new ways to create deadlocks…

True, but while the Node ecosystem is fully async, on the language side it’s not clean. JavaScript wasn’t designed with async in mind…

I’m starting to get some of that vibe, however not for every problem/program. Sometimes a different mental model just fits better with the problem at hand. Sometimes I do prefer raw (average) throughput over consistency under load. Sometimes I want shared state because it’s so much faster than copying the data around; sometimes I want my list to be doubly linked.
But maybe that’s just me wanting the best of both worlds :wink:

What I’m seeing right now, though, is lots of languages solving async in a similar way. What I’m hoping for is to see that trickle down into the OS and settle at some point where there is a common concept of ‘lightweight’ concurrency. It’s rarely used (and there are reasons for that), but Windows already has fibers built into the kernel. For Linux, a lot of work has been done to reduce the overhead of threads, and libraries to do fibers exist. Making cheap concurrency an OS-level feature could pave the way for programming languages to exploit it without having to do the heavy lifting of scheduling. It would also bring a common understanding and allow for better interoperability between languages for async code.

I totally agree message passing fits well with cross machine communication by the way. I’ll even go as far as stating that most of the touted benefits of microservices can be had inside a single machine (and a single OS process) by doing message passing between distinct parts of the system. But with way less overhead, so much less that you might end up not needing multiple machines at all. (And by the time you do it’s easy to add.)

1 Like

Indeed we do have different definitions. Driving 400km/h in a car requires a unique combination of chassis alloy, vehicle structure, tire material, fuel mixture, and a ton of other stuff to make it work reliably.

IMO here’s our disconnect. Earlier, I thought you claimed that the BEAM VM provides nothing so special that many other runtimes don’t give you as well. If that’s your stance, then I strongly disagree.

We techies regularly underestimate the runtimes we work in, while they are often practically the most important factor of our work – OSes and their settings included, as you alluded to earlier. That “the runtime matters a lot” is not a widely known law of our profession makes me sad. Seeing people on HN or Reddit regularly deride Erlang/Elixir because they pontificate that “concurrency and responsiveness are solved problems” is honestly laughable. Concurrency, parallelism and lag-less operation are anything but solved problems – not universally – and many people’s decades-long careers clearly show it, mine included.

I am fully with you. I happen to believe our current generation is dumb as hell for not having autonomous robot maids at home, and that we humans shouldn’t even have to think about our survival anymore. Yet here we are in today’s reality.

My point was, and still is, that there’s a huge gap between something being well-known in certain circles, and the same something being used to better an area (or civilisation).

The truth about us programmers is that we are conservative. Far more than we should be. People fall in love with – and get comfortable with – the Python syntax, the C/C++/Java syntax, the Haskell/Rust syntax, the LISP syntax, etc. People don’t want to hear about Erlang/Elixir because it looks too foreign to them, but also because they haven’t been woken up past midnight to repair production servers. Still, hand-waving away useful technology because they don’t want to switch syntax in an IDE / editor is not professional.

So, solved problem or not, lightweight green threads with preemptive scheduling are very far from the de facto way of doing things, and that’s a shame. And I will always disagree that “many other technologies give you that” as people regularly claim. No, they really don’t.

2 Likes

Well, yes and no. Firstly, people often just don’t need it to the extent that Erlang provides it. Just like 400km/h is cool, but realistically anything that goes 200km/h+ is already fast enough. It might be nice to have and technically superior, but if it means not having aircon in the car, it’s not worth it.

Let me give you an example. I’ve been working for a company that was producing ‘IoT’ devices (in quotes, because they’d been doing that well before the term even existed). This company was mainly a Microsoft shop. At some point the software doing the socket-level communication started to struggle with the load. It was written in C# and in pretty bad shape, so we decided it would be binned and replaced.

Now this would be a scenario where Beam could shine, right? Well, it would be, but not nearly enough to bother with introducing something so different into the company. That’s just the simple fact of the matter. I did a few tests where C# code kept dropping connections under load, but Java code using Netty remained stable at peak loads well above what we needed. Java is close enough to C# to be approachable for the existing developers, and Netty also provided some nice tooling for protocol parsing (at speed). Netty really is the go-to library for sockets in Java: widely used and rock solid.
So there it is: free concurrency and parallelism, stable, low learning curve and future-proof. Right there and then, this was a better solution than using Erlang/Elixir.

Now what did we miss out on? Consistent throughput? Irrelevant; the devices were communicating a few times a day behind the scenes, without anyone having to wait for that to finish. Supervisors? It ran for about a year without ever crashing (and then I left that company), so running it as a service with automatic restarts when the process dies is good enough. Hot code reloading? The devices retry on failure; 5 minutes of downtime to do upgrades is not a problem at all. Scale-out? The connections didn’t share any state, so you can just run N instances behind a load balancer, and it wasn’t anywhere near the limits of a single machine anyway. Getting called in the middle of the night? My phone is on silent, and nobody is watching the stuff anyway; just restart it in the morning and the devices will retry and catch up eventually.

What did we gain? A platform with a large community, lots of info available online, loads of libraries and tooling readily available, and a big pool of programmers who can work with it.

So that’s a concurrency problem fully solved with off-the-shelf stuff right there. And while Elixir might eventually have provided a technically better solution, it would not have been the better option for that company at that time. It’s not unprofessional hand-waving; it’s a professional decision not to use it.

Lightweight threads are rapidly becoming the standard everywhere. As for scheduling, there are different options and good arguments for preferring certain methods over others depending on the use case. There is no one-size-fits-all answer to scheduling. So while ‘many other technologies’ might not give you exactly what Elixir provides, they may well give you everything you need (and may or may not fit your requirements better). It’s all trade-offs.
Take the garbage collection in Java or .NET, for example. It’s often used as an argument for Erlang/Elixir: no GC pauses. But it’s not an accident or bad engineering that those languages work that way. It’s a conscious decision to trade consistency in throughput for more raw performance. It’s not that they were incapable of building something with consistent throughput; they had different goals. And fair enough, because it’s really common to prefer better average performance over consistency. Now your requirements might be different, and Erlang took a different approach for good reasons. But it pays for that in terms of raw performance.

So yes, Erlang/Elixir/Beam provides something pretty unique and pretty cool. But there are plenty of other valid solutions for providing concurrency, with different trade-offs and different pros and cons, and they seem to work out well for lots of people.
And while I truly get the conceptual advantages of Elixir – there really is a lot I like there – I will not rule out other options just because they are not Elixir. There are other tools out there which can solve my problem, and sometimes they just provide the better trade-offs.

1 Like

@AVee you make a lot of valid points and I appreciate you sharing your perspective. Elixir definitely isn’t the One True Way™. But I want to go back to the part of the discussion where I jumped in. @sync08 said:

While I understand that, having just Phoenix and Ecto be fault tolerant buys you nothing. Any modern web framework (regardless of language) can handle bad requests without bringing down the server.

To which I replied:

Fault tolerance isn’t just “what happens if there’s an exception”, it’s also how you handle load. If you can’t accept connections or if you take 30 seconds to reply, you’re effectively down.

As you said, Elixir isn’t unique in being able to handle lots of incoming requests:

People run servers dealing with millions of concurrent connections; it’s all researched, documented and even available in free existing tooling. So that too is a solved problem. That’s not the same as it being trivial, or cheap, or easy, or even common knowledge. And I’ll happily agree it’s not trivial just yet, and that it should become so easy we don’t even think about it anymore.

One thing I appreciate about this ecosystem is that, having chosen Phoenix + Ecto, you get something that’s fairly scalable by default. I think that’s quite valuable. By the time you have to start investing resources in thinking about how to scale better, you should have enough users to pay for that.

3 Likes