Bandit is an HTTP server for Plug and WebSock apps.
Bandit is written entirely in Elixir and is built atop Thousand Island. It can serve HTTP/1.x, HTTP/2 and WebSocket clients over both HTTP and HTTPS. It is written with correctness, clarity & performance as fundamental goals.
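For readers who haven't used it yet, a minimal sketch of what serving a Plug with Bandit looks like (HelloPlug and the port here are placeholders):

```elixir
# Hypothetical example: serve a simple Plug under a supervisor.
defmodule HelloPlug do
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    send_resp(conn, 200, "Hello, world!")
  end
end

# In your application's supervision tree:
children = [
  {Bandit, plug: HelloPlug, port: 4000}
]

Supervisor.start_link(children, strategy: :one_for_one)
```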
In ongoing automated performance tests, Bandit's HTTP/1.x engine is up to 4x faster than Cowboy depending on the number of concurrent requests. When comparing HTTP/2 performance, Bandit is up to 1.5x faster than Cowboy. This is possible because Bandit has been built from the ground up for use with Plug applications; this focus pays dividends both in performance and in the approachability of the code base.
Bandit also emphasizes correctness. Its HTTP/2 implementation scores 100% on the h2spec suite in strict mode, and its WebSocket implementation scores 100% on the Autobahn test suite, both of which run as part of Bandit's comprehensive CI suite. Extensive unit test, Credo, Dialyzer, and performance regression test coverage round out a test suite that ensures Bandit is and will remain a platform you can count on.
Lastly, Bandit exists to demystify the lower layers of infrastructure code. In a world where The New Thing is nearly always adding abstraction on top of abstraction, it’s important to have foundational work that is approachable & understandable by users above it in the stack.
Bandit's project goals are to:

- Implement comprehensive support for HTTP/1.0 through HTTP/2 & WebSockets (and beyond), backed by obsessive RFC literacy and automated conformance testing
- Aim for minimal internal policy and HTTP-level configuration. Delegate to Plug & WebSock as much as possible, and only interpret requests to the extent necessary to safely manage a connection and fulfill the requirements of protocol correctness
- Prioritize (in order): correctness, clarity, performance. Seek to remove the mystery of infrastructure code by being approachable and easy to understand
- Along with our companion library Thousand Island, become the go-to HTTP & low-level networking stack for the Elixir community by being reliable, efficient, and approachable
After several years of effort, I just published version 1.0.0 of both the Bandit and Thousand Island libraries. Folks who are depending on versions in the 0.x.y or 1.0.0-pre series of either library should update their dependencies to ~> 1.0.
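For anyone updating, a sketch of what the dependency bump looks like in mix.exs (list only the libraries your app depends on directly):

```elixir
# mix.exs
defp deps do
  [
    {:bandit, "~> 1.0"},
    # Thousand Island is pulled in transitively by Bandit; list it only
    # if you depend on it directly.
    {:thousand_island, "~> 1.0"}
  ]
end
```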
This has been a ton of work, and it has been made possible in large part by the help of many contributors. In particular, @moogle19, @ryanwinchester and @alisinabh have gone above and beyond on all fronts. The project wouldn't be the success it is without help from folks like them. Thanks all!
I put together a bit of a retrospective blog post about the whole journey here, if anyone cares to learn more!
I have tried Bandit only once. There was a simple bug in my app, but it was hard to debug, and for some reason my instinct directed me to this library. The funny thing is that I had only read about it once before and had forgotten its name, but somehow I managed to find it on the forum.
Anyway, Bandit's error handling was an amazing help that saved me a lot of energy and time. I've been waiting for a stable 1.x release so I could use it by default. Thanks!
Awesome effort! And I'll definitely upgrade to it right away! Thank you so much!
One question I still have: what are the typical performance improvements one can expect with an ordinary LiveView app? I have no good estimate of how much time per route is spent in the HTTP layer vs. other parts of the pipeline. Any ideas?
Excellent question and one I get a lot. The short answer is that in most cases you probably won’t see much of a difference between Bandit and Cowboy from a performance perspective; your plug’s implementation is going to be the dominant factor in overall performance, and switching out the underlying server won’t magically make that work go away.
That having been said, there are many workloads in which you could expect to see a benefit from Bandit. The ideal case would be large numbers of HTTP/1 clients doing lots of IO on very short-lived connections. In that case you could see some substantial benefits (see my latest benchmark for more).
Some workloads are going to be worse. In particular, HTTP/2 performance in Bandit is pretty awful at the moment, but is going to be getting a lot of attention as part of the work to add WebSockets over HTTP/2 (RFC 8441) support. This will be one of the next things I’m working on.
In terms of LiveView, Bandit’s WebSocket implementation is generally a little bit faster than Cowboy’s (around 10-20%). You might see some real-world benefit there; it really depends on your particular usage patterns.
@mtrudel Hi
I was digging into Absinthe today and noticed it has a code path that results in a process just exiting when a certain internal timeout is exceeded. From Bandit's perspective, this results in this kind of error:
My understanding is that there's not much Bandit can do here in terms of error handling, and in general using exit isn't the best way to handle this kind of case. But I was wondering: would it be a good idea to define a terminate callback for the handler and issue a telemetry event? WDYT?
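To make the suggestion concrete, a rough sketch of what consuming such an event could look like on the application side. The event name below is purely illustrative (Bandit does not necessarily emit it today); only the :telemetry API itself is real:

```elixir
require Logger

# Hypothetical: [:bandit, :websocket, :terminate] is a placeholder event name.
:telemetry.attach(
  "log-handler-terminate",
  [:bandit, :websocket, :terminate],
  fn _event, _measurements, metadata, _config ->
    Logger.warning("WebSock handler terminated: #{inspect(metadata)}")
  end,
  nil
)
```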
BLUF: Of course this is an apples-and-oranges comparison. I personally couldn't give two hoots how well we perform against other languages (especially natively compiled ones running on base libraries); that's not a game I have any interest in playing or one that has any winners. It's MONGODB IS WEB SCALE all over again, and I have better things to do with my time than to engage in the comparative aspects of this. The two contestants aren't even playing the same sport.
The PR's setup looks fine (it's not a matter of app configuration). I didn't look at any of the lower-level OS / BEAM tuning details.
He's using m7a.large instances, which at first glance look like they'd perform a smidge worse than the instances we use for microbenchmarks in CI. From that perspective, the results seem roughly consistent with what I'd expect in absolute terms.
I'm a little worried by the growth numbers that Bandit demonstrates. CPU usage shouldn't be growing without bound like that, and (as he states) that's likely the root cause of the lacklustre numbers elsewhere.
There aren't really a whole lot of actionable steps to take based on this data. We really do need a better benchmarking environment (ideally one that runs as part of CI); the microbenchmarking setup we use now just doesn't get to the absolute scale needed to reproduce these sorts of situations 'in the lab', which is a necessary precondition for being able to improve them in Bandit. If anyone is looking for a place to help, that's probably the highest-value way to do so.
I'd say that POST benchmark is a bit disingenuous given it is also testing Ecto. But for the GET I would suggest enabling the supercarrier, sized at 85% of the available memory; that could give Bandit a boost. I've been profiling some web servers using msacc recently, and quite a bit of time is spent in GC. I wonder if it is a similar scenario in this case.
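For reference, a sketch of what enabling the super carrier might look like; the size here assumes a 16 GB machine (85% ≈ 13900 MB) and is a placeholder, so check the erts_alloc docs for the flags before copying:

```
# Hypothetical vm.args / ERL_FLAGS sketch.
# +MMscs sets the super carrier size in MB; +MMsco true makes allocators
# create carriers only inside the super carrier.
ERL_FLAGS="+MMscs 13900 +MMsco true" mix run --no-halt
```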
Further, the benchmark application uses Jason.encode instead of Jason.encode_to_iodata. While the body is not large, I'm sure it would help. Using :erlang.byte_size on the encoded device lists shows that they breach the refc binary threshold. This has the potential to put pressure on the binary allocator and affect the system negatively. Keeping these terms on the local process heap might be a benefit.
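For illustration, a sketch of the kind of change being suggested (variable names are placeholders; the bang variants are used for brevity):

```elixir
# Before: builds one large binary; anything over 64 bytes is stored as a
# reference-counted (refc) binary off the process heap.
body = Jason.encode!(devices)
send_resp(conn, 200, body)

# After: returns iodata, which Plug/Bandit can write out as smaller
# fragments without first flattening into a single large binary.
body = Jason.encode_to_iodata!(devices)
send_resp(conn, 200, body)
```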
When you mention you've been profiling web servers lately, is there anything actionable you've seen with Bandit? That low a level of profiling is truthfully not something I have a lot of experience or expertise with, and I'd greatly appreciate it if you would be willing to share some insights with the project.
Sadly no; this was more general debugging of webapp production issues, not a comparison of performance across multiple web servers. What it comes down to (in my sample size of ~5) is that the overhead of the web server rarely dominates; rather, it's the overhead of encoding/decoding data (Absinthe / Jason / etc.). That makes application-based benchmarks difficult, because that's not where the BEAM shines.
That being said, if you link that microbenchmark I can try to run the usual suspects on it (perf / msacc / eprof / fprof) to see if anything actionable stands out.
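For anyone following along, a rough sketch of what running a couple of those tools against a server under load might look like (the rootset below is a deliberately broad placeholder):

```elixir
# Microstate accounting: where scheduler threads spend their time
# (emulator, gc, port I/O, sleep, ...), sampled while a load generator runs.
:msacc.start(10_000)   # collect for 10 seconds, then stop
:msacc.print()

# eprof: per-function call counts and time for a set of processes.
:eprof.start()
:eprof.start_profiling(Process.list())  # placeholder; narrow this in practice
# ... drive traffic against the server ...
:eprof.stop_profiling()
:eprof.analyze()
:eprof.stop()
```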
See, that’s part of the problem. The benchmarks that run in CI are mostly intended to be comparative, in order to evaluate the relative perf impact of a branch compared to main as a baseline (I also use them to run comparative benchmarks against Cowboy).
Whenever I've done eprof/fprof work it's always been against an ad hoc 'hello world' Plug instance and a small set of hand-rolled client connections. The ephemeral nature of these test setups (and my general amateur relationship with profiling tools) makes it hard for me to really assess performance in an absolute sense.
So, to answer your question, I suppose a good ask here would be how you as an SME would put together a process to profile a simple 'hello world' plug in a reproducible way that allows for flexible client access patterns (single vs. keepalive requests, HTTP/1 vs. HTTP/2, etc.). I'm happy to systematize it; I just generally don't know my way around the profiling tools well enough to not drown in the sheer volume of output.
Hm, I see. Let me stew on it a bit; I'll take a look at that suite. I've written suites that compare the output of two profiles to rank differences between them, but I find that approach less stable for "low-level" programs and more meaningful for "high-level" programs, such as ones that transform complex data structures.