Can we do The Road to 2 million HTTP2 gRPC connections?

Ajaxdone · November 13, 2023, 2:44pm

@Gazler posted a great blog post on reaching millions of connections:

The Road to 2 Million Websocket Connections in Phoenix

I found many posts that talk about this, but my question is slightly different, so I am creating a new topic instead of replying on multiple threads.

Can we do “The Road to 2 Million HTTP2 gRPC Connections”, not WebSocket? Is it possible for HTTP2? Thanks.

josevalim · November 13, 2023, 3:28pm

It is definitely possible, but I would say much less interesting. HTTP2 multiplexes requests on the same connection, so you will be starting fewer connections. And gRPC is typically used for internal services, where it is unlikely you will be managing millions of them (and if you were, you would probably have bigger concerns than just gRPC by itself).

sleipnir · November 13, 2023, 4:03pm

This is not always true. There is a range of projects, such as tracking and routing of deliveries and logistics in general, to name a few, that would benefit from this type of scale. I myself work on a project that sees a lot of value in this kind of thing. Of course, I tried to write some things in Elixir along these lines, such as some tests for grpc-bench, and I was quite frustrated with the results we managed to achieve (GitHub - LesnyRumcajs/grpc_bench: Various gRPC benchmarks). Anyway, it would be really cool to see this move forward, even to help prove that we definitely have a good language for networks.

Ajaxdone · November 13, 2023, 4:24pm

Thank you @josevalim
Since each user has their own connection in my use case, HTTP2 multiplexing is not very interesting to me. But I am not sure why HTTP2 can handle far fewer connections than WebSocket; I need to google it.

Ajaxdone · November 13, 2023, 4:41pm

I want a reliable solution for my use case, so I might drop HTTP2 together with gRPC for the moment. Thanks for the valuable information.

jakemorrison · November 13, 2023, 4:42pm

gRPC is used for some large scale client applications, e.g. sending OpenTelemetry traces and mobile applications.

derek-zhou · November 13, 2023, 6:27pm

I think @josevalim’s point is that large-scale gRPC is usually done in corporate or SaaS environments, which should have the resources to benchmark it themselves. It does not make sense for the Elixir core team to construct a synthetic benchmark that suits no one in particular.

If someone has a benchmark, has observed a bottleneck, and can share the whole setup, I am sure someone here can offer advice on optimization.

sleipnir · November 13, 2023, 6:59pm

Great, but this was not my point. I just indicated valid use cases where this type of scaling can apply and is not restricted to conventional web systems. There are many valid use cases for gRPC.

The benchmark I posted serves as a reference point for anyone who wants to look, since the main languages are represented there. I also believe that many companies use and benefit from this protocol. I talk to the maintainer of the Elixir gRPC library from time to time (he and I worked at the same company in the past), looking for improvements in the implementation, and I always respond to issues that arise there. That’s why it’s something that interests me, and I’ve been working with it for many years. My comment was meant to draw attention to an important protocol that is widely used around the world.

josevalim · November 13, 2023, 8:50pm

Exactly, I don’t doubt there are large use cases of gRPC. My point is that the most common use case inside companies is not at this scale. And when you are at a large scale, it is often best to benchmark your specific use cases.

I find benchmarking the “hello world” case to not be very interesting because you often measure and improve the wrong things.

For the websockets case, we benchmarked the whole channel infrastructure and the PubSub system, which found bottlenecks. We would not have found those if we were really only benchmarking the websockets bits.

That said, if you have the use case, then go for it! The above is just my opinion and I have been wrong countless times before.

josevalim · November 13, 2023, 9:02pm

The benchmarks above show Elixir accepting 11k req/s on a 3-core machine. Are you expecting that much traffic on day 1? If not, it is best to focus on building your product/service. It is unlikely the gRPC layer will be your bottleneck. And you can always scale vertically, scale horizontally, and optimize it.
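
To put that figure in perspective, here is a rough back-of-the-envelope sketch in Python. The 11k req/s and 3 cores come from the benchmark mentioned above; the user count is a made-up placeholder, and the even spread across cores and users is an assumption, so treat the output as an order-of-magnitude guide only.

```python
# Figures quoted in the post above: ~11k req/s on a 3-core machine.
MEASURED_RPS = 11_000
CORES = 3


def per_core_rps(rps: float, cores: int) -> float:
    """Average throughput per core, assuming load spreads evenly."""
    return rps / cores


def requests_per_user_per_minute(rps: float, users: int) -> float:
    """Requests each user could send per minute if capacity is shared evenly."""
    return rps * 60 / users


if __name__ == "__main__":
    hypothetical_users = 100_000  # assumption, not a figure from this thread
    print(f"per core: {per_core_rps(MEASURED_RPS, CORES):.0f} req/s")
    rate = requests_per_user_per_minute(MEASURED_RPS, hypothetical_users)
    print(f"per user: {rate:.1f} req/min")
```

Plugging in your own expected user count shows quickly whether the gRPC layer is anywhere near being the limit.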

sleipnir · November 13, 2023, 10:33pm

In fact, this little Hello World brought me valuable insights, for example refactoring my system to use streams instead of unary requests. The above benchmark also helped the Akka team improve their gRPC server (Akka gRPC update delivers 1200% performance improvement (so what happened?) | Lightbend). I think there is room for improvement, and I have no doubt we will reach the top at some point with this protocol. And don’t get me wrong, I think 11k req/s is incredible and sufficient for most applications, just like 2 million WebSocket connections is much more than most applications need.
But you’re right, raw performance shouldn’t drive your business unless it is your business.

Ajaxdone · November 13, 2023, 10:49pm

I still have not decided whether to use WebSocket or HTTP2 gRPC. I need a solid foundation for my use case:

I will develop a Google Notification-like system on LineageOS, since LineageOS doesn’t have a Google Notification system, and I don’t want to use the Google one on LineageOS either, for privacy reasons.

My user base is still small, but since WebSocket is not really able to scale out, I need a platform that can support 500K connections to deliver notifications.
The post is great, but if I test it on my own machines I might still run into a lot of issues reaching 500K connections, and if I can’t see it working properly on my machines I won’t have the confidence to push it to PROD.
WebSocket feels kind of old to me, and I have a feeling, which might be wrong, that HTTP2 will make WebSocket extinct. This is why I asked the question in the title.
Thank you @josevalim for your great advice on business insight and architecture.

I will really focus on:

500K connections on a single machine
RPS (requests per second)
Latency, which I need to keep within 1 second
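
As a starting point for sizing a test, the three targets above can be turned into rough numbers with a small Python sketch. The 500K connections and the 1-second bound come from this post; the per-connection memory figure and the file-descriptor headroom are hypothetical placeholders that would need to be measured on the real server, not known values for any Elixir or gRPC stack.

```python
from dataclasses import dataclass


@dataclass
class Targets:
    connections: int   # concurrent client connections on one machine
    deadline_s: float  # latency bound for delivering a notification


def broadcast_send_rate(t: Targets) -> float:
    """Messages/sec needed to reach every connection within the deadline."""
    return t.connections / t.deadline_s


def connection_memory_gib(connections: int, bytes_per_connection: int) -> float:
    """Total memory in GiB for a given per-connection footprint."""
    return connections * bytes_per_connection / 2**30


def min_fd_limit(connections: int, headroom: int = 1024) -> int:
    """Each TCP connection holds one file descriptor; headroom is a guess."""
    return connections + headroom


if __name__ == "__main__":
    targets = Targets(connections=500_000, deadline_s=1.0)
    assumed_bytes = 40 * 1024  # hypothetical 40 KiB per connection; measure this
    print(f"send rate for a full broadcast: {broadcast_send_rate(targets):,.0f} msg/s")
    print(f"memory: {connection_memory_gib(targets.connections, assumed_bytes):.1f} GiB")
    print(f"file descriptor limit needed: at least {min_fd_limit(targets.connections):,}")
```

Replacing the placeholder footprint with a value measured at, say, 10K connections gives a first estimate of whether 500K fits on the machine before running the full test.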