Fl4m3Ph03n1x
HTTP2 error using gun
Background
I have an application that uses gun to make HTTP requests to a server while keeping the connection up.
Problem
The problem here is that I am getting the following general stream errors:
2019-03-13 14:55:55.258 [error] GENERAL ERROR: {:stop, {:goaway, 0, :enhance_your_calm, "too_many_pings"}, :"Client is going away."}
2019-03-13 14:55:57.282 [error] GENERAL ERROR: {:stop, {:goaway, 5, :enhance_your_calm, "too_many_pings"}, :"Client is going away."}
2019-03-13 14:55:58.924 [error] GENERAL ERROR: {:stop, {:goaway, 3, :enhance_your_calm, "too_many_pings"}, :"Client is going away."}
GOAWAY
So, according to the HTTP/2 spec:
The GOAWAY frame (type=0x7) is used to initiate shutdown of a connection or to signal serious error conditions. GOAWAY allows an endpoint to gracefully stop accepting new streams while still finishing processing of previously established streams. This enables administrative actions, like server maintenance.
So, I am guessing the server receiving my requests isn’t too happy and is telling my client to slow down.
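For reference, the tuple in the log lines maps directly onto the GOAWAY payload layout from the spec: one reserved bit, a 31-bit last stream ID, a 32-bit error code, then opaque debug data. The sketch below is only an illustration of that layout; `GoawaySketch` is a made-up module name, not part of gun, and only the two error codes relevant here are named.

```elixir
defmodule GoawaySketch do
  # Parses the payload of a GOAWAY frame (RFC 7540, section 6.8).
  # Hypothetical helper for illustration; gun does its own parsing internally.
  def parse(<<_reserved::1, last_stream_id::31, error_code::32, debug_data::binary>>) do
    {:goaway, last_stream_id, error_name(error_code), debug_data}
  end

  # Only the codes relevant to this thread are named; the rest pass through as-is.
  defp error_name(0x0), do: :no_error
  defp error_name(0xB), do: :enhance_your_calm
  defp error_name(code), do: {:other, code}
end

# A payload like the one behind the second log line above.
payload = <<0::1, 5::31, 0xB::32, "too_many_pings">>
IO.inspect(GoawaySketch.parse(payload))
# prints {:goaway, 5, :enhance_your_calm, "too_many_pings"}
```

Read this way, the second element (0, 5, 3 in the logs) is the last stream ID the server says it processed, and "too_many_pings" is free-form debug data sent by the server.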
My confusion
This is confusing to me for a few reasons, the main one being that I am not hitting a single machine; I am hitting a cloud load balancer that has a cluster behind it. In theory, this balancer would distribute my load across all the servers and things like this wouldn’t happen.
I also don’t understand whether gun simply closes the connection and opens a new one, or whether any data was lost.
Can someone help me understand the causes and consequences of this error?
Most Liked
peerreynders
Disclosure: Here I’m just as clueless as the proverbial rubber duck.
But in the interest of making more information available:
- The error suggests to me that a PING frame is being answered with GOAWAY about every two seconds.
- Found a critical (possibly misinformed) opinion about the practice of HTTP/2 ping frames.
- The gun documentation suggests that the default HTTP/2 keepalive (the time between pings) is 5000 ms.
ENHANCE_YOUR_CALM (0xb):
The endpoint detected that its peer is exhibiting a behavior that might be generating excessive load.
Do the errors change when you extend your configured timeout for gun? (Not expecting it to but it’s good to rule things out).
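To try this, the ping interval can be passed when opening the connection. This is a minimal sketch assuming gun 1.x, where the HTTP/2 `keepalive` option lives under `http2_opts`; the host and port are placeholders, and the 60-second value is an arbitrary example, not a recommendation from the server's operators.

```elixir
# Assumption: gun 1.x option names. Host and port are placeholders.
{:ok, conn_pid} =
  :gun.open(~c"example.com", 443, %{
    protocols: [:http2],
    # Send PING frames every 60 s instead of the 5 s default.
    # :infinity turns keepalive pings off entirely.
    http2_opts: %{keepalive: 60_000}
  })

# Wait until the connection is established before sending requests.
{:ok, _protocol} = :gun.await_up(conn_pid)
```

If the GOAWAY errors stop or become less frequent with a longer interval, that points to the server (or the balancer in front of it) limiting how often it accepts pings.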
axelson
What cloud load balancer is the service using? Perhaps the balancer is doing some rate limiting?
shanesveller
Without knowing more about your circumstances, I can’t point to a concrete solution, but my hackles always go up when people speak in absolutes, and calling a problem “impossible” to solve on a major IaaS provider doesn’t sound right.
For this specific concern, better use of connection draining, load-balancer health checks, and possibly some more graceful shutdown behavior of your own application, should be able to eliminate close to 100% of end-user 502s that stem from per-server interruptions like rolling deployments rather than true application defects.
This is part and parcel of zero-downtime deployments, which has been a desirable practice for quite some time, so we probably would’ve heard by now if it was truly unattainable on GCP. Their load-balancing is, in many ways, a bit more sophisticated than what AWS offers, for example.







