GCP Load Balancer behaviour with terminating preemptible instances



Disclaimer: I am not sure if this is the correct category, please let me know otherwise.


We have a dispatcher instance group that receives around 700 requests per second per active VM. This dispatcher sits behind a Load Balancer and the group autoscales. So far all our VMs are regular VMs, but we have been studying the possibility of making them preemptible.

The problem with preemptible instances

According to the documentation, GCP can terminate a preemptible instance at any time.

Let’s assume that each dispatcher VM holds no state. It receives a request, processes it and makes an HTTP request to some other machine.

At any given time, each VM will be processing around 700 requests concurrently, while receiving data from the load balancer.


What happens if my preemptible VM, processing 700 requests, receives a signal to be terminated?

Well, in theory one should have a shutdown script that makes sure processing those requests finishes and then kills the app (clean exit). This leads us to the big question:

  • But does the load balancer know that my VM is shutting down? Will it keep sending requests to the terminating VM?


If it does keep sending requests, some of them will fail: once the app shuts down the machine is still up for a while, and the load balancer keeps sending requests to it, not knowing the app is already down.

Ideally, these requests would go back to the load balancer as failed requests and it would send them to another machine. However, GCP load balancers do not do this. In this scenario, I would have to create an OTP load balancer myself that creates machines in GCP and eventually kills them.

If somehow the load balancer knows this VM was selected for preemption, then nothing special needs to be done.

Which one is it?


Generally speaking, if your use-case cannot tolerate intermittent failures of in-flight requests, and you cannot adapt your client behavior to better tolerate these failures, you shouldn’t be using preemptible instance types for that particular workload.

Per the documentation, they are best used for batch-processing and other fault-tolerant/asynchronous scenarios:

Most notably in a business context:

Due to the above limitations, preemptible instances are not covered by any Service Level Agreement (and, for clarity, are excluded from the Google Compute Engine SLA).

To your original question - I’d strongly urge you to avoid reimplementing any degree of load-balancing behavior within your application tier and instead, if you must continue to use preemptible instances, front them with a reverse-proxy application like HAProxy or nginx that supports retrying requests elsewhere in light of a failed backend. And don’t put that application on preemptible instances. In the absence of such an intermediate layer, the only possibility of retry logic is based on the browser or other client applications.
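To make "retrying requests elsewhere" a bit more concrete, here is a minimal Python sketch of the logic such a proxy layer applies. It is not HAProxy or nginx configuration, and the names (`send_with_failover`, the `send` callable, the backend labels) are made up for illustration. It also assumes the request is safe to repeat, which is something you would need to verify for your own traffic.

```python
from typing import Callable, List, Sequence, Tuple


class AllBackendsFailed(Exception):
    """Raised when every backend that was tried failed the request."""


def send_with_failover(
    backends: Sequence[str],
    send: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Try the request on successive backends until one succeeds.

    `send` is a hypothetical callable that performs the request against a
    single backend and raises ConnectionError if that backend is unreachable.
    """
    errors: List[Tuple[str, Exception]] = []
    for backend in list(backends)[:max_attempts]:
        try:
            return send(backend)
        except ConnectionError as exc:
            # This backend is gone (for example, preempted); try the next one.
            errors.append((backend, exc))
    raise AllBackendsFailed(errors)
```

The point of keeping this in a separate, non-preemptible layer is that the client only ever sees a failure when every backend that was tried has failed.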

AFAIK, GCP load balancers’ only means of removing an instance from the pool are health checks, so perhaps you could adjust your application and/or introduce a shutdown script that deliberately makes the health check fail. This should evict the instance from the load balancer pool, but you would obviously be at reduced capacity/availability overall until the preemptible instance finishes its shutdown and (hopefully) gets replaced.
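As a rough illustration of the "fail the health check on purpose" idea, here is a minimal Python sketch using only the standard library. The `/healthz` path and port 8080 are assumptions, as is the premise that the shutdown script or init system delivers SIGTERM to the app process. How quickly the load balancer reacts still depends on the health check interval and unhealthy threshold you have configured.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple

# Set once the process has been told to shut down.
shutting_down = threading.Event()


def health_response() -> Tuple[int, bytes]:
    """Status and body for the health check endpoint."""
    if shutting_down.is_set():
        # Report unhealthy so the load balancer stops picking this VM.
        return 503, b"draining\n"
    return 200, b"ok\n"


def handle_sigterm(signum, frame) -> None:
    # Only flip the flag; requests already in progress keep running.
    shutting_down.set()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":  # hypothetical health check path
            status, body = health_response()
        else:
            # Regular traffic is still served while the VM drains.
            status, body = 200, b"hello\n"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    ThreadingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```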

You might also find something pertinent and helpful in the connection draining documentation.


Excellent insight!

We also found what we were looking for after reading this answer on SO:

As it turns out, the load balancer will keep sending messages to the machine, even if it has been elected for termination. This results in loss of requests and connections which we’d prefer to avoid.

As a solution, upon receiving the shutdown message one could run a shutdown script that removes the instance from the instance group, thus keeping the load balancer from sending additional requests to it and allowing it to finish processing the current workload.
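For the "finish processing the current workload" half of that, a sketch of one possible approach in Python: count requests in flight and, on shutdown, wait for the count to reach zero or for a deadline to pass. `InFlightTracker` and `handle_request` are hypothetical names, and the removal from the instance group itself is not shown here.

```python
import threading
import time


class InFlightTracker:
    """Counts requests in progress so shutdown can wait for them."""

    def __init__(self) -> None:
        self._count = 0
        self._cond = threading.Condition()

    def __enter__(self) -> "InFlightTracker":
        with self._cond:
            self._count += 1
        return self

    def __exit__(self, *exc_info) -> None:
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait_until_idle(self, timeout_s: float) -> bool:
        """Block until nothing is in flight; False if the deadline passed first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout_s)


def handle_request(tracker: InFlightTracker, work_seconds: float) -> None:
    """Stand-in for real request handling (hypothetical)."""
    with tracker:
        time.sleep(work_seconds)
```

On shutdown the app would call something like `tracker.wait_until_idle(25)` and exit once it returns, leaving a few seconds of margin inside the shutdown window.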

However, GCP only gives a machine a maximum of 30 seconds to live once a shutdown message is received. This turned out to be too short for our use case, as the load balancer takes between 27 and 40 seconds (with an average of 33 s) to stop sending requests to the instance.

This means that if we opt for preemptible instances, we will lose requests and abruptly close connections, no matter what, because by the time the load balancer finally stops sending requests, the instance is already dead.
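To put rough numbers on that, here is a back-of-the-envelope calculation in Python using the figures from this thread. It assumes the load balancer keeps routing the full 700 requests per second to the VM right up until it stops, which is a simplification, and it ignores requests still in flight when the VM dies.

```python
SHUTDOWN_WINDOW_S = 30      # time GCP allows after the shutdown message
REQUESTS_PER_SECOND = 700   # per-VM rate quoted at the top of the thread


def requests_at_risk(drain_seconds: float) -> float:
    """Requests routed to the VM after it is already gone."""
    exposed_seconds = max(0.0, drain_seconds - SHUTDOWN_WINDOW_S)
    return exposed_seconds * REQUESTS_PER_SECOND


for label, drain in [("best", 27), ("average", 33), ("worst", 40)]:
    print(f"{label}: {requests_at_risk(drain):.0f} requests at risk")
# best: 0 requests at risk
# average: 2100 requests at risk
# worst: 7000 requests at risk
```

So under these assumptions an average preemption costs on the order of 2,000 requests per VM, and only the fastest observed drain (27 s) fits inside the 30 second window.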