In our company we have an application that runs RabbitMQ consumers in Elixir on AWS ECS. Everything runs in Docker containers, and we scale horizontally by adding more containers.
For this reason many nodes go up and down during the day. As the business grows (and with it the number of messages handled by the consumers), we are facing problems with workers being interrupted in the middle of handling a message. We are trying to mitigate the problem by implementing transactions that commit at the end of the job, but we are also trying to understand whether it's possible to watch for the SIGTERM signal and clean up before shutting down.
I found this post on Stack Overflow that basically says that Erlang/Elixir is not able to handle Unix signals.
Which approach would you suggest in a case like this? Does anyone care to share their experience? Many thanks in advance!
5 Likes
Erlang ships with two amazing command line utilities which you can use to run any application and connect to it any time you want. They are called run_erl and to_erl:
run_erl ./my_app /dir/for/logging "iex -S mix"
The command above will execute iex -S mix, give it the name my_app, and log any entries to "/dir/for/logging". Make sure the logging directory exists, otherwise run_erl may fail silently.
Now you can connect to the iex terminal of that node at any time by doing this:
to_erl ./my_app
This means that, if you use run_erl to start Elixir with IEx inside Docker, you can connect to IEx and issue :init.stop/0 for a proper node shutdown. :init.stop/0 will go application by application and shut down their supervision trees, respecting the configured timeouts. You can automate it by running:
echo ":init.stop" > to_erl ./my_app
And that's it! If you are using releases, for example via Distillery, they handle this stuff automatically for you.
16 Likes
We have a very similar but slightly different use case.
We want to deliver a 503 on a health endpoint for a few seconds after a SIGTERM, before the server shuts down. We are using Distillery for a Phoenix microservice, and the service runs in Docker.
The reason for the 503 on the health endpoint is that the HAProxy in front of the service needs to recognize that the service is going down before it is really down, so the service won't get any more requests from HAProxy.
Has anyone solved a similar situation, or does anyone have suggestions on how to solve this?
1 Like
Shouldn't you rather tell HAProxy directly to stop sending requests to that instance once the new service is up, without the 503 indirection?
1 Like
Thanks for your answer, but that's not how it works here with Mesos and Marathon.
Marathon only reports the status quo to HAProxy, not what it intends to do (e.g. shutting down services).
When Marathon shuts down a service, the service must therefore report a 503 via a health endpoint, so that HAProxy removes the service before it is really shut down.
We are currently looking into Distillery to see how we can hook into the SIGTERM trap, but we thought there might already be some best practices for this use case, so we don't have to reinvent the wheel.
1 Like
You could have a plug that checks for a given application key and returns a 503 when it is set. Then you could have a function in your application, let's say in the MyApp module, that does this:
def slow_shutdown do
  # This is the value the plug will check to know if it should return 503
  Application.put_env(:my_app, :shutting_down, true)
  # Wait a minute so the load balancer notices before we finally shut down
  Process.sleep(60_000)
  # Finally turn everything off
  :init.stop()
end
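The plug side of this can be sketched as follows. This is a hypothetical sketch (the :my_app app name, the :shutting_down key, and the MyApp.Health module are assumptions, and the real version would live inside a Plug pipeline); it shows only the flag check that the health endpoint would perform:

```elixir
# Hypothetical sketch: the health endpoint consults a flag that
# slow_shutdown/0 flips before sleeping, so HAProxy sees 503s and
# drains the node before the VM actually stops.
defmodule MyApp.Health do
  # Status code the health endpoint should answer with.
  def status do
    if Application.get_env(:my_app, :shutting_down, false), do: 503, else: 200
  end
end

IO.inspect(MyApp.Health.status())                  # healthy: prints 200
Application.put_env(:my_app, :shutting_down, true) # what slow_shutdown/0 does
IO.inspect(MyApp.Health.status())                  # draining: prints 503
```

In a real Phoenix service the check would sit in the controller or plug that serves the health route, returning `send_resp(conn, MyApp.Health.status(), "")` or similar.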
Now you can use the same tips I gave above: instead of echo ":init.stop" | to_erl ./my_app, you can do echo "MyApp.slow_shutdown" | to_erl ./my_app. If you are using Distillery, it likely ships with an option to run a module + function in the currently running node.
2 Likes
Thank you very much for your tip.
We will now look into whether and how we can execute that code in the current node. That should do it.
2 Likes
Stepping into the grave to provide others like me with a simple solution:
A received SIGTERM signal to beam will generate a "stop" message to the init process and terminate the Erlang VM nicely. This is equivalent to calling init:stop/0.
Thus, since Erlang/OTP 19.3, applications will terminate gracefully on SIGTERM.
This makes containerized deployments simpler than playing with run_erl or to_erl.
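One caveat for Docker: the VM only receives that SIGTERM if it is the process `docker stop` (or an ECS task shutdown) signals, i.e. PID 1, so the entrypoint should use the exec form rather than the shell form, which wraps the command in /bin/sh and does not forward signals. A hypothetical Dockerfile sketch (image tag, app name, and release paths are all assumptions):

```dockerfile
# Hypothetical sketch; image tag, app name, and paths are assumptions.
FROM elixir:1.14-alpine
WORKDIR /app
COPY . .
RUN MIX_ENV=prod mix do deps.get, release
# Exec form: the release script becomes PID 1 and receives the SIGTERM
# that `docker stop` / ECS shutdown sends, letting the VM run the
# init:stop/0-equivalent graceful shutdown described above.
ENTRYPOINT ["/app/_build/prod/rel/my_app/bin/my_app"]
CMD ["start"]
```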