In our company we have an application that runs RabbitMQ consumers in Elixir on AWS ECS. Everything runs in Docker containers, and we scale horizontally by adding more containers.
For this reason many nodes go up and down during the day. As the business grows (and with it the number of messages handled by the consumers), we are facing problems with workers being interrupted in the middle of handling a message. We are trying to mitigate the problem by implementing transactions that commit at the end of the job, but we are also trying to understand whether it's possible to watch for the SIGTERM signal and clean up before shutting down.
I found this post on Stack Overflow that basically says that Erlang/Elixir is not able to handle Unix signals.
Which approach would you suggest in a case like this? Does anyone care to share their experience? Many thanks in advance!
5 Likes
Erlang ships with two amazing command line utilities which you can use to run any application and connect to it any time you want. They are called run_erl and to_erl:
run_erl ./my_app /dir/for/logging "iex -S mix"
The command above will execute iex -S mix, give it the name my_app, and log any entries to "/dir/for/logging". Make sure the logging directory exists, otherwise run_erl may fail silently.
Now you can connect to the iex terminal of that node at any time by doing this:
to_erl ./my_app
This means that, if you use run_erl to start Elixir with IEx inside Docker, you can connect to IEx and issue :init.stop/0 for a proper node shutdown. :init.stop/0 will go application by application and shut down their supervision trees, respecting the configured timeouts. You can automate it by running:
echo ":init.stop" > to_erl ./my_app
And that's it! If you are using releases, for example via Distillery, they handle this stuff automatically for you.
16 Likes
We have a very similar but slightly different use case.
We want to deliver a 503 on a health endpoint for a few seconds after a SIGTERM, before the server shuts down. We are using Distillery for a Phoenix microservice, and the service runs in Docker.
The reason for the 503 on the health endpoint is that the HAProxy in front of the service needs to recognize that the service is going down before it is really down, so the service won't get any more requests from HAProxy.
Has anyone solved a similar situation, or does anyone have suggestions on how to solve this?
1 Like
Shouldn't you rather tell HAProxy directly to stop sending requests to that instance once the new service is up, without the 503 indirection?
1 Like
Thanks for your answer, but that's not how it works here with Mesos and Marathon.
Marathon only reports the status quo to HAProxy, not what it intends to do (e.g. shutting down services).
When Marathon shuts down a service, the service must therefore report a 503 via a health endpoint, so that HAProxy removes the service before it is really shut down.
We are currently looking into Distillery to see how we can hook into the SIGTERM trap, but we thought there might already be some best practices for this use case, so we don't have to reinvent the wheel.
1 Like
You could have a plug that checks for a given application key and returns a 503 when it is set. Then you could have a function in your application, let's say in the MyApp module, that does this:
def slow_shutdown do
  # This is the value the plug will check to know if it should return 503
  Application.put_env(:my_app, :shutting_down, true)
  # Wait a minute so the load balancer notices before we finally shut down
  Process.sleep(60_000)
  # Finally turn everything off
  :init.stop()
end
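The plug side of this can be sketched as follows. This is a hypothetical sketch (the :my_app app name, the :shutting_down key, and the MyApp.Health module are assumptions, and the real version would live inside a Plug pipeline); it shows only the flag check that the health endpoint would perform:

```elixir
# Hypothetical sketch: the health endpoint consults a flag that
# slow_shutdown/0 flips before sleeping, so HAProxy sees 503s and
# drains the node before the VM actually stops.
defmodule MyApp.Health do
  # Status code the health endpoint should answer with.
  def status do
    if Application.get_env(:my_app, :shutting_down, false), do: 503, else: 200
  end
end

IO.inspect(MyApp.Health.status())                  # healthy: prints 200
Application.put_env(:my_app, :shutting_down, true) # what slow_shutdown/0 does
IO.inspect(MyApp.Health.status())                  # draining: prints 503
```

In a real Phoenix service the check would sit in the controller or plug that serves the health route, returning `send_resp(conn, MyApp.Health.status(), "")` or similar.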
Now you can use the same tips I gave above: instead of echo ":init.stop" | to_erl ./my_app, you can do echo "MyApp.slow_shutdown" | to_erl ./my_app. If you are using Distillery, it likely ships with an option to run a module + function in the currently running node.
2 Likes
Thank you very much for your tip.
We will now look into whether and how we can execute that code in the current node. That should do it.
2 Likes
Stepping into the grave to provide others like me with a simple solution:
A received SIGTERM signal to beam will generate a "stop" message to the init process and terminate the Erlang VM nicely. This is equivalent to calling init:stop/0.
Thus, since Erlang/OTP 19.3, applications will terminate gracefully on SIGTERM.
This makes containerized deployments simpler than playing with run_erl or to_erl.
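One caveat for Docker: the VM only receives that SIGTERM if it is the process `docker stop` (or an ECS task shutdown) signals, i.e. PID 1, so the entrypoint should use the exec form rather than the shell form, which wraps the command in /bin/sh and does not forward signals. A hypothetical Dockerfile sketch (image tag, app name, and release paths are all assumptions):

```dockerfile
# Hypothetical sketch; image tag, app name, and paths are assumptions.
FROM elixir:1.14-alpine
WORKDIR /app
COPY . .
RUN MIX_ENV=prod mix do deps.get, release
# Exec form: the release script becomes PID 1 and receives the SIGTERM
# that `docker stop` / ECS shutdown sends, letting the VM run the
# init:stop/0-equivalent graceful shutdown described above.
ENTRYPOINT ["/app/_build/prod/rel/my_app/bin/my_app"]
CMD ["start"]
```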