- Erlang/OTP 20
- Elixir 1.5.1
- Phoenix 1.3.0
- Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-32-generic x86_64)
- systemd 229
Hi everyone, I run a Phoenix app in production as a systemd service, and I'm getting crashes on shutdown along with forced compilation (even when none should be necessary) on startup. This is the service definition:
```ini
[Unit]
Description=My Core WebServer

[Service]
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=core-{{ user }}
Environment=LANG=en_US.UTF-8
EnvironmentFile=-/home/{{ user }}/.env
User={{ user }}
WorkingDirectory=/home/{{ user }}/repo
ExecStart=/usr/bin/env elixir --sname core-{{ user }}@localhost -S mix phx.server
ExecStop=/usr/bin/env elixir --sname {{ user }}-shutdown@localhost ./scripts/shutdown core-{{ user }}@localhost

[Install]
WantedBy=multi-user.target
```
We build the app for production with a custom script:
```bash
#!/usr/bin/env bash
# 1. Install hex and rebar
# 2. NPM install
#    bundle assets with webpack
# 3. Get all mix deps
#    mix compile app
# 4. Digest all assets

# Load environment variables
export $(cat ~/.env | xargs) > /dev/null
cd ~/repo

# Prefix every line of a command's output with a tag, e.g. "[mix] ...".
# IFS= and -r keep leading whitespace and backslashes intact.
function log() {
  while IFS= read -r line; do echo "[$1] $line"; done
}

echo "Installing latest hex and rebar..."
mix do local.hex --force, local.rebar --force 2>&1 | log mix

# Elixir deps + compile and the JS build run in parallel.
( echo "Installing Mix Dependencies"
  mix deps.get --only prod 2>&1 | log mix
  echo "Compiling app"
  MIX_ENV=prod MIX_DEBUG=1 mix compile
)&

( echo "Installing NPM Dependencies"
  NODE_ENV=development npm install 2>&1 | log npm
  echo "Bundling Javascript Assets"
  ./node_modules/.bin/webpack 2>&1 | log webpack
)&

wait

echo "Digesting app"
mix phx.digest
cd - > /dev/null
```
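A note on the parallel section of the build script: in bash, a bare `wait` with no arguments returns 0 once all background jobs finish, even if one of them failed, so a failing `mix compile` in the first subshell would not stop the script before `mix phx.digest`. A minimal sketch of waiting on each job's PID instead, assuming the two subshells are wrapped in functions; `run_parallel`, `build_elixir` and `build_assets` are hypothetical names:

```bash
#!/usr/bin/env bash
# Sketch: start each named function/command in the background, then wait on
# every PID individually so a failure in any job is reported.
# (A bare `wait` with no arguments always returns 0 in bash.)
run_parallel() {
  local pids=() status=0 pid cmd
  for cmd in "$@"; do
    "$cmd" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    # Remember the exit status of any job that failed.
    wait "$pid" || status=$?
  done
  return "$status"
}

# Hypothetical job functions mirroring the two subshells in the build script.
build_elixir() {
  mix deps.get --only prod && MIX_ENV=prod mix compile
}
build_assets() {
  NODE_ENV=development npm install && ./node_modules/.bin/webpack
}

# Usage (not executed here): run_parallel build_elixir build_assets || exit 1
```

Note that inside `... 2>&1 | log mix` the pipeline's status is that of `log`, not of `mix`; `set -o pipefail` would make a failure on the left side of the pipe visible too.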
Recently, I noticed 500s on our staging server after running `sudo systemctl restart core-staging.service` and realized two things:

- Our shutdown script (in which we call `:rpc.call(node, :init, :stop, [0])`) fails after successfully stopping the application, leaving the service in a "failed" state.
- The app then starts up immediately, but the web server takes anywhere from 90 to 150 seconds to come online (note the timestamp on the Cowboy line below). This is suspiciously close to how long our `mix compile` takes, and we come back online instantly if we add `--no-compile` to the `phx.server` call.
```
Sep 22 19:52:41 staging systemd[1]: Stopping OpenFn Core WebServer...
Sep 22 19:52:43 staging core-staging[3613]: Shutting down core-staging@localhost with custom shutdown script.
Sep 22 19:52:43 staging core-staging[3613]: Sending stop message...
Sep 22 19:52:43 staging core-staging[3613]: "Shutdown successful."
Sep 22 19:52:43 staging core-staging[3413]: erl_child_setup closed
Sep 22 19:52:43 staging core-staging[3413]: #015
Sep 22 19:52:44 staging core-staging[3413]: Crash dump is being written to: erl_crash.dump...done
Sep 22 19:52:44 staging systemd[1]: core-staging.service: Main process exited, code=exited, status=1/FAILURE
Sep 22 19:52:44 staging systemd[1]: Stopped My Core WebServer.
Sep 22 19:52:44 staging systemd[1]: core-staging.service: Unit entered failed state.
Sep 22 19:52:44 staging systemd[1]: core-staging.service: Failed with result 'exit-code'.
Sep 22 19:52:44 staging systemd[1]: Started My Core WebServer.
Sep 22 19:52:46 staging core-staging[3666]: warning: variable "deps" does not exist and is being expanded to "deps()", please use parentheses to remove the ambiguity or change the variable name
Sep 22 19:52:46 staging core-staging[3666]: /home/staging/repo/deps/mailgun/mix.exs:8
Sep 22 19:54:16 staging core-staging[3666]: 19:54:16.490 [info] Running OpenFn.Endpoint with Cowboy using http://0.0.0.0:4000
Sep 22 19:54:17 staging core-staging[3666]: 19:54:17.047 [info] Starting IntervalJobsServer
Sep 22 19:54:17 staging core-staging[3666]: 19:54:17.048 [info] Starting IntervalServer
```
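For the restart captured in this log, systemd reports `Started My Core WebServer.` at 19:52:44 and Cowboy starts listening at 19:54:16, which is 92 seconds later. A small bash sketch for doing the same arithmetic on other captures, assuming both timestamps fall on the same day (`to_seconds` and `elapsed` are hypothetical helper names):

```bash
#!/usr/bin/env bash
# Convert an HH:MM:SS timestamp (as printed by syslog) to seconds since midnight.
# The 10# prefix forces base 10 so fields like "08" or "09" are not read as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Seconds between two same-day timestamps: elapsed START END
elapsed() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# "Started My Core WebServer." vs. the Cowboy "Running OpenFn.Endpoint" line
elapsed 19:52:44 19:54:16   # prints 92
```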
This is our shutdown script, for the record:
```elixir
#! /usr/bin/env elixir
# Shutdown script
# ---------------
#
# Expects a parameter of the node to shut down.
#
# And needs to be executed by a BEAM instance with its sname set,
# e.g. `elixir --sname staging-shutdown@localhost ./scripts/shutdown core-staging@localhost`
node =
  System.argv
  |> List.first
  |> String.to_atom

if Node.connect(node) do
  IO.puts "Shutting down #{node}"

  case :rpc.call(node, Application, :stop, [:exq]) do
    {:error, {:not_started, :exq}} -> IO.puts "Exq not running."
    :ok -> IO.puts "Exq stopped successfully"
  end

  IO.puts "Sending stop message..."
  :ok = :rpc.call(node, :init, :stop, [0])
else
  IO.puts("Could not connect to #{node}.")
  System.halt(1)
end
```
Does anyone have any experience with this? Or, if it's too specific to get into, does anyone know:

- What is the proper way to shut down a running Elixir app using systemd?
- How does `mix phx.server` determine when it needs to run `compile` before starting up?
Thank you!
Taylor