Deploy sometimes complains that the node is not running

I’m building a release with Distillery and then deploying the archive. After running bin/app start, I run migrations in a post_start hook. The problem is that it sometimes fails with “Node is not running!”:

rope@web-001:~/apps/rope/bin$ rope start
Running migrations
ok
Migrations ran successfully
rope@web-001:~/apps/rope/bin$ rope start
Running migrations
Node is not running!
rope@web-001:~/apps/rope/bin$ rope start
Running migrations
ok
Migrations ran successfully
rope@web-001:~/apps/rope/bin$ rope start
Running migrations
Node is not running!

The post_start hook looks like this:

defmodule Release.Tasks do
  def migrate do
    {:ok, _} = Application.ensure_all_started(:rope)
    path = Application.app_dir(:rope, "priv/repo/migrations")
    Ecto.Migrator.run(Rope.Repo, path, :up, all: true)
    :init.stop()
  end
end

I’ll take the liberty of summoning @bitwalker to this thread. I hope that’s fine…


I’ll follow up in the morning when I’m more awake, but your post is missing the content of the hook shell script that executes the module you showed. I’m assuming your hook uses bin/yourapp rpc... to connect to the node and run the migrate function, but the call to init:stop would kill the node, so I suspect I’m missing something. Is this module part of an escript that gets executed?

In any case, if you are using rpc, you’ll need to give the node a short time to actually start before making the rpc call. That race is more than likely what’s causing the random failures; a sleep of around 1 second should be enough.
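A fixed sleep is still a guess about how long boot takes. A minimal sketch of an alternative is to poll until the node answers and only then issue the rpc call. `wait_until` is a hypothetical helper, not part of Distillery, and using it this way assumes that `bin/rope ping` exits non-zero while the node is not yet reachable (check how your Distillery version's `ping` command behaves).

```sh
#!/bin/sh
# Hypothetical helper (not part of Distillery): retry a command until it
# succeeds, giving up after a fixed number of attempts.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  tries=1
  # Run the command (output discarded) until it exits with status 0.
  while ! "$@" >/dev/null 2>&1; do
    if [ "$tries" -ge "$attempts" ]; then
      return 1  # still failing after the last allowed attempt
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 0
}
```

In the hook, something like `wait_until 10 1 bin/rope ping` would take the place of the fixed sleep before `bin/rope rpc Elixir.Release.Tasks migrate`, with the hook stopping with an error if it returns non-zero. The attempt cap keeps a node that never comes up from stalling the deploy indefinitely.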

If you could clarify how you are using the hook, that would help a bunch!


It is indeed using rpc, and I’ve now added a sleep 1 at the top:

sleep 1
echo "Running migrations"
bin/rope rpc Elixir.Release.Tasks migrate
echo "Migrations ran successfully"

I got the migration module from “Running migration in an Exrm release” on the Plataformatec Blog, which explains the use of init:stop like this:

Also, we need to call :init.stop in order to close the console, otherwise our buffer will remain opened, waiting for something.

But they’re using exrm’s command, so I assume it doesn’t work exactly the same way? I’ll try removing the init:stop.

For completeness, this is how the hook is configured in rel/config.exs:

environment :prod do
  set include_erts: false
  set include_src: false
  set post_start_hook: "rel/hooks/post_start"
end

Edit: using sleep 1 and removing the init:stop seems to fix everything, thanks! 🙂


I created a custom command for Distillery that replicates exrm’s command functionality:

#!/bin/sh

# Consume script name
shift

MODULE="$1"; shift
FUNCTION="$1"; shift

# Save extra arguments
ARGS="$@"

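# Rebuild the positional parameters as the full erlexec command line.
# -s calls MODULE:FUNCTION() after boot (Elixir modules need the Elixir.
# prefix, e.g. Elixir.Release.Tasks). With -noshell the VM keeps running
# after FUNCTION returns, so FUNCTION has to call :init.stop() itself.
set -- "$BINDIR/erlexec" \
    -boot "$REL_DIR/start_clean" \
    -boot_var ERTS_LIB_DIR "$ERTS_LIB_DIR" \
    -env ERL_LIBS "$REL_LIB_DIR" \
    -pa "$CONSOLIDATED_DIR" \
    -config "$SYS_CONFIG_PATH" \
    -noshell \
    -s "$MODULE" "$FUNCTION" \
    -extra "$ARGS"

# "$@" already starts with "$BINDIR/erlexec", so run it as-is;
# quoting keeps every argument intact
"$@"
exit "$?"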
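After `set --`, the positional parameters hold the whole erlexec command line, erlexec path included, so the assembled command can be run as just `"$@"`. The quotes matter: unquoted `$@` re-splits any argument containing spaces, such as the single `-extra "$ARGS"` string. A tiny standalone illustration, where `count_args` is a hypothetical stand-in for erlexec:

```sh
#!/bin/sh
# Hypothetical demo helper: report how many arguments it received.
count_args() {
  echo "$#"
}

# Replace the positional parameters with a command line, the same way the
# custom command does with "$BINDIR/erlexec" and its flags.
set -- count_args -extra "two words"

quoted=$("$@")    # arguments kept intact: -extra, "two words"
unquoted=$($@)    # "two words" is split again into two separate arguments

echo "quoted: $quoted, unquoted: $unquoted"
```

This prints `quoted: 2, unquoted: 3`.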

How can this work in a post_start hook? The app would fail to start because the DB isn’t ready yet, which is exactly why we need the migrations to run.


The application only fails to start when it cannot connect to the DB. Whatever structure the database has is of no concern to the application, as long as it doesn’t try to do something with it.
So right after the application has started is the only moment you can run the migrations, since they need the application’s DB connection.


Quoting ddf: “… as long as it doesn’t try to do something with it.”

This is not always the case. The app may start background tasks that need certain DB tables, or load data from the DB before it can start working. With SQLAlchemy this is partly solved by calling metadata.create_all(engine) at startup, which is idempotent: it only creates tables that don’t exist yet. At least then it won’t complain about missing tables.
