These are all good suggestions. Additionally, we use:
- distillery for releases
- pid_file (https://hex.pm/packages/pid_file) along with systemd on CentOS to restart the process if it exits on any node.
- observer_cli (https://hex.pm/packages/observer_cli) to allow some additional debug capabilities on each machine
I’d also pick something to store metrics in that can provide visualizations. Start small: pump some general data into that store, then add to it over time. We currently use telegraf for that.
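To make "start small" concrete, here is a minimal sketch of pushing a few general VM numbers out over UDP in the StatsD line format. It assumes telegraf has its statsd input enabled and listening on 127.0.0.1:8125; that address, the module name `MetricsSketch`, and the metric names are all made up for illustration and are not our actual setup.

```elixir
defmodule MetricsSketch do
  # Hypothetical helper, not part of any library.
  # Assumed target: a StatsD-compatible listener (e.g. telegraf's
  # statsd input) on localhost, UDP port 8125.
  @host {127, 0, 0, 1}
  @port 8125

  # Formats one gauge in the StatsD line format: "<name>:<value>|g".
  def gauge_line(name, value) when is_binary(name) and is_integer(value) do
    "#{name}:#{value}|g"
  end

  # A small set of general BEAM metrics to start with.
  def vm_gauges do
    [
      {"beam.memory.total_bytes", :erlang.memory(:total)},
      {"beam.process_count", :erlang.system_info(:process_count)},
      {"beam.run_queue", :erlang.statistics(:run_queue)}
    ]
  end

  # Sends each gauge as its own UDP datagram (fire and forget).
  def report(host \\ @host, port \\ @port) do
    {:ok, socket} = :gen_udp.open(0, [:binary])

    try do
      for {name, value} <- vm_gauges() do
        # Send errors are ignored on purpose: losing a metric
        # should never take the application down.
        _ = :gen_udp.send(socket, host, port, gauge_line(name, value))
      end

      :ok
    after
      :gen_udp.close(socket)
    end
  end
end
```

Calling `MetricsSketch.report()` on a timer (for example from a small GenServer using `Process.send_after/3`) would be enough to get a first dashboard going. Once you know which numbers matter, you will probably want to swap this for a proper metrics library.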