Profiling startup/shutdown time on deploys

Looking through the logs, my application takes 10 seconds or more to restart through systemd.
The logs suggest that shutting down is what takes so long, though I may be misreading them.
Has anyone else run into this issue?

Ideally I’d like to avoid A/B deployments where you have two versions up at the same time.
The current systemd configuration works great; the only issue is the 10-second downtime when redeploying.

What’s the best way to profile the startup and shutdown time of a production Phoenix app?
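
Before reaching for flamegraphs, a rough first step might be to confirm from the journal timestamps which phase is actually slow. The sketch below is in Python rather than Elixir. It assumes log lines shaped like `journalctl -u <unit> -o short-iso` output, with systemd's "Stopping" and "Stopped" messages marking the shutdown. The sample lines, host name, unit description, and the "ready" marker are all made up for illustration.

```python
from datetime import datetime

def parse_ts(line):
    """Parse the leading short-iso timestamp, e.g. 2023-01-05T10:00:00+0000."""
    return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S%z")

def first_ts(lines, marker, after=None):
    """Timestamp of the first line containing marker, at or after `after`."""
    for line in lines:
        if marker in line:
            ts = parse_ts(line)
            if after is None or ts >= after:
                return ts
    return None

def phase_durations(lines, ready_marker):
    """Seconds spent stopping, and seconds from stopped until the app logs ready."""
    stopping = first_ts(lines, "Stopping ")
    stopped = first_ts(lines, "Stopped ", after=stopping)
    ready = first_ts(lines, ready_marker, after=stopped)
    return {
        "shutdown_s": (stopped - stopping).total_seconds(),
        "startup_s": (ready - stopped).total_seconds(),
    }

# Hypothetical sample; real input would be read from journalctl output.
SAMPLE = [
    "2023-01-05T10:00:00+0000 host systemd[1]: Stopping My App...",
    "2023-01-05T10:00:08+0000 host systemd[1]: Stopped My App.",
    "2023-01-05T10:00:08+0000 host systemd[1]: Started My App.",
    "2023-01-05T10:00:10+0000 host my_app[4242]: Running MyAppWeb.Endpoint",
]

print(phase_durations(SAMPLE, ready_marker="Running MyAppWeb.Endpoint"))
# → {'shutdown_s': 8.0, 'startup_s': 2.0}
```

The ready marker is taken from the application's own log output rather than systemd's "Started" line, since for a simple service systemd may report "Started" before the app is actually serving requests. `short-iso` timestamps only have one-second resolution, so this gives a coarse split, not a profile.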

I haven’t done any stack trace analysis before.
I’m aware of flamegraphs and I think that’s what I want, but I’d be curious whether anyone in the community has done stack trace profiling or knows of any resources to reference.

For what it’s worth, I currently have prom_ex on this app with the default metrics, and logs going to Loki. Maybe there’s something within Grafana that could help with this. Grafana Phlare came out recently, but unfortunately it doesn’t seem to be integrated yet with the Grafana Agent, which is what I’m using through prom_ex.

What are your production monitoring strategies?

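Actually, a graceful shutdown ensures that every process stops only after it has finished handling its current message. This is nice because you don’t have to worry about a side effect being left partially done, but it does take some time. As for slow startup, this is the nature of OTP: a supervisor starts its children one at a time, in order, and each child’s init has to return before the next one starts.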
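
To make the shutdown cost concrete, here is a small analogy in Python (not Elixir, and not how OTP is implemented). A hypothetical worker handles one message at a time and only checks for a stop request between messages, so a shutdown always waits for the in-flight message to finish. The names and the 50 ms of simulated work are assumptions for illustration.

```python
import queue
import threading
import time

def worker(inbox, handled, stop_requested):
    """Handle one message at a time; check the stop flag only between messages."""
    while not stop_requested.is_set():
        try:
            msg = inbox.get(timeout=0.01)
        except queue.Empty:
            continue
        time.sleep(0.05)      # stand-in for work with side effects
        handled.append(msg)   # recorded only once the work has finished

def shutdown(thread, stop_requested):
    """Ask the worker to stop and return how many seconds it took to exit."""
    begin = time.monotonic()
    stop_requested.set()
    thread.join()             # blocks until the in-flight message is done
    return time.monotonic() - begin

inbox, handled, stop_requested = queue.Queue(), [], threading.Event()
for n in range(5):
    inbox.put(n)
thread = threading.Thread(target=worker, args=(inbox, handled, stop_requested))
thread.start()
time.sleep(0.02)              # let the worker pick up the first message
waited = shutdown(thread, stop_requested)
print(f"handled={handled} shutdown took {waited:.3f}s")
```

No message is ever left half done, but the shutdown takes as long as the slowest in-flight piece of work. With many processes stopping in sequence, those waits can add up to seconds.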