Plug/Cowboy stops listening after a few hours

One of our Elixir projects handles background work (it is not a web app). I added Cowboy 1.1.2 and Plug 1.0 to provide a web endpoint that Prometheus can scrape for some metrics.

I added Plug to my root supervisor like this:

      children = [
        Plug.Adapters.Cowboy.child_spec(:http, ProjectRoot.Metrics, [], [port: 9100]),

It worked nicely, but after a couple of hours (handling one request every minute) it stopped accepting connections. The metrics endpoint runs 3 SQL queries that count things and returns the result. The whole app is deployed with Docker/Kubernetes.

I don’t understand why Plug died (did it?). Isn’t the supervisor supposed to restart it?

I am not sure how to approach this; I have never had an issue like it. Help or ideas are highly appreciated.

If it takes that long to show up, maybe turn on the SASL reporting option in the Elixir Logger configuration? It is very noisy, but it reports the start and abnormal exit of everything in the OTP supervision tree (and more).
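
For reference, a minimal sketch of that setting. It assumes the project keeps its Logger settings in `config/config.exs` (the file location is an assumption, adjust to wherever your config lives):

```elixir
# config/config.exs (assumed location)
# Forward OTP SASL reports (supervisor, crash and progress reports) to Logger,
# so child restarts and abnormal exits show up in the application log.
config :logger,
  handle_sasl_reports: true
```

Expect a lot of extra output, so it may be worth enabling this only while hunting the problem.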

Otherwise, have you connected to the system while it is in that state and tried running traces or attaching observer remotely?
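
Once you have a shell on the running node, something along these lines could show whether the listener is still supervised. This is only a sketch: `SupervisionCheck` is a hypothetical helper (not part of Plug or Cowboy), and the supervisor name in the usage note is an assumption about your app.

```elixir
defmodule SupervisionCheck do
  @moduledoc "Hypothetical helper for inspecting a supervisor's direct children."

  # Returns {id, pid_or_status, alive?} for each direct child.
  # The second element is a pid, :restarting or :undefined, as reported
  # by Supervisor.which_children/1. Process.alive?/1 only works for pids
  # local to the node, so run this on the node that owns the supervisor.
  def child_status(supervisor) do
    for {id, child, _type, _modules} <- Supervisor.which_children(supervisor) do
      {id, child, is_pid(child) and Process.alive?(child)}
    end
  end
end
```

Calling `SupervisionCheck.child_status(ProjectRoot.Supervisor)` (assuming that is the registered name of your root supervisor) lists every child; the Cowboy listener should be one of the entries. If it is there and alive but the port still refuses connections, the problem is probably not a crashed process.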


Thanks for the pointers.

I have never connected to a running process with iex. I will have to look up how that is done.

It would be interesting to know whether the Plug process is still up and running.

What I did was update Cowboy and Plug to the versions we use in our Phoenix app. I hope that improves the situation.
