Increasing resource usage with Phoenix

Hey, I am using Phoenix for a still fairly small app. All the app does is receive some input, start a bunch of processes per entity, and after a certain time stop those processes again (via a DynamicSupervisor).

We are not live yet, so the system is idle most of the time, except for a few times a week when the customer tests some things.

I am monitoring the application with Prometheus and noticed some increase in resource usage over time. You can see the BEAM dashboard in the provided screenshot; it shows the last 7 days. The resource usage goes up over time. I redeployed once, which is when the usage drops straight back down.

For the memory part, it is mostly process memory that increases. It cannot be my processes, since they get cleaned up once an entity is no longer “tracked”.
It’s also weird that IO, GC and load increase.

While typing this, I may have an idea of what it could be: the Prometheus metrics are filling up over time, for example when bots hit URLs that don’t exist. But the Prometheus package stores its data in ETS, and the ETS memory stays at around 1.9 MB the whole time, while process memory goes from 12 MB to 33 MB.
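
For anyone wanting to double-check the ETS side, here is a quick sketch (run in iex; nothing in it is specific to prometheus_ex) that lists the ten largest ETS tables by memory:

# :ets.info/2 reports the size in words, so multiply by the word size for bytes.
word_size = :erlang.system_info(:wordsize)

:ets.all()
|> Enum.map(fn tab -> {tab, :ets.info(tab, :memory)} end)
|> Enum.reject(fn {_tab, mem} -> mem == :undefined end)
|> Enum.map(fn {tab, words} -> {tab, words * word_size} end)
|> Enum.sort_by(fn {_tab, bytes} -> bytes end, &>=/2)
|> Enum.take(10)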

Just noticed that the process memory dropped to around 15 MB overnight (the screenshot is from yesterday), while all the other metrics stayed high.

I am really confused about what’s going on here.

Quick summary of my setup:

  • Phoenix 1.4
  • Using DynamicSupervisor to start and terminate “tracking” processes (rough sketch after this list)
  • No real load on the system yet
  • Hackney pool size of 1000
  • prometheus_ex package for collecting metrics
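
For reference, the start/stop pattern looks roughly like this (a simplified sketch, not the actual code; TrackerSupervisor and Tracker are made-up names):

defmodule MyApp.TrackerSupervisor do
  # Hypothetical DynamicSupervisor owning the per-entity "tracking" processes.
  use DynamicSupervisor

  def start_link(arg) do
    DynamicSupervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  # Start a tracking process for one entity.
  def track(entity_id) do
    DynamicSupervisor.start_child(__MODULE__, {MyApp.Tracker, entity_id})
  end

  # Stop it again once the entity should no longer be tracked.
  def untrack(pid) do
    DynamicSupervisor.terminate_child(__MODULE__, pid)
  end
end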

Is this running as a “release”, e.g. Distillery etc.?

Also, I notice reductions are up… so I assume “work in progress” memory usage is also up - e.g. you are processing more data / it’s taking longer, so memory is released at a slower pace…

Oh sorry, forgot to say that I am using mix phx.server for now.

My part of the system (the DynamicSupervisor and friends) isn’t doing anything most of the time, since the system is used very rarely at the moment.

The graphs indicate the memory is primarily allocated to processes - have you explored which processes use that memory? You could obtain the top offenders either with observer or some other tool (with recon, e.g. :recon.proc_count(:memory, 10))

Since I am using mix phx.server inside a Docker container, I cannot execute code in the context of the running application right now. I do plan to switch to releases, though, but it will take some days for the memory to fill up again before I can inspect it.

Debugging a system without a way to inspect that system is kind of a lost cause :wink:

You could potentially try some of the web-based observer replacements, but YMMV.
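If you can tweak how the container starts the app, another option is to give the node a name and a cookie so a remote shell can be attached later. A sketch (the node name, cookie and hostname are placeholders):

# Instead of a plain `mix phx.server`, start the node with a name and cookie:
iex --sname myapp --cookie my-secret-cookie -S mix phx.server

# Later, from inside the container (e.g. docker exec -it <container> sh):
iex --sname debug --cookie my-secret-cookie --remsh myapp@<hostname>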

Haha, I know. I was just too lazy to set up a proper release. The project started as a technical POC to show its benefits, and now I am preparing it for the real world.

EDIT: I just started setting up Distillery. Then in a few days I can investigate what’s going on.

Building the release locally works fine, but building it inside a container using bitwalker/alpine-elixir:1.8.1 fails with the following:

==> Generating start_erl.data
==> Generating vm.args from rel/vm.args
==> Generating sys.config from config/config.exs
==> Including ERTS 10.3.5 from /usr/lib/erlang/erts-10.3.5
==> Generating boot scripts
==> Release failed, during .boot generation:
        Undefined applications: [observer,wx]

I am mainly following https://hexdocs.pm/distillery/guides/working_with_docker.html with some adjustments.

As Docker usually doesn’t have a graphical display server running, most Erlang images (especially those targeting small image sizes) strip out wx and its dependants. So you either need to install a version of Erlang that includes wx and the other missing applications, choose another base image that already has them, or drop them from your :extra_applications.
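
Dropping them would look roughly like this in mix.exs (a sketch; MyApp is a placeholder, and you should keep whatever extra applications you actually need):

def application do
  [
    mod: {MyApp.Application, []},
    # :observer and :wx only make sense with a graphical display,
    # so leave them out of the release built in the Alpine image.
    extra_applications: [:logger, :runtime_tools]
  ]
end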

Ah, I understand. I dropped them for now. Locally I am using iex anyway; I just thought it wouldn’t hurt to have them in the release too.

So, having had it running for quite some time now, I saw memory usage go up and then suddenly go down again. GC goes a little crazy, but I don’t think I should worry about that. IO goes up over time, but I guess that’s just the /metrics endpoint serving more stuff.

What I do not understand is that the Reductions keep increasing.

For the memory part, these are the top 10 processes. Nothing special there:

1> :recon.proc_count(:memory, 10)
[
  {#PID<0.1521.0>, 4119660,
   [
     :code_server,
     {:current_function, {:code_server, :loop, 1}},
     {:initial_call, {:erlang, :apply, 2}}
   ]},
  {#PID<0.1520.0>, 1577780,
   [
     :kernel_sup,
     {:current_function, {:gen_server, :loop, 7}},
     {:initial_call, {:proc_lib, :init_p, 5}}
   ]},
  {#PID<0.1741.0>, 1159136,
   [
     current_function: {:gen_server, :loop, 7},
     initial_call: {:proc_lib, :init_p, 5}
   ]},
  {#PID<0.9.0>, 689524,
   [
     :erl_prim_loader,
     {:current_function, {:erl_prim_loader, :loop, 3}},
     {:initial_call, {:erlang, :apply, 2}}
   ]},
  {#PID<0.1683.0>, 426596,
   [
     :prometheus_sup,
     {:current_function, {:gen_server, :loop, 7}},
     {:initial_call, {:proc_lib, :init_p, 5}}
   ]},
  {#PID<0.1514.0>, 319328,
   [
     :application_controller,
     {:current_function, {:gen_server, :loop, 7}},
     {:initial_call, {:erlang, :apply, 2}}
   ]},
  {#PID<0.1692.0>, 88620,
   [
     :release_handler,
     {:current_function, {:gen_server, :loop, 7}},
     {:initial_call, {:proc_lib, :init_p, 5}}
   ]},
  {#PID<0.1677.0>, 29628,
   [
     :hackney_manager,
     {:current_function, {:gen_server, :loop, 7}},
     {:initial_call, {:proc_lib, :init_p, 5}}
   ]},
  {#PID<0.2790.0>, 26656,
   [
     :inet_gethost_native,
     {:current_function, {:inet_gethost_native, :main_loop, 1}},
     {:initial_call, {:inet_gethost_native, :server_init, 2}}
   ]},
  {#PID<0.1611.0>, 26580,
   [
     :tls_connection_sup,
     {:current_function, {:gen_server, :loop, 7}},
     {:initial_call, {:proc_lib, :init_p, 5}}
   ]}
]

Reductions are just a count of how many times functions were called in a process, so they should always be climbing on long-lived processes. :slight_smile:

But in my case, the tracking processes are started and later stopped. In the Processes graph in the screenshot above you can see some spikes; that’s when tracking processes were started. But the Reductions graph increases all the time, and it doesn’t show the total number of recorded reductions but uses irate(), so the reduction rate itself keeps increasing. It actually looks very similar to the IO and Bytes Reclaimed graphs.
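
To see which processes are actually doing that work, :recon can also sample reductions over a time window instead of looking at the absolute counters, e.g. (from a remote console):

# Top 10 processes by reductions accumulated during a 1000 ms sampling window
:recon.proc_window(:reductions, 10, 1000)

Unlike :recon.proc_count/2, this only counts the work done during the window, so long-lived system processes don’t dominate the list just because they have been running the longest.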

I wonder if it really is just the metrics endpoint filling up with more and more data (due to bots hitting 404 routes every now and then) and therefore creating load.