I’m trying to get started exporting custom app metrics to prometheus on Fly. This is my first time using any Telemetry packages in Elixir, so any tips or suggestions on how to improve this are welcome.
I’m using the Telemetry.Metrics package (default with Phoenix, I think?) and telemetry_metrics_prometheus
to export the metrics. I think the latter was necessary to have an endpoint that prometheus could scrape.
My application has a custom cache and I am trying to measure the hits vs misses. So here’s how I have MyApplicationWeb.Telemetry.metrics/0
:
def metrics() do
[
# a ton of default `summary`s that came out of the box,
last_value("vm.memory.total", unit: {:byte, :kilobyte}), # added for testing
counter("my_application.some_mod.cache.hit"),
counter("my_application.some_mod.cache.miss")
]
end
My application.ex
has my Telemetry module and TelemetryMetricsPrometheus
in the children
list for its Supervisor
:
children = [
MyApplicationWeb.Telemetry,
{TelemetryMetricsPrometheus, [metrics: MyApplicationWeb.Telemetry.metrics()]},
# Rest of children
]
And the code in the caching module:
def contains?(word) do
with term when term != nil <- Repo.get_by(Term, word: word) do
Logger.debug("cache hit")
:telemetry.execute([:my_application, :some_mod, :cache], %{hit: 1}, %{})
term.is_valid
else
nil ->
Logger.debug("cache miss")
is_valid? = Inclusive.contains?(word)
:telemetry.execute([:my_application, :some_mod, :cache], %{miss: 1}, %{})
add_to_cache(word, is_valid?)
is_valid?
end
end
I have tested this. The metrics seem to work in the Phoenix Live Dashboard as expected correctly, but when I hit the /metrics
endpoint provided by TelemetryMetricsPrometheus
the hit
and miss
counters always increment in lockstep. I can read the logs and verify there have been no cache hits, but hit will always go up at the same time as misses.
Have I configured these metrics correctly? Am I using :telemetry.execute
properly? I didn’t find the docs for these packages to be very clear, sadly.
Here are some examples of the graphs of my metrics in the dashboard:
at the same time if I scrape the /metrics
endpoint, it looks like the counters are being summed?
my_application_some_mod_cache_miss 8
my_application_some_mod_cache_hit 8
vm_memory_total 79445.464