OK, so the first thing that wasn’t clear to me regarding Metrics is that when you see stuff like this in your Phoenix app, you kind of assume it’s actually doing something:
  @impl true
  def init(_arg) do
    children = [
      # Telemetry poller will execute the given period measurements
      # every 10_000ms. Learn more here: https://hexdocs.pm/telemetry_metrics
      {:telemetry_poller, measurements: periodic_measurements(), period: 10_000}
      # Add reporters as children of your supervision tree.
      # {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  def metrics do
    [
      # Phoenix Metrics
      summary("phoenix.endpoint.stop.duration",
        unit: {:native, :millisecond}
      ),
      summary("phoenix.router_dispatch.stop.duration",
        tags: [:route],
        unit: {:native, :millisecond}
      ),
      ...
      # VM Metrics
      summary("vm.memory.total", unit: {:byte, :kilobyte}),
      summary("vm.total_run_queue_lengths.total"),
      summary("vm.total_run_queue_lengths.cpu"),
      summary("vm.total_run_queue_lengths.io")
    ]
  end
However, that code is basically a no-op. The Telemetry.Metrics library exposes functions such as last_value and summary, but these are just DSL functions that give you a formal language for describing the things you are interested in monitoring. All that really happens is that you build a list of definitions (in the metrics/0 function above), which you then feed into some kind of reporter, and it’s the reporter that does the work. In the Phoenix example no work is being done...! The poller does emit its VM events, but with the reporter line commented out nothing is listening to them. The module exists only as a way to pass parameters to the Phoenix dashboard stuff!
So what’s a “reporter”? Well, it’s just something which takes in a list of definitions like the above and then “does something” with the metrics you gave it. So really the metrics function or variable is just a way of defining a complex set of params in a unified way.
There is a built-in reporter which just dumps any new metric to the console (useful for debugging, but I don’t think for much else?). There are other reporters to dump stuff into Influx, etc., and Phoenix has its own reporter which listens to that metrics list and generates some graphs (but without any history, as it’s just listening live to new things coming in).
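To make the split between definitions and reporters concrete, here is a minimal sketch you can paste into a script or a Livebook cell. It assumes the telemetry_metrics package is pulled in via Mix.install, and the "demo.reading.value" metric and its event are made up for illustration; they are not something Phoenix or Mobius defines.

```elixir
Mix.install([{:telemetry_metrics, "~> 0.6 or ~> 1.0"}])

# A definition is only a struct describing what we would like to watch.
# "demo.reading.value" is a hypothetical metric name used for this sketch.
metric = Telemetry.Metrics.last_value("demo.reading.value")
IO.inspect(metric.event_name, label: "event")
IO.inspect(metric.measurement, label: "measurement")

# No reporter is attached yet, so this event is emitted and simply dropped.
:telemetry.execute([:demo, :reading], %{value: 1}, %{})

# Hand the definition to a reporter and the same kind of event now gets printed.
{:ok, _pid} = Telemetry.Metrics.ConsoleReporter.start_link(metrics: [metric])
:telemetry.execute([:demo, :reading], %{value: 42}, %{})
```

Only the second execute call should produce console output, because the handler is attached when the reporter starts, not when the definition is created.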
So Mobius is a “reporter” which stores those listened-for metrics in an RRD-like db and will give them back to you on demand. It stores a configurable number of seconds, minutes and hours of data, and old data basically rolls off the end (check out RRD to see the basic idea).
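If the “rolls off the end” idea is new to you, here is a toy sketch in plain Elixir. To be clear, this is not how Mobius is implemented; RollingBuffer is a made-up module that only shows a fixed-size store where each new sample pushes the oldest one out.

```elixir
defmodule RollingBuffer do
  @moduledoc "Toy fixed-size store: keeps the newest `size` samples, older ones roll off."
  defstruct size: 1, samples: []

  def new(size) when is_integer(size) and size > 0, do: %__MODULE__{size: size}

  def insert(%__MODULE__{size: size, samples: samples} = buffer, timestamp, value) do
    # Samples are kept newest first; Enum.take/2 drops anything beyond `size`.
    %{buffer | samples: Enum.take([{timestamp, value} | samples], size)}
  end

  # Return the samples oldest first, which is the natural order for plotting.
  def to_list(%__MODULE__{samples: samples}), do: Enum.reverse(samples)
end

# Insert five samples into a buffer that only has room for three.
buffer =
  Enum.reduce(1..5, RollingBuffer.new(3), fn second, acc ->
    RollingBuffer.insert(acc, second, second * 10)
  end)

IO.inspect(RollingBuffer.to_list(buffer))
# [{3, 30}, {4, 40}, {5, 50}]
```

A real RRD keeps several of these at different resolutions (seconds, minutes, hours), which is what lets it hold a long history in a small, fixed amount of space.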
So we could ask Mobius to start monitoring some metrics like this (basically the same as the above, but we add our actual reporter, i.e. Mobius, to the supervisor to make something start happening):
defmodule SysData.Telemetry do
  use Supervisor
  require Logger
  import Telemetry.Metrics

  @persistence_dir "/tmp"

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    mobius_metrics = vm_metrics() ++ mobius_metrics() ++ net_mgr_metrics()

    children = [
      # Add reporters as children of your supervision tree.
      {Mobius,
       metrics: mobius_metrics, persistence_dir: @persistence_dir, autosave_interval: 60 * 5}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  def vm_metrics do
    [
      # VM Metrics
      last_value("vm.memory.total", unit: {:byte, :kilobyte}),
      last_value("vm.total_run_queue_lengths.total"),
      last_value("vm.total_run_queue_lengths.cpu"),
      last_value("vm.total_run_queue_lengths.io")
    ]
  end

  def mobius_metrics do
    [
      last_value("mobius.save.stop.duration", unit: {:native, :millisecond}, tags: [:name]),
      last_value("mobius.filter.stop.duration",
        unit: {:native, :millisecond},
        tags: [:metric_name]
      )
    ]
  end

  def net_mgr_metrics do
    [
      # NetMgr
      last_value("net_mgr.net_dev.interface.signal.signal_strength", tags: [:interface]),
      last_value("net_mgr.net_dev.interface.signal.signal_bars", tags: [:interface]),
      last_value("net_mgr.net_dev.interface.counters.bytes_in", tags: [:interface]),
      last_value("net_mgr.net_dev.interface.counters.bytes_out", tags: [:interface]),
      last_value("net_mgr.net_dev.interface.counters.bytes_total", tags: [:interface])
    ]
  end
end
The above has some statistics which won’t be available to you, i.e. the net_mgr stuff, as that comes from my app, but notice how you can easily build up a ton of stats and even share some of these definitions with the Phoenix telemetry viewer if you wish. The definitions are just that: definitions of stuff you would like. They don’t actually cause anything to happen; you use them as parameters to other systems that might do the work.
OK, so I’m using my fork of Mobius here, as the upstream doesn’t have some of these features (autosave, dump data for Vega, etc.).
Mobius can plot your data in the iex console with something like:
iex> Mobius.plot("net_mgr.net_dev.interface.counters.bytes_total", %{interface: "wan1"})
Metric Name: net_mgr.net_dev.interface.counters.bytes_total, Tags: %{interface: "wan1"}
8607221048.00 ┤
8607216362.92 ┤ ╭─
8607211677.83 ┤ │
8607206992.75 ┤ │
8607202307.67 ┤ │
8607197622.58 ┤ │
8607192937.50 ┤ │
8607188252.42 ┤ ╭────╯
8607183567.33 ┤ │
8607178882.25 ┤ │
8607174197.17 ┤ ╭────────────────────────╯
8607169512.08 ┤ ╭──────────────────────────────────╯
8607164827.00 ┼───────────────────────────────────────────────────────╯
:ok
However, if you want the raw data, use the (not in upstream) function Mobius.filter_metrics().
This could be used in a Livebook to get the data for plotting, e.g.:
data = Mobius.filter_metrics("net_mgr.net_dev.interface.signal.signal_strength", %{interface: "ppp10"})
Some sample data so you can follow along at home would be:
data = [
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_940, type: :last_value, value: -105},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_985, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_986, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_987, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_988, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_989, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_990, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_991, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_992, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_993, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_994, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_995, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_996, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_997, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_998, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_121_999, type: :last_value, value: -111},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_000, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_001, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_002, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_003, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_004, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_005, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_006, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_007, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_008, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_009, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_010, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_011, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_012, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_013, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_014, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_015, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_016, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_017, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_018, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_019, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_020, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_021, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_022, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_023, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_024, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_025, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_026, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_027, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_028, type: :last_value, value: -107},
%{tags: %{interface: "ppp10"}, timestamp: 1_645_122_029, type: :last_value, value: -107}
]
VegaLite needs the timestamps in milliseconds, but the source data has unix timestamps in seconds (I suspect the conversion can be done inside VegaLite itself, but I didn’t figure out how). So multiply them up first and plot in a Livebook cell like this:
data = Enum.map(data, fn point -> %{point | timestamp: point.timestamp * 1_000} end)

Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:line)
|> Vl.encode_field(:x, "timestamp",
  type: :temporal,
  time_unit: :dayshoursminutesseconds,
  scale: [type: :utc]
)
|> Vl.encode_field(:y, "value",
  type: :quantitative,
  scale: [zero: false]
)
|> Vl.encode_field(:color, "tags.interface", type: :nominal)
|> Kino.VegaLite.new()
Note you don’t need all the options above; I was just fiddling, learning how to pretty up the graphs so they plot nicely to my eye.
I’m not sure how to insert a PNG into elixirforum, but if you try the above in a Livebook you should get a nice graph which looks not unlike the console graph above. Obviously build from there!
Note that the filter/plot functions take parameters to give you the data at a resolution of seconds, minutes, hours, etc. For me it was mind-blowing that I could notice something had gone strange and then ask for, say, bytes over the last few hours, or check recent signal strength to see if it correlated with some dropped call. Or, as in the example above, I wanted to check how long the filter function was actually taking, so I instrumented Mobius to track that and then used Mobius to track its own function call timings so that I could plot them and check they are reasonable! How meta is that!
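If you want to time your own functions in the same way, one option is :telemetry.span/3, which emits a start event and a stop event around a function call, with the stop event carrying a duration measurement. The sketch below assumes the telemetry package is installed via Mix.install; Demo.Work, Demo.Printer and the [:demo, :slow_sum] event prefix are made-up names for illustration, and the printer is only a stand-in for a real reporter.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule Demo.Work do
  # Hypothetical function standing in for whatever call you want to time.
  def slow_sum(n) do
    # Emits [:demo, :slow_sum, :start] and [:demo, :slow_sum, :stop];
    # the :stop event carries a :duration measurement in native time units.
    :telemetry.span([:demo, :slow_sum], %{n: n}, fn ->
      result = Enum.sum(1..n)
      # span/3 expects {result, stop_metadata} and returns the result.
      {result, %{n: n}}
    end)
  end
end

defmodule Demo.Printer do
  # Stand-in for a reporter: prints the duration of each :stop event.
  def handle_event(_event, %{duration: duration}, metadata, _config) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    IO.puts("slow_sum(#{metadata.n}) took #{ms} ms")
  end
end

:ok =
  :telemetry.attach("demo-printer", [:demo, :slow_sum, :stop], &Demo.Printer.handle_event/4, nil)

Demo.Work.slow_sum(1_000_000)
```

The matching definition would be something like last_value("demo.slow_sum.stop.duration", unit: {:native, :millisecond}), which has the same shape as the mobius.filter.stop.duration entry in the module above.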
For me this is a game changer and I’m now instrumenting everything I can, as it’s fun and useful to plot this stuff. How long is my modem taking to wake up? How many dropped calls? How long are certain function calls taking? Can we narrow that down by parameters to spot a trend? Obviously you can do all this and more if you shove the data into something you can plot with Grafana, but I didn’t have that stuff handy and Mobius is so useful for just checking stuff from the console!
Have fun!