Mobius - mini timeseries database, esp for telemetry (more RRD than Influx)

the_wildgoose · March 4, 2022, 4:48pm

I discovered a rather awesome library that behaves something like a mini RRD timeseries database, especially for telemetry metrics. It’s named “Mobius”. The author seems rather shy about promoting it, but I found it a gamechanger, so I want to give it a quick plug!

A little story explains why I think this is cool. I am aware that Elixir has support for “telemetry” to generate metrics and the like. I buy why adding support for this to my application is useful, but … as I don’t have any infrastructure set up to monitor/aggregate/graph these metrics, I’ve never really bothered to implement any. What can they do for me? Perhaps I write some code to investigate a performance problem and then forget about it later…

That’s where I was when I discovered the Mobius library. Essentially it builds a simple RRD database, storing the last X seconds of measurements, which roll up to become the last Y minutes, then the last Z hours, and so on. It also has a basic graphing library which works in the console (wow!), and you can also export the data to graph in Livebook, etc.
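To make the “seconds roll into minutes” idea concrete, here is a minimal sketch in plain Elixir. It is not how Mobius is implemented: the TinyRRD module, its two tiers and the “keep the last value seen in each minute” rule are all assumptions made up for illustration.

```elixir
defmodule TinyRRD do
  # Hypothetical module, not Mobius' implementation: two fixed-size tiers,
  # both stored newest first as {unix_seconds, value} tuples.
  defstruct seconds: [], minutes: [], max_seconds: 60, max_minutes: 60

  def new(opts \\ []), do: struct!(__MODULE__, opts)

  # Record a sample; anything beyond max_seconds falls off the end.
  def insert(%__MODULE__{} = db, ts, value) do
    seconds = Enum.take([{ts, value} | db.seconds], db.max_seconds)
    roll_up(%{db | seconds: seconds}, ts, value)
  end

  # The minute tier keeps one sample per minute: the last value seen in it.
  defp roll_up(db, ts, value) do
    bucket = div(ts, 60) * 60

    minutes =
      case db.minutes do
        [{^bucket, _old} | rest] -> [{bucket, value} | rest]
        older -> [{bucket, value} | older]
      end

    %{db | minutes: Enum.take(minutes, db.max_minutes)}
  end
end

# Five minutes of one-per-second samples into a deliberately tiny database.
db =
  Enum.reduce(0..299, TinyRRD.new(max_seconds: 10, max_minutes: 3), fn ts, db ->
    TinyRRD.insert(db, ts, ts * 2)
  end)

length(db.seconds)
# => 10
db.minutes
# => [{240, 598}, {180, 478}, {120, 358}]
```

Storage never grows past the configured sizes, which is the whole appeal: recent data stays at full resolution and older data survives only in the coarser tier.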

It’s hard to describe the “oh my” moment, but suddenly being able to visualise a bunch of metrics was a game changer! My app deals with a cellular modem and byte counters and signal strengths and the ability to suddenly get a quick graph of the last few minutes or last few hours of usage was a wow moment. It’s not that I couldn’t get that data and graph it before, but the amount of friction was high.

Now that I can quickly visualise some statistics I’m sticking telemetry everywhere I can! Previously I might have wondered how long some function took to execute, and if it really worried me I might have built a benchmark and then forgotten about it later if the performance seemed OK. Now I can just add some telemetry to any function that seems interesting and then poke at it later! Very probably I will also use this to expose some data to the end user in our app as well (think sparklines, etc.)

I found this through a video from last year’s ElixirConf here:
ElixirConf 2021 - Matt Ludwigs - Metrics in the Small - Telemetry for Nerves Devices - YouTube

I’ve not seen any coverage of it here, and I think the reason is that it’s not been pitched as a mini timeseries database/Grafana. However, for my use case this is exactly what it’s replacing. Sure, it’s not got anything like the functionality of Influx, but for getting off ground zero and making metrics useful, I claim this does more than 50% of what I could do with the big guns (plus it runs inside your Elixir app and could be used to expose these metrics to a front end app).

The library lives here:

GitHub - mattludwigs/mobius: Library for localized telemetry metrics reporting

However, I have added quite a few enhancements here (pull request sent to author):

GitHub - ewildgoose/mobius: Library for localized telemetry metrics reporting

I can provide some examples of plotting in LiveBook if anyone wants to see that?

Enjoy!

dimitarvp · March 5, 2022, 1:25am

By all means, do the Livebook demo! I’m curious and I think many others will be.

crova · March 5, 2022, 2:59am

This resonates with me so much!
You did an awesome job of getting me hyped to check this library out.
Thanks for sharing your findings.

And I second @dimitarvp, please do the demo about livebook, I would be very much interested in it.

the_wildgoose · March 7, 2022, 1:17pm

OK, so the first thing that wasn’t clear to me regarding Metrics is that when you see stuff like this in your phoenix app, you kind of assume it’s actually doing something:

  @impl true
  def init(_arg) do
    children = [
      # Telemetry poller will execute the given period measurements
      # every 10_000ms. Learn more here: https://hexdocs.pm/telemetry_metrics
      {:telemetry_poller, measurements: periodic_measurements(), period: 10_000}
      # Add reporters as children of your supervision tree.
      # {Telemetry.Metrics.ConsoleReporter, metrics: metrics()}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  def metrics do
    [
      # Phoenix Metrics
      summary("phoenix.endpoint.stop.duration",
        unit: {:native, :millisecond}
      ),
      summary("phoenix.router_dispatch.stop.duration",
        tags: [:route],
        unit: {:native, :millisecond}
      ),

...
      # VM Metrics
      summary("vm.memory.total", unit: {:byte, :kilobyte}),
      summary("vm.total_run_queue_lengths.total"),
      summary("vm.total_run_queue_lengths.cpu"),
      summary("vm.total_run_queue_lengths.io")
    ]
  end

However, that code is basically a no-op. The Telemetry.Metrics library exposes functions such as last_value and summary, but these are just DSL functions that give you a formal language to describe the things you are interested in monitoring. All that really happens is that you build a list of definitions (the metrics/0 function above), which you then feed into some kind of reporter, and it’s the reporter that does the work. In the Phoenix example no reporter is started (the ConsoleReporter line is commented out), so there is no work being done…! The module exists only as a way to pass parameters to the Phoenix dashboard stuff!

So what’s a “reporter”? Well, it’s just something which takes in a DSL like the above and then “does something” with the metrics you gave it. So really the metrics function or variable is just a way of defining a complex set of params in a unified way.

So there is a built-in reporter which just dumps any new metric to the console (useful for debugging, but I don’t think for much else?). There are other reporters to dump stuff into Influx, etc., and Phoenix has its own reporter which listens to that metrics list and generates some graphs (but without any history, as it’s just listening live to new things coming in).
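If it helps to see the split between “definitions” and “reporter” in code, here is a rough sketch of the smallest reporter I could think of, written as a standalone script. TinyReporter and the demo.queue event are made up for illustration, and it only handles last_value metrics whose measurement is a plain key (no tags, no units); real reporters such as Mobius do far more. The only library calls used are Telemetry.Metrics.last_value/1, :telemetry.attach/4 and :telemetry.execute/3.

```elixir
Mix.install([{:telemetry_metrics, "~> 0.6 or ~> 1.0"}])

defmodule TinyReporter do
  # Hypothetical reporter for illustration only: it remembers the most recent
  # value of each metric and ignores tags and units.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Latest value seen for a metric name such as [:demo, :queue, :length].
  def latest(name), do: GenServer.call(__MODULE__, {:latest, name})

  @impl true
  def init(opts) do
    metrics = Keyword.fetch!(opts, :metrics)

    # One handler per event, carrying the metric definitions that use it.
    for {event, event_metrics} <- Enum.group_by(metrics, & &1.event_name) do
      :telemetry.attach({__MODULE__, event}, event, &__MODULE__.handle_event/4, event_metrics)
    end

    {:ok, %{}}
  end

  # Runs in the process that emitted the event, so just forward the values.
  def handle_event(_event, measurements, _metadata, metrics) do
    for %{name: name, measurement: key} <- metrics,
        is_atom(key),
        {:ok, value} <- [Map.fetch(measurements, key)] do
      GenServer.cast(__MODULE__, {:record, name, value})
    end

    :ok
  end

  @impl true
  def handle_cast({:record, name, value}, state), do: {:noreply, Map.put(state, name, value)}

  @impl true
  def handle_call({:latest, name}, _from, state), do: {:reply, Map.get(state, name), state}
end

# The definition alone does nothing; the reporter is what attaches to the event.
metrics = [Telemetry.Metrics.last_value("demo.queue.length")]
{:ok, _pid} = TinyReporter.start_link(metrics: metrics)

:telemetry.execute([:demo, :queue], %{length: 3}, %{})
TinyReporter.latest([:demo, :queue, :length])
# => 3
```

A production reporter would also detach its handlers on shutdown and cope with measurement functions and tags; those are left out here to keep the shape visible.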

So Mobius is a “reporter” which stores the metrics it listens for in an RRD-like database and will give them back to you on demand. It stores a configurable number of seconds, minutes and hours of data, and old data basically rolls off the end (check out RRD to see the basic idea).

So we could ask Mobius to start monitoring some metrics like this (basically similarly to the above, but we add our actual reporter, ie mobius to the supervisor to make something start happening)

defmodule SysData.Telemetry do
  use Supervisor
  require Logger
  import Telemetry.Metrics

  @persistence_dir "/tmp"

  def start_link(arg) do
    Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
  end

  @impl true
  def init(_arg) do
    mobius_metrics = vm_metrics() ++ mobius_metrics() ++ net_mgr_metrics()

    children = [
      # Add reporters as children of your supervision tree.
      {Mobius,
       metrics: mobius_metrics, persistence_dir: @persistence_dir, autosave_interval: 60 * 5}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end

  def vm_metrics do
    [
      # VM Metrics
      last_value("vm.memory.total", unit: {:byte, :kilobyte}),
      last_value("vm.total_run_queue_lengths.total"),
      last_value("vm.total_run_queue_lengths.cpu"),
      last_value("vm.total_run_queue_lengths.io")
    ]
  end

  def mobius_metrics do
    [
      last_value("mobius.save.stop.duration", unit: {:native, :millisecond}, tags: [:name]),
      last_value("mobius.filter.stop.duration",
        unit: {:native, :millisecond},
        tags: [:metric_name]
      )
    ]
  end

  def net_mgr_metrics do
    [
      # NetMgr
      last_value("net_mgr.net_dev.interface.signal.signal_strength", tags: [:interface]),
      last_value("net_mgr.net_dev.interface.signal.signal_bars", tags: [:interface]),
      last_value("net_mgr.net_dev.interface.counters.bytes_in", tags: [:interface]),
      last_value("net_mgr.net_dev.interface.counters.bytes_out", tags: [:interface]),
      last_value("net_mgr.net_dev.interface.counters.bytes_total", tags: [:interface])
    ]
  end
end

The above has some statistics which won’t be available for you, i.e. the net_mgr stuff, as that comes from my app, but notice how you can easily build up a ton of stats and even share some of these definitions with the Phoenix telemetry viewer if you wish. The definitions are just that: definitions of stuff you would like. They don’t actually cause anything to happen; you use them as parameters into other systems that might do the work.

OK, so I’m using my fork of Mobius here as the upstream doesn’t have some of these features (autosave, dump data for Vega, etc)

Mobius can plot your data in the iex console with something like:

iex> Mobius.plot("net_mgr.net_dev.interface.counters.bytes_total", %{interface: "wan1"})
                Metric Name: net_mgr.net_dev.interface.counters.bytes_total, Tags: %{interface: "wan1"}

8607221048.00 ┤
8607216362.92 ┤                                                                                                                        ╭─
8607211677.83 ┤                                                                                                                        │
8607206992.75 ┤                                                                                                                        │
8607202307.67 ┤                                                                                                                        │
8607197622.58 ┤                                                                                                                        │
8607192937.50 ┤                                                                                                                        │
8607188252.42 ┤                                                                                                                   ╭────╯
8607183567.33 ┤                                                                                                                   │
8607178882.25 ┤                                                                                                                   │
8607174197.17 ┤                                                                                          ╭────────────────────────╯
8607169512.08 ┤                                                       ╭──────────────────────────────────╯
8607164827.00 ┼───────────────────────────────────────────────────────╯

:ok

However, if you wanted the raw data, then use the (not in upstream) function Mobius.filter_metrics()

This could be used in a livebook to get the data for plotting, eg:

data = Mobius.filter_metrics("net_mgr.net_dev.interface.signal.signal_strength", %{interface: "ppp10"})

Some sample data so you can follow along at home would be:

data = [
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_940, type: :last_value, value: -105},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_985, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_986, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_987, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_988, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_989, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_990, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_991, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_992, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_993, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_994, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_995, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_996, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_997, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_998, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_121_999, type: :last_value, value: -111},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_000, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_001, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_002, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_003, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_004, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_005, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_006, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_007, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_008, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_009, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_010, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_011, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_012, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_013, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_014, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_015, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_016, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_017, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_018, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_019, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_020, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_021, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_022, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_023, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_024, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_025, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_026, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_027, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_028, type: :last_value, value: -107},
  %{tags: %{interface: "ppp10"}, timestamp: 1_645_122_029, type: :last_value, value: -107}
]

You need to have the timestamps in milliseconds for VegaLite; however, the source has unix timestamps in seconds (I suspect this can be done with a filter in VegaLite, but I didn’t figure out how). So plot it in a Livebook cell like this:

alias VegaLite, as: Vl

# Convert the unix-second timestamps to milliseconds for VegaLite.
data = Enum.map(data, fn point -> %{point | timestamp: point.timestamp * 1_000} end)

Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:line)
|> Vl.encode_field(:x, "timestamp",
  type: :temporal,
  time_unit: :dayhoursminutesseconds,
  scale: [type: :utc]
)
|> Vl.encode_field(:y, "value",
  type: :quantitative,
  scale: [zero: false])
|> Vl.encode_field(:color, "tags.interface", type: :nominal)
|> Kino.VegaLite.new()

Note you don’t need all that junk above, I was just fiddling with learning how to pretty up the graphs to make them plot nicely to my eye.

I’m not sure how to insert a PNG into elixirforum, but if you try the above in a livebook you should get a nice graph which looks not unlike the console graph above. Obviously build from there!

Note that the filter/plot functions take parameters to give you the data in seconds, minutes, hours, etc. So for me it was mind-blowing that I could notice something had gone strange and then ask for, say, bytes over the last few hours, or check recent signal strength to see if it correlated with some dropped call. Or, in the example above, I wanted to check how much CPU the filter function was actually taking, so I instrumented Mobius to emit that and then used Mobius to track its own function call timings so that I could plot them and check they are reasonable! How meta is that!
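For anyone wondering what “instrumenting a function” looks like, here is a small standalone sketch using :telemetry.span/3, which emits start and stop events around a block, with the stop event carrying a duration measurement in native time units. Demo.Modem, Demo.Handler and the event name are invented for the example; this is not the code I added to Mobius.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule Demo.Modem do
  # Hypothetical function standing in for "something worth timing".
  def wake_up(interface) do
    # span/3 emits [:demo, :modem, :wake_up, :start] and [..., :stop];
    # the callback returns {result, stop_metadata}.
    :telemetry.span([:demo, :modem, :wake_up], %{interface: interface}, fn ->
      Process.sleep(20)
      {:ok, %{interface: interface}}
    end)
  end
end

defmodule Demo.Handler do
  # Stand-in for a reporter: converts the duration and sends it to a process.
  def handle(_event, %{duration: duration}, metadata, pid) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    send(pid, {:wake_up_ms, metadata.interface, ms})
  end
end

:telemetry.attach("demo-wake-up", [:demo, :modem, :wake_up, :stop], &Demo.Handler.handle/4, self())

:ok = Demo.Modem.wake_up("wan1")

receive do
  {:wake_up_ms, interface, ms} -> IO.puts("#{interface} woke up in #{ms} ms")
end
```

With a reporter in place of the handler, the matching definition would follow the same shape as the mobius.save one earlier in this post, i.e. last_value("demo.modem.wake_up.stop.duration", unit: {:native, :millisecond}, tags: [:interface]).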

For me this is a gamechanger and I’m now instrumenting everything I can, as it’s fun and useful to plot this stuff. How long is my modem taking to wake up? How many dropped calls? How long are certain function calls taking? Can we narrow that down by parameters to spot a trend? Obviously you can do all this and more if you shove the data into something you can plot with grafana, but I didn’t have that stuff handy and Mobius is so useful for just checking stuff from the console!

Have fun!

mattludwigs · March 7, 2022, 3:26pm

@the_wildgoose I am curious as to the context you’re using Mobius? Is it with Nerves, embedded, or something else?

If you’re working with modems my team and I have developed a library called VintageNet with libraries for both PPP and QMI.

We have been messing around with mobile connectivity metrics in the QMI library, but we haven’t finalized the API yet. Anyway, I figured I would put that on your radar, as we will probably be trying to finalize some metrics in those libraries, if you are interested.

the_wildgoose · March 8, 2022, 6:46pm

I’m using it right now in an embedded router that my company builds. We support satellite, LTE, wifi links and more. Everything is discovered dynamically, so you can plug in a couple of LTE modems, and say a satellite link and all the routing and firewalling is dynamically adjusted

Over the last couple of years I’ve rebuilt all the main components using Elixir (plus some Lua/sh for glue). Broadly I would say it’s been a massive success. Elixir is fast enough to work well on low-powered hardware (one device we support is a single-core 500 MHz thing with 256 MB of RAM). On top of that, not having to worry about multi-threading in Elixir is fantastic. I would definitely recommend Elixir for many embedded-type projects (by “embedded” I’m not thinking ESP32/PIC, but more low-end traditional compute devices).

I desperately need a tiny timeseries database for simple use cases (what’s my average signal strength over the last 24 hours, what’s my byte counter look like over the last 60 seconds, etc). InfluxDB is quite “big”, and has dropped 32 bit support now. DuckDB potentially looks interesting if someone were to produce an interface for it. QuestDB is … TimescaleDB needs too much schema defining and compression performance doesn’t look good enough for tiny databases. VictoriaMetrics looks most promising so far, however, x86 32bit support is a bit iffy. There is no query wrapper that I’m aware of, but it consumes metrics in influxdb, statsd and other formats for which there are elixir libs.

So actually, I think Mobius could take care of a few of these basic use cases

Thanks for creating it!

dimitarvp · March 9, 2022, 2:26am

Always wanted to work on something like this! Do you have an article on the company’s website detailing how this automatic process works?

the_wildgoose · March 9, 2022, 7:00pm

We are hiring! See:

I think shoot me a message and we can discuss offline. I don’t want to derail this thread

Exadra37 · April 15, 2022, 12:47am

The link goes to the homepage not to a careers page or similar.

Exadra37 · April 15, 2022, 12:56am

I was curious about this RRD database, and this article seems to do a good job of explaining it:

RRD store the consolidated values in Round Robin Archives

Data values of the same consolidation setup are stored into Round Robin Archives (RRA). This is a very efficient manner to store data for a certain amount of time, while using a known amount of storage space.
It works like this: If you want to store 1000 values in 5 minute interval, RRDTool will allocate space for 1000 data values and a header area. In the header it will store a pointer telling which one of the values in the storage area was last written to. New values are written to the Round Robin Archive in a … you guess it … round robin manner. This automatically limits the history to the last 1000 values. Because you can define several RRAs within a single RRD, you can setup another one, storing 750 data values at a 2 hour interval and thus keeping a log for the last two months although at a lower resolution.

The use of RRAs guarantees that the RRD does not grow over time and that old data is automatically eliminated. By using the consolidation feature, you can still keep data for a very long time, while gradually reducing the resolution of the data along the time axis. Using different consolidation functions (CF) allows you to store exactly the type of information that actually interests you. (Maximum one minute traffic on the LAN, minimum temperature of the wine cellar, total minutes down time …)

There are 4 types of consolidation functions:

AVERAGE	Average	Take the arithmetic average of the collected values
LAST	Last read value	Take the last collected value
MIN	Minimum read value	Take the smallest collected value
MAX	Maximum read value	Take the highest collected value
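To check my own understanding of the quoted description, here is a rough sketch in plain Elixir of one archive as a fixed set of slots plus a write pointer, together with the four consolidation functions. RoundRobinArchive is a made-up module and the tuple-based storage is my own choice; it says nothing about RRDtool’s file format or about how Mobius stores data.

```elixir
defmodule RoundRobinArchive do
  # Hypothetical sketch of one RRA: a fixed number of slots plus a pointer
  # to the slot that will be written next.
  defstruct [:slots, :size, next: 0]

  def new(size), do: %__MODULE__{slots: Tuple.duplicate(nil, size), size: size}

  # Overwrite the oldest slot and advance the pointer, wrapping at the end.
  def write(%__MODULE__{} = rra, value) do
    %{rra | slots: put_elem(rra.slots, rra.next, value), next: rem(rra.next + 1, rra.size)}
  end

  # Values from oldest to newest, skipping slots that were never written.
  def values(%__MODULE__{slots: slots, next: next}) do
    {newer, older} = slots |> Tuple.to_list() |> Enum.split(next)
    Enum.reject(older ++ newer, &is_nil/1)
  end

  # Collapse a batch of primary samples into a single archived value.
  def consolidate(samples, :average), do: Enum.sum(samples) / length(samples)
  def consolidate(samples, :last), do: List.last(samples)
  def consolidate(samples, :min), do: Enum.min(samples)
  def consolidate(samples, :max), do: Enum.max(samples)
end

# Six signal-strength samples, consolidated in pairs, into a 2-slot archive.
samples = [-105, -111, -111, -107, -107, -107]

rra =
  samples
  |> Enum.chunk_every(2)
  |> Enum.map(&RoundRobinArchive.consolidate(&1, :average))
  |> Enum.reduce(RoundRobinArchive.new(2), &RoundRobinArchive.write(&2, &1))

RoundRobinArchive.values(rra)
# => [-109.0, -107.0]
```

The first consolidated value (-108.0) has already been overwritten, which is the “round robin” part: the archive holds only the last two values no matter how long it runs.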

dimitarvp · April 22, 2022, 8:26pm

@the_wildgoose I can see you have 7 approved and merged PRs into the main repo so your repo now seems behind / no longer relevant. Can you confirm that we can use your enhanced features from the mattludwigs/mobius repo?

the_wildgoose · June 13, 2022, 5:50pm

Hi, Sorry, dropped out of circulation for a little while. Recently upstream mobius accepted my remaining feature suggestions, so everything is now in the mattludwigs/mobius repo!

Recent suggestions added are some enhancements to allow plotting/extracting summary data (average, max, min, etc), and also adding a form of std deviation to the summary data. To be honest, I’m still not totally happy with that part yet, I think we probably need some kind of “rolling summary” options, but it’s a good start anyway

It’s a really cool tool anyway. It gives some purpose to generating metrics when you don’t have some big processing system. It also gives a simple way to have a quick look at some metrics on a real system (just add a few lines to a module, wait a while, then drop to the command line and do some experiments!). So it’s great for IoT, etc.