Capturing and Evaluating Statistics Data During Runtime

I would like to capture and persist metadata from my data processing pipeline for later evaluation.
Currently I have a GenServer which saves all data (numeric and datetimes) in lists and dictionaries, which is getting quite complex.
I would like to avoid a full-blown database.

Sample data:

%{
  326 => %{
    capture: %{time: ~N[2022-11-23 08:34:27.458010]},
    compress: %{time: ~N[2022-11-23 08:34:27.795463]},
    delay: %{time: ~N[2022-11-23 08:34:27.853766]},
    package: %{size: 143, time: ~N[2022-11-23 08:34:27.921587]}
  },
  33 => %{
    capture: %{time: ~N[2022-11-23 08:34:07.559567]},
    compress: %{time: ~N[2022-11-23 08:34:07.898304]},
    delay: %{time: ~N[2022-11-23 08:34:07.937860]},
    package: %{size: 157, time: ~N[2022-11-23 08:34:08.005702]}
  },
...

Is there a library you can recommend for that?
I am thinking of something like Explorer’s DataFrames. Unfortunately they are static and rows cannot be added easily.

I’ve gone down this route before and between the desire to ‘persist’ and the desire to ‘query without writing a lot of code’ I’ve ended up at full-blown databases with hobby projects, and wishing I had started there. Wanting to do full-blown data processing is yet another reason to not avoid a traditional db.

That all being said, perhaps CubDB (lucaong/cubdb on GitHub, an embedded key/value database for Elixir) is a decent middle ground? The processing approach would probably look something like


CubDB.select(db)
|> Stream.map(...)


which could integrate with Flow with relative ease?


A good first thing to try is ETS - the guide has a solid Elixir-oriented introduction to it. It isn’t perfect for every situation, but even learning “why ETS won’t do what I want” will help inform your search.
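For illustration, here is a minimal sketch of how the sample data from the question might be kept in ETS. The module name `PipelineStats`, the table name, and the `{item_id, stage}` key layout are assumptions made up for this example, not something from the original post. Note that `:ets.tab2file/2` only writes a snapshot to disk, so it is not a substitute for durable storage.

```elixir
defmodule PipelineStats do
  # Hypothetical module and table name, chosen for this sketch only.
  @table :pipeline_stats

  # Create a named, public table. :ordered_set keeps keys sorted,
  # so all stages of one item sit next to each other.
  def init do
    :ets.new(@table, [:named_table, :ordered_set, :public])
  end

  # Store one stage measurement under the key {item_id, stage}.
  def record(item_id, stage, measurement) when is_map(measurement) do
    :ets.insert(@table, {{item_id, stage}, measurement})
    :ok
  end

  # Rebuild the nested map shape from the question for a single item.
  def get(item_id) do
    @table
    |> :ets.match_object({{item_id, :_}, :_})
    |> Map.new(fn {{_id, stage}, measurement} -> {stage, measurement} end)
  end

  # Write a snapshot of the whole table to a file for post-run analysis.
  def dump(path), do: :ets.tab2file(@table, String.to_charlist(path))
end

PipelineStats.init()
PipelineStats.record(326, :capture, %{time: ~N[2022-11-23 08:34:27.458010]})
PipelineStats.record(326, :package, %{size: 143, time: ~N[2022-11-23 08:34:27.921587]})
IO.inspect(PipelineStats.get(326))
```

Because the table is public, the pipeline stages could write to it directly instead of sending every measurement through a single GenServer.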

One common characteristic of “not a full-blown DB” solutions is that some operations are easier than others, so choosing one is often about “what do I need to do” versus “what do I need to store”. Can you tell us more about how you plan to use this data?


@jerdew I’ll check out CubDB! Thanks for your insight, maybe I’ll use a db for now and if I have problems at some point I might just try something else.

@al2o3cr It’s essentially an IoT application which will run on battery for some time (let’s say 2h at a time).
The log data is relevant during runtime to inspect real-time performance, but also post-run, so it needs to be stored.
General statistics (rolling averages, outliers, etc.) would be interesting. I expect the amount of metadata to increase over time (>300 different measurements per sec). Alongside the metadata, some binary data (ca. 1Mb/s) will also be stored.
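To make the kind of statistics concrete, here is a rough sketch that derives per-stage latencies from one sample record and computes a simple rolling mean. The module `StageStats`, the fixed stage order, and the window size are assumptions for illustration; the sizes 150 and 160 are made-up values added to the two sizes from the sample data.

```elixir
defmodule StageStats do
  # Hypothetical helper; the stage order follows the sample data above.
  @stages [:capture, :compress, :delay, :package]

  # Microseconds elapsed between consecutive stages of one item.
  def latencies(item) do
    @stages
    |> Enum.chunk_every(2, 1, :discard)
    |> Map.new(fn [from, to] ->
      {{from, to}, NaiveDateTime.diff(item[to].time, item[from].time, :microsecond)}
    end)
  end

  # Simple rolling mean over a sliding window of `window` values.
  def rolling_mean(values, window) do
    values
    |> Enum.chunk_every(window, 1, :discard)
    |> Enum.map(&(Enum.sum(&1) / window))
  end
end

item = %{
  capture: %{time: ~N[2022-11-23 08:34:27.458010]},
  compress: %{time: ~N[2022-11-23 08:34:27.795463]},
  delay: %{time: ~N[2022-11-23 08:34:27.853766]},
  package: %{size: 143, time: ~N[2022-11-23 08:34:27.921587]}
}

IO.inspect(StageStats.latencies(item))

# 143 and 157 come from the sample data; 150 and 160 are invented.
IO.inspect(StageStats.rolling_mean([143, 157, 150, 160], 2))
```

For the record above, the capture-to-compress step comes out at 337453 microseconds, which is the sort of number a rolling average or outlier check would then operate on.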
I’ll look at ETS as well, thanks for your input!

For this sort of data, have you looked at Prometheus or VictoriaMetrics? There are a number of Elixir libraries for them. Then maybe Grafana for analysis.


I’ve been using SQLite a lot for my hobby projects recently; it works just great for a lot of stuff.
