I would like to capture and persist metadata from my data processing pipeline for later evaluation.
Currently I have a GenServer that keeps all data (numeric values and datetimes) in lists and dictionaries, which is getting quite complex.
I would like to avoid a full-blown database.
Is there a library you can recommend for that?
I am thinking of something like Explorer's DataFrames. Unfortunately they are immutable, so rows cannot easily be appended.
I've gone down this route before: between the desire to "persist" and the desire to "query without writing a lot of code", I've ended up with full-blown databases in hobby projects, wishing I had started there. Wanting to do full-blown data processing is yet another reason not to avoid a traditional DB.
A good first thing to try is ETS: the guide has a solid Elixir-oriented introduction to it. It isn't perfect for every situation, but even learning "why ETS won't do what I want" will help inform your search.
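A minimal sketch of what trying ETS could look like for this kind of log, assuming each sample is stored as `{{measurement, timestamp_ms}, value}`. That layout and the `MeasurementLog` module are hypothetical, not from any library; `:ets.select/2`, `:ets.tab2file/2` and `:ets.file2tab/1` are standard Erlang/OTP calls. Since ETS lives in memory only, `tab2file` is one way to get a snapshot onto disk.

```elixir
# Minimal sketch: an ETS-backed measurement log with snapshots to disk.
# Assumption (hypothetical layout): key = {measurement_name, timestamp_ms}.
defmodule MeasurementLog do
  # An :ordered_set keeps keys in term order, so all samples of one
  # measurement sit next to each other, sorted by timestamp.
  def new(name \\ :measurements) do
    :ets.new(name, [:ordered_set, :public])
  end

  def record(table, measurement, timestamp_ms, value) do
    :ets.insert(table, {{measurement, timestamp_ms}, value})
  end

  # Values of `measurement` with from_ms <= timestamp < to_ms, in timestamp
  # order, selected with a hand-written match specification.
  def range(table, measurement, from_ms, to_ms) do
    match_spec = [
      {{{measurement, :"$1"}, :"$2"},
       [{:>=, :"$1", from_ms}, {:<, :"$1", to_ms}],
       [:"$2"]}
    ]

    :ets.select(table, match_spec)
  end

  # ETS is in-memory only; tab2file/file2tab write and read a full snapshot.
  def save(table, path), do: :ets.tab2file(table, String.to_charlist(path))

  def load(path) do
    {:ok, table} = :ets.file2tab(String.to_charlist(path))
    table
  end
end
```

Note that `tab2file` writes the whole table each time, so for a long run you would probably snapshot periodically rather than on every sample.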
One common characteristic of "not a full-blown DB" solutions is that some operations are easier than others, so choosing one is often about "what do I need to do" versus "what do I need to store". Can you tell us more about how you plan to use this data?
@jerdew I'll check out CubDB! Thanks for your insight. Maybe I'll use a DB for now and, if I run into problems at some point, try something else.
@al2o3cr It's essentially an IoT application that will run on battery for some time (let's say 2h at a time).
The log data is relevant during runtime to inspect real-time performance, but also post-run, so it needs to be stored.
General statistics (rolling averages, outliers, etc.) would be interesting. I expect the amount of metadata to grow over time (>300 different measurements per second). Alongside the metadata, some binary data (ca. 1 Mb/s) will also be stored.
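For the rolling averages and outliers, one option that fits inside the existing GenServer is a fixed-size window per measurement. This is a minimal sketch under stated assumptions: `RollingStats`, the window size and the threshold `k` are hypothetical, and "outlier" here simply means more than `k` standard deviations from the current window mean, which is only one of many possible definitions.

```elixir
# Minimal sketch: fixed-size rolling window with mean, std dev and an outlier flag.
# Assumption: outlier = |value - mean| > k * population std dev of the window.
defmodule RollingStats do
  defstruct size: 0, capacity: 100, queue: :queue.new(), sum: 0.0

  def new(capacity) when capacity > 0, do: %__MODULE__{capacity: capacity}

  # Appends a value; once the window is full, evicts the oldest value.
  # The running sum keeps the mean O(1) per sample.
  def push(%__MODULE__{} = s, value) do
    s = %{s | queue: :queue.in(value, s.queue), size: s.size + 1, sum: s.sum + value}

    if s.size > s.capacity do
      {{:value, oldest}, rest} = :queue.out(s.queue)
      %{s | queue: rest, size: s.size - 1, sum: s.sum - oldest}
    else
      s
    end
  end

  def mean(%__MODULE__{size: 0}), do: nil
  def mean(%__MODULE__{} = s), do: s.sum / s.size

  # Two-pass population standard deviation over the current window.
  def std_dev(%__MODULE__{size: 0}), do: nil

  def std_dev(%__MODULE__{} = s) do
    m = mean(s)
    squares = Enum.reduce(:queue.to_list(s.queue), 0.0, fn v, acc -> acc + (v - m) * (v - m) end)
    :math.sqrt(squares / s.size)
  end

  # Checks `value` against the window as it is now (before pushing it).
  def outlier?(%__MODULE__{} = s, value, k \\ 3.0) do
    case std_dev(s) do
      nil -> false
      sd when sd == 0.0 -> false
      sd -> abs(value - mean(s)) > k * sd
    end
  end
end
```

In the GenServer state you could keep a map of measurement name to `RollingStats` and update it with `Map.update/4` on every sample.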
I'll look at ETS as well, thanks for your input!