I’m working on a high-availability real-time application. Currently, it’s composed of a number of “bots”, each of which listens on WebSocket endpoints for data, responds to the data in various ways, and stores important events in CSV files. The system works well so far and runs all day without problems.
I’d like to add the ability to compute statistics for these events, as they come in, to determine the performance of each “bot”.
I also have another, totally unrelated Phoenix web application which relies on very similar statistics. It stores the time series events in a Postgres database and re-calculates the needed statistics whenever someone visits a page. There aren’t any performance issues there – yet.
Ideally, I’d like both of these applications to rely on the same solution. I don’t believe that re-calculating everything on demand is the most appropriate approach for the real-time app, though. Please correct me if I’m wrong!
I’ve read through the post on Creating Persistent Real-Time Analytics of Time Series Data, but it is a few years old, and my use case is much less demanding, at least for now.
Here are the important data points for the real-time app:
- fewer than 100 events per day
- fewer than 20 bots running at a given time
- the calculations themselves aren’t computationally intensive (for now)
Here are the features I’d like the statistics-tracking application to have:
- persistent: if the system crashes, I should be able to reload the last known stats or recompute them from the events recorded so far.
- lightweight: I’d like this to be easy to include in other projects that deal with similar data, so I’d like to keep dependencies to a minimum. I’m not opposed to using a database, though, if it proves to be the most appropriate solution.
- configurable: I’d like it to be relatively easy to support new statistics for the system as needed. I’d also like to be able to choose a subset of the statistics to track if I want.
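A rough sketch of how the persistent and configurable requirements might fit together, using only the standard library. Everything here is an assumption made for illustration: the `StatsStore` module, its statistic names, the CSV layout (`timestamp,bot,value`, with no header and no quoting), and the snapshot file format are all hypothetical.

```elixir
defmodule StatsStore do
  # Hypothetical registry of statistics: name => {initial state, reducer}.
  # Supporting a new statistic means adding one entry here.
  def registry do
    %{
      count: {0, fn _value, acc -> acc + 1 end},
      total: {0.0, fn value, acc -> acc + value end},
      max: {nil, fn value, acc -> if acc == nil, do: value, else: max(acc, value) end}
    }
  end

  # Fresh state for the chosen subset of statistics.
  def new(names) do
    for name <- names, into: %{} do
      {init, _reducer} = Map.fetch!(registry(), name)
      {name, init}
    end
  end

  # Fold one event value into every tracked statistic.
  def apply_event(state, value) do
    Map.new(state, fn {name, acc} ->
      {_init, reducer} = Map.fetch!(registry(), name)
      {name, reducer.(value, acc)}
    end)
  end

  # Rebuild the statistics by replaying the events stored in a CSV file.
  # Assumed line layout: timestamp,bot,value
  def replay(path, names) do
    path
    |> File.stream!()
    |> Enum.reduce(new(names), fn line, state ->
      [_timestamp, _bot, raw] = line |> String.trim() |> String.split(",")
      {value, _rest} = Float.parse(raw)
      apply_event(state, value)
    end)
  end

  # Snapshot the last known stats to disk and load them back.
  def save(state, path), do: File.write!(path, :erlang.term_to_binary(state))
  def load(path), do: path |> File.read!() |> :erlang.binary_to_term()
end

# Usage with a throwaway CSV file in the system temp directory.
dir = System.tmp_dir!()
csv = Path.join(dir, "events_example.csv")
File.write!(csv, "1700000000,bot_a,1.5\n1700000060,bot_b,2.5\n")

state = StatsStore.replay(csv, [:count, :total])
snapshot = Path.join(dir, "stats_example.bin")
StatsStore.save(state, snapshot)
IO.inspect(StatsStore.load(snapshot))
```

At fewer than 100 events per day, replaying the whole CSV file after a crash would likely be cheap, so the snapshot is an optimization rather than a necessity in this sketch.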
Right now, my idea is to use a Supervisor with an Agent for each statistic I want to calculate and track. I’m a bit lost when it comes to the persistence part, though. I’m also new to time-series calculations and tools in general.
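A minimal sketch of what the Supervisor-plus-Agents idea could look like. The `StatAgent` module and the statistic names `:event_count` and `:value_sum` are hypothetical, chosen only for illustration; each Agent holds the running state of one statistic and is updated as events arrive.

```elixir
defmodule StatAgent do
  # Hypothetical module: one Agent per statistic, holding its running state.
  use Agent

  # Expects `name:` (the registered name) and `init:` (the starting state).
  def start_link(opts) do
    name = Keyword.fetch!(opts, :name)
    init = Keyword.fetch!(opts, :init)
    Agent.start_link(fn -> init end, name: name)
  end

  # Apply `fun` to the current state of the named statistic.
  def update(name, fun), do: Agent.update(name, fun)

  # Read the current state of the named statistic.
  def get(name), do: Agent.get(name, & &1)
end

# Each child needs a unique id because they all share the same module.
children = [
  Supervisor.child_spec({StatAgent, name: :event_count, init: 0}, id: :event_count),
  Supervisor.child_spec({StatAgent, name: :value_sum, init: 0.0}, id: :value_sum)
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

# Feed a few made-up event values into the running statistics.
for value <- [1.5, 2.5, 4.0] do
  StatAgent.update(:event_count, &(&1 + 1))
  StatAgent.update(:value_sum, &(&1 + value))
end

mean = StatAgent.get(:value_sum) / StatAgent.get(:event_count)
IO.inspect(mean)
```

One caveat with this layout: when the Supervisor restarts a crashed Agent, the Agent comes back with its initial state, so the in-memory statistics would still need to be restored from some persisted source.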
Do you have any suggestions for libraries or built-in tools I can use to solve my problem without too much overkill? Am I overthinking this? Thanks in advance!