Any tips or references for detecting anomalies in timeseries data?

This is a question from someone who is mostly ignorant of the possibilities of ML and AI in general, keep that in mind :wink:

I’m working on a project where time-series data is ingested from various sensors (think: temperature, pressure, etc.). One of the core responsibilities of the product is to define specifications for these streams of values and trigger notifications when things go out of specification. In other words, it’s all about alarms. But the specifications can be more complex than simple threshold values: for example, a windowed rule where no more than n out of the last N values may exceed some threshold. Other, rather straightforward rules can also be specified.
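To make the kind of windowed rule I mean concrete, here’s a rough sketch (all names and numbers are made up, just to illustrate):

```python
from collections import deque

def n_of_N_alarm(values, threshold, n, N):
    """Yield True whenever more than n of the last N samples
    exceed `threshold` (illustrative names only)."""
    window = deque(maxlen=N)  # rolling window of pass/fail booleans
    for v in values:
        window.append(v > threshold)
        yield sum(window) > n
```

For instance, `list(n_of_N_alarm([1, 5, 5, 1, 1], threshold=4, n=1, N=3))` flags the third and fourth samples, where two of the last three readings were out of spec.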

Now, I was wondering whether there are ML-related strategies that can detect anomalies in such time-series data — maybe even without defining the thresholds (or in addition to comparing against some thresholds). For example, it would be nice to detect drifting values (which are harder to detect because the change is so slow). Or, when the temperature follows a pattern (sometimes it does not stay constant, but varies according to some pattern during a production process), it would be nice to detect deviations from the patterns that usually occur.

I hope my question is clear. I have high hopes for machine learning and AI in this domain, but I have zero knowledge or intuition of how to tackle or explore the possibilities.

TLDR: any tips or references for detecting anomalies in timeseries data? Is ML even a good fit for such an analysis?

Thanks!


Hi!

This is a vast question!

Yes, ML could be a good fit. However, the no-free-lunch (NFL) theorem would tell you that there are no easy shortcuts to success, and every tip we could give you will have to be put to the test against your own data!

I mostly work on Natural Language Processing problems, so time series are not really my domain of expertise, but I can try to give you some clues:

There are a lot of ML tools that can be used for anomaly / outlier / novelty detection, such as SVMs, Isolation Forest, LOF, autoencoders, Spectral Residual, Seq2Seq… and drift detection may require other solutions.

The only (quite simple) time-series problem I’ve been confronted with was solved by a Spectral Residual approach for the outlier detection part and a Context-Aware Maximum Mean Discrepancy (CAMMD) approach for the drift detection part, so if I had to solve another similar problem I would probably give those a shot.

As general advice, even if you do not implement your solution in Python, the scikit-learn documentation may give you a basic understanding of some possible approaches:

The documentation of alibi-detect could give you some keywords to look for too: GitHub - SeldonIO/alibi-detect: Algorithms for outlier, adversarial and drift detection .
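As an example of what those scikit-learn pages describe, here is a minimal Isolation Forest sketch — the data, parameters, and numbers are arbitrary, just to show the shape of the API:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Train on "normal" sensor-like readings centred around 20.0
rng = np.random.default_rng(0)
normal = rng.normal(loc=20.0, scale=0.5, size=(500, 1))

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# predict() returns +1 for inliers and -1 for outliers
preds = model.predict([[20.1], [35.0]])
```

Here the first reading should be scored as an inlier (+1) and the wildly out-of-range second one as an outlier (−1).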

I hope this helps!


I am in a similar boat, but with connected weight scales. I found a white paper from a reputable company (and some PhDs) documenting their approach, which compares ARIMA and MAD (median absolute deviation).

Despite the math being fairly simple, the learning curve regarding terminology — and determining what is and isn’t yet available — can be a little daunting. I don’t think there’s an autocorrelation analysis function in the libraries, but it’s a little tough to say with certainty as a newbie to this side of things.

Dear Antoine,

This is a side question prompted by part of your answer. Which Elixir-based tools do you employ for NLP? Do you also “outsource” to Python-based tools to solve NLP problems? In that case, how do you communicate between Elixir and Python? Have a nice day!

As others noted, anomaly detection in time series is a huge topic.

Some specific cases are trivial, such as simple thresholds or triggers based on deviation from a norm. Ultimately though, on non-trivial dynamic processes, the definition of an anomaly is very context-dependent.

There are very different algorithms out there, but the basic idea (with wildly different implementations) is that an algorithm tries to predict the next value(s) on the basis of the past, observes the actual value, and measures how far off the prediction was: if it’s very different, it is considered an anomaly. The problem is to choose a prediction model sophisticated enough to take into account all known effects: seasonality, patterns, the range of frequencies where random oscillations are expected, autocorrelation, measurable inputs that affect your system, etc.
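The simplest possible instance of this predict-and-compare idea is a rolling-mean “prediction” with a deviation threshold — a sketch, where the window size and the factor k are arbitrary choices:

```python
import statistics

def residual_alarms(series, window=20, k=4.0):
    """Flag points that deviate from a rolling-mean prediction by
    more than k rolling standard deviations (illustrative names)."""
    flags = []
    for i, v in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)  # not enough history to judge
            continue
        predicted = statistics.fmean(history)          # the "prediction"
        sigma = statistics.stdev(history) or 1e-9      # guard a flat history
        flags.append(abs(v - predicted) > k * sigma)   # residual too large?
    return flags
```

A more sophisticated predictor (a seasonal model, ARIMA, or a learned model) slots into the same structure — only the `predicted` line changes.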

The more known effects you add to your prediction, the more your solution will become a custom dynamic model, departing from off-the-shelf algorithms.

It is possible to use ML to learn patterns vs. anomalies, but it is often not easy, because anomalies are by definition rare and do not necessarily look alike. Without proper care to avoid overfitting, if anomalies are rare enough, an ML algorithm could achieve seemingly good results by just labeling everything as “not an anomaly”, which is obviously useless. You will therefore need to assess how expensive a false positive is for your case vs. a false negative.

I know this is not a specific suggestion for an algorithm, and might not be too helpful, but I’d recommend to:

  • Reason on what you know about the dynamics of your system. Seasonality, frequency range, measurable inputs that can affect your system and might feed into the model.
  • Assess the cost of false positives vs. false negatives. If false positives are cheap, you can make your algorithm more sensitive in an attempt to catch all anomalies. Otherwise, things get harder. In any case, you’ll need to decide on this important trade-off.
  • How fast should your alerting system react? If it needs to be very fast, it won’t be able to wait and weigh in more observations to see whether a seemingly strange pattern normalizes (or, equivalently, use longer windows for weighted averages). This makes your alerting more susceptible to noise, and might put a floor under the false-positive rate.
  • Start simple, evaluate, and build up: how many anomalies do you catch with a simple threshold model, maybe after normalizing out seasonality? What kinds of anomalies are missed? If you also add thresholds on the first derivative (rate of change, as opposed to absolute value), would those anomalies be detected?
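The “start simple” step in particular is cheap to prototype — e.g. a value threshold combined with a first-derivative threshold (a sketch; the names and limits are made up):

```python
def simple_alarms(series, abs_limit, slope_limit):
    """Flag samples that exceed an absolute limit, or that jump by
    more than slope_limit since the previous sample."""
    flags = [series[0] > abs_limit]  # no previous sample for the first point
    for prev, cur in zip(series, series[1:]):
        flags.append(cur > abs_limit or abs(cur - prev) > slope_limit)
    return flags
```

Counting what a baseline like this misses against real incidents tells you whether the extra ML machinery is actually worth adding.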

In sum, know your system, and split the big problem into separate ones that can each be solved by a known algorithm.

I believe that ML in this case is not a silver bullet, and comes after a proper problem definition. It’s ok to introduce ML if necessary, but deep knowledge of the system, if possible at all, will likely beat ML in this case. Even if you do use ML, you will need to pick an algorithm, prepare its input, and tweak its parameters based on such knowledge of your system.


Thank you for references and ideas! There is a lot of jargon in there, but I’ll consider it a challenge. I will take some time to digest this. Thanks again!