By Allie Morgan on Aug 12, 2015

NOT Yet Another Anomaly Detection Package

There’s no lack of tools for anomaly detection on GitHub. We felt that the world needed another that raises the bar.

That’s why we introduced Anomalyzer: Lytics’ very own anomaly detection package.

We have a philosophical difference with most of the packages out there, and it lets us do some pretty cool stuff. While most anomaly detection packages return a boolean flag, ours returns the probability that what you’re monitoring is anomalous.

Is your time series anomalous, or isn’t it?

Should you wake up at 3:00AM to try to fix your cluster for an issue that can definitely wait until daylight?

Our anomaly detection package puts the power in your hands, while saving you time and resources.

Configuration

Anomalyzer is highly configurable. Any time series can be analyzed using the package, and you can specify which of the seven tests we’ve implemented to include for detection.

Each test is well suited for certain types of data. Additionally, you can specify the number of data points you’d like to explore to discover outliers. For example, say you wanted to consider whether or not this year has been an atypically rainy season for Portland.

You could supply yearly, monthly, or daily data for average inches of rainfall and specify the most recent 1, 12, or 365 data points, respectively, as the window in which you’d like to spot unseasonable weather.

Tests

Speaking of seasons, one of the tests we’ve implemented can be used to detect seasonality. Just make sure the number of points you’d like to detect outliers in is equal to the length of a season, specify how many seasons back we should compare to the current season, and you should be good to go.

For instance, say you have stock in Nike and want to consider how their profits are doing this year as compared to the past five years. Our anomaly detection library can distinguish an uncharacteristic increase in profits from the usual peaks that occur around the holiday season.
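To make the windowing idea concrete, here is a minimal sketch of carving a series into the current season and the seasons before it. It is not Anomalyzer's code or API: `seasonalWindows`, `seasonLen`, and `nSeasons` are hypothetical names chosen for illustration, and the quarterly figures are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// seasonalWindows returns the current season (the last seasonLen points) and
// the nSeasons seasons immediately before it, oldest first.
// Hypothetical helper for illustration; not part of Anomalyzer.
func seasonalWindows(series []float64, seasonLen, nSeasons int) (current []float64, previous [][]float64, err error) {
	if seasonLen <= 0 || nSeasons <= 0 {
		return nil, nil, errors.New("seasonLen and nSeasons must be positive")
	}
	need := seasonLen * (nSeasons + 1)
	if len(series) < need {
		return nil, nil, fmt.Errorf("need at least %d points, got %d", need, len(series))
	}
	start := len(series) - need
	for s := 0; s < nSeasons; s++ {
		lo := start + s*seasonLen
		previous = append(previous, series[lo:lo+seasonLen])
	}
	return series[len(series)-seasonLen:], previous, nil
}

func main() {
	// Made-up quarterly figures: three years, each peaking in the last quarter.
	quarterly := []float64{10, 11, 10, 18, 11, 12, 11, 19, 12, 13, 12, 27}
	current, previous, err := seasonalWindows(quarterly, 4, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("current season:", current)
	fmt.Println("previous seasons:", previous)
}
```

With windows like these, a seasonal test can compare the current season against earlier ones, so a holiday peak that shows up every year is not mistaken for an anomaly.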

Detection

Unlike many other anomaly detection frameworks that use a majority vote among a suite of algorithms for detection, Anomalyzer lets each algorithm speak for itself. It dynamically adjusts the weight of each algorithm according to the amount of information it contains.

That means instead of getting TRUE/FALSE for new events, you get an anomalous probability between 0 and 1.
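One simple way to turn several per-test probabilities into a single score is a weighted mean, sketched below. This is an assumption for illustration only: this post does not spell out how Anomalyzer derives its weights, so here the caller supplies them, and `weightedProbability` is a hypothetical name rather than a library function.

```go
package main

import (
	"errors"
	"fmt"
)

// weightedProbability combines per-test anomaly probabilities (each in [0, 1])
// into one score using a weighted mean. The weights are supplied by the
// caller; how a real library chooses them is outside this sketch.
func weightedProbability(probs, weights []float64) (float64, error) {
	if len(probs) == 0 || len(probs) != len(weights) {
		return 0, errors.New("probs and weights must be non-empty and the same length")
	}
	var sum, total float64
	for i, p := range probs {
		if p < 0 || p > 1 || weights[i] < 0 {
			return 0, errors.New("probabilities must be in [0, 1] and weights non-negative")
		}
		sum += p * weights[i]
		total += weights[i]
	}
	if total == 0 {
		return 0, errors.New("at least one weight must be positive")
	}
	return sum / total, nil
}

func main() {
	// Made-up outputs from three tests, with the first test trusted twice as much.
	probs := []float64{0.9, 0.2, 0.6}
	weights := []float64{2, 1, 1}
	score, err := weightedProbability(probs, weights)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("combined anomaly probability: %.2f\n", score)
}
```

Because the result is a weighted mean of values in [0, 1], it also stays in [0, 1], unlike a majority vote, which collapses everything to a yes or no.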

Say you’d like to be notified if your server’s CPU usage behaves unusually or approaches some upper bound on its processing power. Just apply our library to your streaming data.

It’s up to you to decide what the threshold will be, but an event described as highly anomalous can be used to trigger a warning to you.
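Choosing a threshold can be as small as the sketch below, which picks out the events whose anomaly probability exceeds a cutoff. `flagAnomalies` is a hypothetical helper, not an Anomalyzer function, and the probabilities are made up.

```go
package main

import "fmt"

// flagAnomalies returns the indices of events whose anomaly probability is
// strictly greater than threshold. Hypothetical helper for illustration.
func flagAnomalies(probs []float64, threshold float64) []int {
	var flagged []int
	for i, p := range probs {
		if p > threshold {
			flagged = append(flagged, i)
		}
	}
	return flagged
}

func main() {
	// Made-up anomaly probabilities for five consecutive CPU readings.
	probs := []float64{0.05, 0.30, 0.92, 0.81, 0.10}
	fmt.Println(flagAnomalies(probs, 0.8)) // prints [2 3]
}
```

Raising the cutoff trades missed events for fewer 3:00AM wake-ups, and that trade-off stays with you rather than being baked into the library.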

Go Anomalyzer, Go!

You might be wondering whether our project walks the walk. Here are a couple of examples to show you our library in action.

We used our anomaly detection library to find unusual changes in the Bitcoin exchange rate over the past two years. We marked events with anomalous probabilities greater than 80% in red.

Figure: One use of Anomalyzer is to detect significant changes over time, such as anomalous changes in the value of Bitcoin over the years.

Note that some changes in early 2013 are flagged as anomalous, while changes toward the end of 2014 are not, even though the magnitude of the later changes is larger. This is because the overall behavior later in the series is more erratic, making those changes relatively less anomalous.
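The intuition that a big change matters less in an erratic series can be sketched by scaling the latest change by the spread of the changes before it. This is a plain z-score-style illustration, not one of Anomalyzer's tests; `relativeChange` and both sample series are made up for the example.

```go
package main

import (
	"fmt"
	"math"
)

// relativeChange measures how unusual the latest step in a series is compared
// with the steps before it: |latest step - mean of earlier steps| divided by
// the standard deviation of the earlier steps.
// It returns NaN when there are fewer than four points.
func relativeChange(series []float64) float64 {
	n := len(series)
	if n < 4 {
		return math.NaN()
	}
	// Step-to-step differences; the last one is the change under test.
	steps := make([]float64, n-1)
	for i := 1; i < n; i++ {
		steps[i-1] = series[i] - series[i-1]
	}
	earlier, latest := steps[:len(steps)-1], steps[len(steps)-1]

	var mean float64
	for _, s := range earlier {
		mean += s
	}
	mean /= float64(len(earlier))

	var variance float64
	for _, s := range earlier {
		variance += (s - mean) * (s - mean)
	}
	variance /= float64(len(earlier))

	deviation := math.Abs(latest - mean)
	if variance == 0 {
		if deviation == 0 {
			return 0
		}
		return math.Inf(1)
	}
	return deviation / math.Sqrt(variance)
}

func main() {
	calm := []float64{10, 11, 10, 11, 10, 15}  // small swings, then a jump of 5
	erratic := []float64{10, 20, 8, 22, 9, 19} // large swings, then a jump of 10
	fmt.Printf("calm: %.2f, erratic: %.2f\n", relativeChange(calm), relativeChange(erratic))
}
```

The calm series scores higher even though the erratic series ends with the larger jump, which mirrors why the late-2014 Bitcoin swings are not flagged.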

Here are anomalies in the rental price per square foot in Portland from 2009 to 2015. Events with anomalous probabilities greater than 90% are shown in red.

Figure: Probable anomalies in Portland rental prices between 2009 and 2015.

Check out the increase of almost $40 per square foot from July to August in 2011.

Anomalyzer is also really good at detecting seasonality. If you configure the active window size to be the length of the season you’re trying to detect, the bootstrap KS test will test for it.

In this example, we show the results of each test on some seasonal data we’ve generated. Our other tests can be a bit oversensitive to the fluctuations in the time series, but the KS test doesn’t overreact.
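For a feel of why a KS-style comparison stays calm on seasonal data, here is a sketch of the plain two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distributions of two windows. It leaves out the bootstrap step mentioned above and is not Anomalyzer's implementation; `ksStatistic` and the sample seasons are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// ksStatistic returns the two-sample Kolmogorov-Smirnov statistic: the largest
// vertical gap between the empirical CDFs of a and b. It returns 0 if either
// sample is empty. The inputs are copied, so the caller's slices keep their order.
func ksStatistic(a, b []float64) float64 {
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	x := append([]float64(nil), a...)
	y := append([]float64(nil), b...)
	sort.Float64s(x)
	sort.Float64s(y)

	var i, j int
	var d float64
	for i < len(x) && j < len(y) {
		// Next distinct value across both samples.
		v := x[i]
		if y[j] < v {
			v = y[j]
		}
		// Step both CDFs past every copy of v, which handles ties.
		for i < len(x) && x[i] == v {
			i++
		}
		for j < len(y) && y[j] == v {
			j++
		}
		gap := float64(i)/float64(len(x)) - float64(j)/float64(len(y))
		if gap < 0 {
			gap = -gap
		}
		if gap > d {
			d = gap
		}
	}
	return d
}

func main() {
	// Made-up seasons: the second repeats the first, the third is shifted upward.
	lastSeason := []float64{3, 5, 9, 5, 3, 2}
	thisSeason := []float64{2, 3, 5, 9, 5, 3}
	shifted := []float64{13, 15, 19, 15, 13, 12}
	fmt.Println("same shape:", ksStatistic(lastSeason, thisSeason))
	fmt.Println("shifted:", ksStatistic(lastSeason, shifted))
}
```

Because the statistic looks at the distribution of values in each window rather than at point-to-point jumps, a season that repeats the usual ups and downs scores low while a genuinely different season scores high.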

Try Anomalyzer for Yourself

Check the repository for more examples and a more comprehensive dive into configuration and usage.


By Allie Morgan, Data Scientist

Allie Morgan is a physicist-turned-data-scientist at Lytics. Her favorite word is “schadenfreude.” She enjoys biking, vegetable co-ops, dancing, and not inviting her colleagues to her dance performances.