@dmitrinesterenko
Last active June 8, 2016 00:14
Traffic anomaly detection.
Anonymized and aggregated mobile cell data is used for traffic anomaly detection. Here "traffic" means actual car and foot traffic, although the problem is very similar to network traffic.
Use case: analysis of new construction and how it will impact the existing city.
Tested in Dallas, TX.
Establish what is normal: not Gaussian normal, but actually normal. Look at a histogram of what the pattern had been.
They tried to develop a model that is robust, efficient, online, and unsupervised.
1. Detect surprise for each road.
2. Group surprises into events.
Tried:
1. Seasonality: Fourier, wavelet, ARIMA, STL. Doesn't work if there is more than one outlier.
2. Dimension reduction: PCA, ICA. Not robust out of the box.
3. Robust measures: MAD (median absolute deviation). Cannot detect repeated anomalies: it won't distinguish what's normal
from what's anomalous unless anomalies are continuously scrubbed.
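For context, the MAD approach can be sketched as a modified z-score test (a minimal illustration, not the speaker's code; the 3.5 cutoff is a conventional choice):

```python
import numpy as np

def mad_outliers(x, cutoff=3.5):
    """Flag points whose modified z-score (deviation from the median,
    scaled by the median absolute deviation) exceeds the cutoff."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return np.zeros(len(x), dtype=bool)
    z = 0.6745 * (x - med) / mad  # 0.6745 makes z comparable to std units
    return np.abs(z) > cutoff

# A lone spike is caught, but a repeated anomaly would drag the median
# itself, which is why anomalies have to be continuously scrubbed.
flags = mad_outliers(np.array([10.0, 11, 9, 10, 50, 10, 11, 9]))
```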
Instead what worked is this:
1. Detect Anomaly
Stable principal component pursuit (Wright et al., NIPS 2009; Zhou et al., 2010).
Stack the time series as a matrix: SPCP(week1, week2, week3, week4, ...), one column vector per week, and project to a low-rank representation.
Decompose the original time series:
T (time series) = L (low-rank fit) + A (anomaly) + N (noise)
The ideal objective
min rank(L) + a*||A||_0  subject to  ||N||_F < e
is intractable, so it is relaxed to the convex program
min ||L||_* + b*||A||_1  subject to  ||T - L - A||_F < e
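The relaxed decomposition can be sketched with a generic robust PCA solver (inexact augmented Lagrangian, a standard approach for this objective; this is not the authors' code and the parameter heuristics are illustrative):

```python
import numpy as np

def rpca(T, lam=None, tol=1e-6, max_iter=500):
    """Split T into a low-rank fit L and a sparse anomaly A by solving
    min ||L||_* + lam * ||A||_1  s.t.  T = L + A  (inexact ALM)."""
    m, n = T.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / (np.abs(T).sum() + 1e-12)  # common step-size heuristic
    Y = np.zeros_like(T)  # Lagrange multipliers
    A = np.zeros_like(T)
    L = np.zeros_like(T)
    norm_T = np.linalg.norm(T)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding
        U, s, Vt = np.linalg.svd(T - A + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: elementwise soft thresholding
        R = T - L + Y / mu
        A = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # Dual ascent on the residual
        Y += mu * (T - L - A)
        mu = min(mu * 1.5, 1e7)
        if np.linalg.norm(T - L - A) / norm_T < tol:
            break
    return L, A

# Illustrative data: a smooth daily pattern repeated over 8 weeks (rank 1)
# plus one injected spike playing the role of an anomaly.
hours = np.sin(np.linspace(0.0, 2.0 * np.pi, 24)) + 2.0
T = np.outer(hours, np.ones(8))  # 24 hours x 8 weeks
T[5, 3] += 5.0                   # the anomaly
L, A = rpca(T)                   # A should concentrate at [5, 3]
```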
She said there is code for this online, mostly written in Erlang :)
A human looking at a chart will say yes, those peaks are anomalies. Thus it satisfies the requirement
that a human can tell the difference in the data.
Their algorithm "complained" when traffic was lower than usual on two Fridays, both of which had thunderstorms.
Normalized anomaly = A / L
Set to NA if the median volume is small, i.e. when unsure, don't do anything.
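This normalization and abstention rule can be sketched as (a minimal illustration; the min_volume cutoff and the example values are made up):

```python
import numpy as np

def normalized_anomaly(A, L, med_volume, min_volume=10.0):
    """Surprise relative to the fitted baseline, set to NaN on roads whose
    median volume is too small to trust (when unsure, do nothing)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        score = np.asarray(A, dtype=float) / np.asarray(L, dtype=float)
    score[np.asarray(med_volume) < min_volume] = np.nan
    return score

score = normalized_anomaly(
    A=np.array([2.0, -1.0, 0.5]),
    L=np.array([4.0, 2.0, 1.0]),
    med_volume=np.array([100.0, 5.0, 50.0]),  # middle road is too thin
)
```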
This was fast and unsupervised: two hours to process 7 days of data on a MacBook Pro.
Online: A_new = T_new - L_old - N_old
Can control the sparsity of the anomaly, i.e. how anomalous something needs to be to count: how much of a web scraper (for web traffic), or how strange a
traffic pattern.
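The online step can be sketched as follows (illustrative: the noise term is folded into a noise_scale estimate, and the k knob stands in for the sparsity control; values are made up):

```python
import numpy as np

def online_anomaly(t_new, L_old, noise_scale=1.0, k=3.0):
    """Score new data against the previously fitted low-rank baseline:
    A_new = T_new - L_old, with deviations below k * noise_scale zeroed
    out. Raising k demands more anomalousness before a point counts."""
    a = t_new - L_old
    a[np.abs(a) < k * noise_scale] = 0.0
    return a

# A small deviation is absorbed as noise; a large one survives as anomaly.
a = online_anomaly(np.array([10.0, 12.0, 30.0]), np.array([10.0, 11.0, 12.0]))
```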
2. Group into events
Breadth-first graph expansion.
From any seed, append anomalies on adjacent roads up to 5 degrees away, and across adjacent hours. For web traffic, adjacency is "related" pages, i.e. cycling
through the application.
Big events: sum(|surprise|) > threshold (this heuristic seems like it could be better)
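The grouping step can be sketched as a breadth-first expansion over a road adjacency graph (one reading of the talk: non-anomalous roads are traversed but only anomalies join the event; the road names, adjacency, and threshold are made up for illustration):

```python
from collections import deque

def group_events(anomalies, adjacency, max_hops=5):
    """Group anomalous roads into events: from each seed, walk the road
    graph breadth-first up to max_hops and absorb anomalies encountered."""
    consumed = set()
    events = []
    for seed in anomalies:
        if seed in consumed:
            continue
        event = {seed}
        consumed.add(seed)
        visited = {seed}
        frontier = deque([(seed, 0)])
        while frontier:
            road, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for nxt in adjacency.get(road, []):
                if nxt in visited:
                    continue
                visited.add(nxt)
                if nxt in anomalies and nxt not in consumed:
                    consumed.add(nxt)
                    event.add(nxt)
                frontier.append((nxt, hops + 1))
        events.append(event)
    return events

def big_events(events, surprise, threshold):
    """Keep only events whose total absolute surprise clears the threshold."""
    return [e for e in events if sum(abs(surprise[r]) for r in e) > threshold]

# Toy road graph: a - b - c - d, with anomalies on a, b, and d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
events = group_events({"a", "b", "d"}, adjacency)
surprise = {"a": 2.0, "b": 1.0, "d": 0.5}
big = big_events(events, surprise, threshold=3.0)
```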
Planning on:
visualizing, indexing.
Studying: impact of big events and root causes of anomalies.
zzhou@research.att.com