@dmitrinesterenko
Last active June 8, 2016 00:14
Traffic anomaly detection.
Anonymized and aggregated mobile cell data is used for traffic anomaly detection. Here "traffic" means actual car and foot traffic, although the problem is very similar to network traffic.
Use case: analysis of new construction and how it will impact the existing city.
Tested in Dallas, TX.
Establish what is normal: not Gaussian normal, but actually normal. Look at a histogram of what the pattern had been.
They tried to develop a model that is robust, efficient, online, and unsupervised.
1. Detect surprise for each road.
2. Group surprises into events.
Tried:
1. Seasonality: Fourier, wavelet, ARIMA, STL. Doesn't work if there is more than one outlier.
2. Dimension reduction: PCA, ICA. Not robust out of the box.
3. Robust measures: MAD (median absolute deviation). Cannot detect repeated anomalies: it won't distinguish what's normal
from what's anomalous unless anomalies are continuously scrubbed.
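For context, the MAD approach can be sketched as a modified z-score test (a minimal illustration, not the speaker's code; the 3.5 cutoff is a conventional choice):

```python
import numpy as np

def mad_outliers(x, cutoff=3.5):
    """Flag points whose modified z-score (deviation from the median,
    scaled by the median absolute deviation) exceeds the cutoff."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return np.zeros(len(x), dtype=bool)
    z = 0.6745 * (x - med) / mad  # 0.6745 makes z comparable to std units
    return np.abs(z) > cutoff

# A lone spike is caught, but a repeated anomaly would drag the median
# itself, which is why anomalies have to be continuously scrubbed.
flags = mad_outliers(np.array([10.0, 11, 9, 10, 50, 10, 11, 9]))
```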
Instead what worked is this:
1. Detect Anomaly
Stable principal component pursuit (Wright et al., NIPS 2009; Zhou et al., 2010).
Stack the time series as a matrix: SPCP(week1, week2, week3, week4, ...), one column vector per week, and project to a low-rank representation.
Decompose the original time series:
T (time series) = L (low-rank fit) + A (anomaly) + N (noise)
The ideal objective
min rank(L) + a*||A||_0  subject to  ||N||_F < e
is intractable, so it is relaxed to the convex program
min ||L||_* + b*||A||_1  subject to  ||T - L - A||_F < e
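The relaxed decomposition can be sketched with a generic robust PCA solver (inexact augmented Lagrangian, a standard approach for this objective; this is not the authors' code and the parameter heuristics are illustrative):

```python
import numpy as np

def rpca(T, lam=None, tol=1e-6, max_iter=500):
    """Split T into a low-rank fit L and a sparse anomaly A by solving
    min ||L||_* + lam * ||A||_1  s.t.  T = L + A  (inexact ALM)."""
    m, n = T.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / (np.abs(T).sum() + 1e-12)  # common step-size heuristic
    Y = np.zeros_like(T)  # Lagrange multipliers
    A = np.zeros_like(T)
    L = np.zeros_like(T)
    norm_T = np.linalg.norm(T)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding
        U, s, Vt = np.linalg.svd(T - A + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: elementwise soft thresholding
        R = T - L + Y / mu
        A = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # Dual ascent on the residual
        Y += mu * (T - L - A)
        mu = min(mu * 1.5, 1e7)
        if np.linalg.norm(T - L - A) / norm_T < tol:
            break
    return L, A

# Illustrative data: a smooth daily pattern repeated over 8 weeks (rank 1)
# plus one injected spike playing the role of an anomaly.
hours = np.sin(np.linspace(0.0, 2.0 * np.pi, 24)) + 2.0
T = np.outer(hours, np.ones(8))  # 24 hours x 8 weeks
T[5, 3] += 5.0                   # the anomaly
L, A = rpca(T)                   # A should concentrate at [5, 3]
```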
She said there is code for this online, mostly written in Erlang :)
A human looking at a chart will say yes, those peaks are anomalies. Thus it satisfies the requirement
that a human can tell the difference in the data.
Their algorithm "complained" when traffic was lower than usual on two Fridays, both of which had thunderstorms.
Normalized anomaly = A / L
Set to NA if the median volume is small, i.e. when unsure, don't do anything.
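This normalization and abstention rule can be sketched as (a minimal illustration; the min_volume cutoff and the example values are made up):

```python
import numpy as np

def normalized_anomaly(A, L, med_volume, min_volume=10.0):
    """Surprise relative to the fitted baseline, set to NaN on roads whose
    median volume is too small to trust (when unsure, do nothing)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        score = np.asarray(A, dtype=float) / np.asarray(L, dtype=float)
    score[np.asarray(med_volume) < min_volume] = np.nan
    return score

score = normalized_anomaly(
    A=np.array([2.0, -1.0, 0.5]),
    L=np.array([4.0, 2.0, 1.0]),
    med_volume=np.array([100.0, 5.0, 50.0]),  # middle road is too thin
)
```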
This was fast and unsupervised: two hours to process 7 days of data on a MacBook Pro.
Online: A_new = T_new - L_old - N_old
Can control the sparsity of the anomaly, i.e. how anomalous something needs to be to count: how much of a web scraper (for web traffic), or how strange a
traffic pattern.
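The online step can be sketched as follows (illustrative: the noise term is folded into a noise_scale estimate, and the k knob stands in for the sparsity control; values are made up):

```python
import numpy as np

def online_anomaly(t_new, L_old, noise_scale=1.0, k=3.0):
    """Score new data against the previously fitted low-rank baseline:
    A_new = T_new - L_old, with deviations below k * noise_scale zeroed
    out. Raising k demands more anomalousness before a point counts."""
    a = t_new - L_old
    a[np.abs(a) < k * noise_scale] = 0.0
    return a

# A small deviation is absorbed as noise; a large one survives as anomaly.
a = online_anomaly(np.array([10.0, 12.0, 30.0]), np.array([10.0, 11.0, 12.0]))
```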
2. Group into events
Breadth-first graph expansion.
From any seed, append anomalies on adjacent roads up to 5 degrees away, and across adjacent hours. For web traffic, adjacency is "related" pages, i.e. cycling
through the application.
Big events: sum(|surprise|) > threshold (this heuristic seems like it could be better)
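The grouping step can be sketched as a breadth-first expansion over a road adjacency graph (one reading of the talk: non-anomalous roads are traversed but only anomalies join the event; the road names, adjacency, and threshold are made up for illustration):

```python
from collections import deque

def group_events(anomalies, adjacency, max_hops=5):
    """Group anomalous roads into events: from each seed, walk the road
    graph breadth-first up to max_hops and absorb anomalies encountered."""
    consumed = set()
    events = []
    for seed in anomalies:
        if seed in consumed:
            continue
        event = {seed}
        consumed.add(seed)
        visited = {seed}
        frontier = deque([(seed, 0)])
        while frontier:
            road, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for nxt in adjacency.get(road, []):
                if nxt in visited:
                    continue
                visited.add(nxt)
                if nxt in anomalies and nxt not in consumed:
                    consumed.add(nxt)
                    event.add(nxt)
                frontier.append((nxt, hops + 1))
        events.append(event)
    return events

def big_events(events, surprise, threshold):
    """Keep only events whose total absolute surprise clears the threshold."""
    return [e for e in events if sum(abs(surprise[r]) for r in e) > threshold]

# Toy road graph: a - b - c - d, with anomalies on a, b, and d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
events = group_events({"a", "b", "d"}, adjacency)
surprise = {"a": 2.0, "b": 1.0, "d": 0.5}
big = big_events(events, surprise, threshold=3.0)
```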
Planning on:
visualizing, indexing.
Studying: impact of big events and root causes of anomalies.
zzhou@research.att.com