#Mining Massive Datasets
##Week 1
###Distributed File Systems
- Node failures
  - A single server can stay up for about 3 years (1000 days)
  - 1000 servers in a cluster => 1 failure/day
  - 1M servers in a cluster => 1000 failures/day
- MapReduce addresses the challenges of cluster computing
  - Store data redundantly
  - Move computation close to data
  - Simple programming model
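
The node-failure figures above follow from simple division. Here is a minimal sketch of that arithmetic, assuming servers fail independently and each one fails on average once per 1000 days. The function name `expected_failures_per_day` and its parameters are illustrative, not part of the course material.

```python
def expected_failures_per_day(num_servers: int, mean_uptime_days: float = 1000.0) -> float:
    """Expected node failures per day across a cluster.

    Assumption: each server fails on average once every `mean_uptime_days`,
    independently of the others, so failure rates simply add up.
    """
    return num_servers / mean_uptime_days


if __name__ == "__main__":
    # Cluster sizes taken from the notes: 1 server, 1000 servers, 1M servers.
    for n in (1, 1_000, 1_000_000):
        rate = expected_failures_per_day(n)
        print(f"{n:>9,} servers -> {rate:g} failures/day")
```

With the default of 1000 days, a 1000-server cluster gives 1 failure per day and a 1M-server cluster gives 1000 failures per day, matching the numbers in the list. At that scale failure is the normal case, which is why data has to be stored redundantly.
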
###What is covered by this course?
- High-dimensional data
  - Locality-sensitive hashing
  - Clustering
  - Dimensionality reduction
- Graph data
  - PageRank, SimRank
  - Community detection
  - Spam detection
- Infinite data
  - Filtering data streams
  - Web advertising
  - Queries on streams
- Machine learning
  - SVM
  - Decision trees
  - Perceptron, kNN
- Apps
  - Recommender systems
  - Association rules
  - Duplicate document detection