@dapangmao
Last active August 29, 2015 14:10

# Mining Massive Datasets

## Week 1

### Distributed File Systems

  • Node failures
    A single server can stay up for about 3 years (~1,000 days)
    1,000 servers in a cluster => about 1 failure/day
    1M servers in a cluster => about 1,000 failures/day

  • MapReduce addresses these challenges of cluster computing:
    • Store data redundantly
    • Move computation close to the data
    • Provide a simple programming model
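The figures in the node-failures bullet above come from one division. Suppose each server fails on average about once every 1,000 days. A cluster of N servers then sees roughly N / 1,000 failures per day. The Python sketch below makes that concrete. It assumes that failures are independent and happen at a constant average rate. The function name `expected_failures_per_day` is hypothetical.

```python
def expected_failures_per_day(num_servers: int, mtbf_days: float = 1000.0) -> float:
    """Expected number of server failures per day across a cluster.

    Assumes (hypothetically) that every server fails independently,
    on average once every `mtbf_days` days (~3 years by default).
    """
    # Each server contributes 1 / mtbf_days expected failures per day,
    # and expectations add across servers.
    return num_servers / mtbf_days


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(f"{n:,} servers -> {expected_failures_per_day(n):g} failures/day")
```

At a million servers, a failure somewhere in the cluster is a near-continuous event. That is why the system must tolerate failures rather than try to avoid them.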
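The programming model in the MapReduce bullet above is often illustrated with a word-count job. The sketch below simulates the map, shuffle (group by key), and reduce phases in a single Python process. The function names are hypothetical and are not the API of any particular framework. A real MapReduce system would run map tasks on the nodes that already hold the input chunks and perform the shuffle over the network. It would also re-execute any tasks lost to node failures.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    # Run map over every record, group by key, then reduce each group.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "the fox"]))
```

Each map call sees only one record, and each reduce call sees only one key's values. This is what lets a framework spread the work across many machines without the programmer handling distribution.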

### What is covered by this course?

  • High dimensional data
    • locality-sensitive hashing
    • clustering
    • dimensionality reduction
  • Graph data
    • PageRank, SimRank
    • Community detection
    • Spam detection
  • Infinite data
    • Filtering data streams
    • Web advertising
    • Queries on streams
  • Machine learning
    • SVM
    • Decision trees
    • Perceptron, kNN
  • Apps
    • Recommender systems
    • Association Rules
    • Duplicate document detection