qrtt1/Hadoop.Quick.Note.md

Last active August 29, 2015 14:04


A self-study learning path, planned around the materials currently available

This only covers how to use the tools; don't forget to pick up some data science knowledge as well, yo

Hadoop: quick notes for getting started

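Write a program with the Hadoop MapReduce Framework that actually runs and produces the result you expect
Reference: the word count project shared by popcorny https://github.com/popcornylu/hadoop-wordcount
MapReduce algorithm concepts
Understand how each data-processing stage of the MapReduce algorithm maps onto the Job Template in the Hadoop MapReduce Framework (you will need this later for optimization)
How to set up the whole Hadoop stack. Set up Single Mode to play with on your own, and Cluster Mode to get a feel for it (learning to maintain a cluster is optional, since few companies can afford to run one)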
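
As a rough illustration of the stages mentioned above, here is a minimal sketch of word count in plain Python. It does not use the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this note and only model the map, shuffle/sort, and reduce steps in a single process.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group all values that share a key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three stages in order over an iterable of text lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello map reduce"]))
```

In a real Hadoop job the shuffle step is done by the framework between the Mapper and the Reducer; here it is written out only to make the data flow visible.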

Hadoop: standing at the doorway

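Core components of the Hadoop project: what each component inside HDFS and YARN does and what it is for
The common format shared by each subproject's configuration files; learn how to look up settings and experiment with the effect of each parameter
Using the MapReduce Framework API, with a focus on loading Configuration from outside the program and on FileSystem operations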
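
Hadoop's site files (for example `core-site.xml` and `hdfs-site.xml`) share a `<configuration>` / `<property>` / `<name>` / `<value>` layout. The sketch below reads that layout with Python's standard library so you can inspect settings while experimenting. It is not Hadoop's own `Configuration` class: the helpers `load_properties` and `merge` are hypothetical, the sample values are illustrative, and details such as `<final>` properties are ignored.

```python
import xml.etree.ElementTree as ET

# Illustrative file contents; the values are examples, not recommendations.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

HDFS_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Parse one Hadoop-style configuration document into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def merge(*sources):
    """Merge several documents; later sources override earlier ones."""
    merged = {}
    for xml_text in sources:
        merged.update(load_properties(xml_text))
    return merged


if __name__ == "__main__":
    for key, value in sorted(merge(CORE_SITE, HDFS_SITE).items()):
        print(f"{key} = {value}")
```

Letting later sources win is a simplification chosen for this sketch, handy for trying out a parameter override without editing the base file.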

Hadoop: once through the door

What each Hadoop subproject is for (HBase, Hive, ...)
Learn and try out installing and using the various Hadoop distributions
http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support
Cloudera, Hortonworks, and MapR look like the more promising ones
API: how to use it together with 3rd-party JARs
MapReduce Algorithm Design
http://lintool.github.io/MapReduceAlgorithms/
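
One commonly discussed MapReduce design pattern is local aggregation: combining counts inside the mapper so that fewer pairs have to cross the shuffle. The sketch below is plain Python rather than Hadoop code, and the names `naive_map`, `combining_map`, and `reduce_counts` are made up for illustration. It only shows that both mappers lead to the same totals while the combining one emits fewer intermediate pairs.

```python
from collections import Counter


def naive_map(lines):
    """Emit one (word, 1) pair per word occurrence."""
    for line in lines:
        for word in line.split():
            yield word, 1


def combining_map(lines):
    """Aggregate counts locally, then emit one (word, n) pair per distinct word."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    yield from sorted(counts.items())


def reduce_counts(pairs):
    """Sum the partial counts for each word."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return dict(totals)


if __name__ == "__main__":
    split = ["a b a", "b a"]
    naive = list(naive_map(split))
    combined = list(combining_map(split))
    print(len(naive), "pairs without combining,", len(combined), "with combining")
    print(reduce_counts(naive) == reduce_counts(combined))
```

The trade-off is memory: the mapper has to hold its partial counts until it finishes its input split.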

big data service

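Each cloud vendor's Hadoop service runs jobs in a slightly different way. Since in most cases you will not be setting up a cluster yourself, it is necessary to learn the existing services.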

Beyond Hadoop

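There are still many projects competing with the Hadoop MapReduce Framework, such as Impala, Spark, and Storm. Many of them build on top of HDFS and aim to take over Hadoop MapReduce's position in the stack by offering a more efficient processing engine, so it is worth looking at these different approaches.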
