aphlysia/techs.md

Last active July 6, 2016 03:05

technologies

machine learning

Hivemall https://github.com/myui/hivemall
- machine learning in Hive
- http://qiita.com/myui/items/f726ca3dcc48410abe45

datastores

SQL on Hadoop

Apache Hive http://hive.apache.org/
- Stinger http://hortonworks.com/labs/stinger/
Cloudera's Impala https://github.com/cloudera/impala
Tajo http://tajo.incubator.apache.org/
Facebook's Presto https://github.com/facebook/presto
Pivotal's HAWQ http://www.gopivotal.com/pivotal-products/data/pivotal-hd
- a port of the Greenplum DB engine to Hadoop
Shark http://shark.cs.berkeley.edu/

document oriented

MongoDB
CouchDB

distributed realtime

Storm http://storm-project.net/

key-value store

Apache Cassandra http://cassandra.apache.org/
- distributed
Redis
kumofs
ROMA

in-memory DBMS

VoltDB
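- queries are written in advance as Java code; giving up the ability to run ad hoc queries is what buys the speed
- distributed
- can also write data out to disk
- needs as much memory as the amount of data stored in the DB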
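The "queries written in advance" model can be illustrated with a toy in-memory store. This is a minimal sketch, not VoltDB's API: VoltDB's procedures are Java classes, while the `ProcedureStore` class, its `register`/`call` methods, and the procedure names below are hypothetical. It only shows how registering every query ahead of time rules out ad hoc queries; it does not model where the speedup comes from.

```python
from typing import Any, Callable, Dict, Optional

Rows = Dict[str, Dict[str, Any]]


class ProcedureStore:
    """Hypothetical in-memory store that only executes pre-registered procedures."""

    def __init__(self) -> None:
        self._rows: Rows = {}
        self._procs: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, proc: Callable[..., Any]) -> None:
        # Every query is defined ahead of time, before clients call it.
        self._procs[name] = proc

    def call(self, name: str, *args: Any) -> Any:
        # Clients can only invoke known procedures by name; there is no ad hoc path.
        if name not in self._procs:
            raise KeyError(f"unknown procedure: {name}")
        return self._procs[name](self._rows, *args)


def insert_user(rows: Rows, user_id: str, name: str) -> int:
    rows[user_id] = {"name": name}
    return 1  # number of rows inserted


def get_user(rows: Rows, user_id: str) -> Optional[Dict[str, Any]]:
    return rows.get(user_id)


store = ProcedureStore()
store.register("InsertUser", insert_user)
store.register("GetUser", get_user)
store.call("InsertUser", "u1", "alice")
print(store.call("GetUser", "u1"))  # {'name': 'alice'}
```

In this toy model, supporting a new kind of query means writing and registering a new procedure, which is the trade-off described in the notes.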

others

Amazon Dynamo DB http://aws.amazon.com/jp/dynamodb/
Postgres-XL

cluster computing

Spark http://spark.incubator.apache.org/
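- overview (easy to follow) http://spark.incubator.apache.org/talks/overview.pdf
- motivation
  - with Hadoop MapReduce, iterative computation is slow (every iteration is another map/reduce pass). Machine learning and graph processing involve a lot of iteration!
  - want to manipulate data interactively
- data read from HDFS is kept cached in memory so it can be operated on repeatedly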
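The motivation above can be sketched in plain Python. An iterative job either rereads its input on every pass, as chained MapReduce jobs do, or loads it once and keeps it in memory, as Spark's caching allows. This is not Spark's API. `Storage` is a hypothetical stand-in for HDFS input that counts reads, and `fit_mean` is a toy gradient-descent loop chosen only because it iterates.

```python
from typing import Iterable, List


class Storage:
    """Hypothetical stand-in for input on HDFS; counts how often it is read."""

    def __init__(self, data: Iterable[float]) -> None:
        self._data = list(data)
        self.reads = 0

    def read(self) -> List[float]:
        self.reads += 1
        return list(self._data)


def fit_mean(storage: Storage, iterations: int, cache: bool) -> float:
    """Toy iterative job: gradient descent on squared error towards the mean."""
    cached = storage.read() if cache else None  # Spark-style: load once, keep in memory
    w = 0.0
    for _ in range(iterations):
        data = cached if cache else storage.read()  # MapReduce-style: reread every pass
        grad = sum(w - x for x in data) / len(data)
        w -= 0.5 * grad
    return w


uncached = Storage([1.0, 2.0, 3.0, 4.0])
cached = Storage([1.0, 2.0, 3.0, 4.0])
w1 = fit_mean(uncached, 20, cache=False)
w2 = fit_mean(cached, 20, cache=True)
print(uncached.reads, cached.reads)  # 20 1
```

Both runs compute the same result. Only the number of input reads differs, and that cost grows with the number of iterations.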
H2O http://0xdata.com/h2o/
Apache Flink http://flink.incubator.apache.org/
- overview http://www.slideshare.net/stephanewen1/apache-flink-overview
- Introduction to Apache Flink http://www.slideshare.net/robertmetzger1/introduction-to-apache-flink-palo-alto-meetup
HBase etc. http://www.slideshare.net/yutuki/cassandrah-baseno-sql http://www.ne.jp/asahi/hishidama/home/tech/apache/hbase/index.html

http://repeatedly.github.io/ja/2014/07/mpp-on-hadoop-redshift-bigquery/

Hadoop

storage format
- Parquet http://parquet.io/
  - columnar
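This pure-Python sketch shows what "columnar" means. Row-oriented records are turned into one list per column, so a query that touches a single field scans only that column. It is a conceptual sketch under the assumption of a fixed schema, not the Parquet file format, which adds row groups, encodings, and compression. `to_columns` and `column_sum` are hypothetical names.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def to_columns(rows: List[Row], schema: List[str]) -> Dict[str, List[Any]]:
    """Convert row-oriented records into a column-oriented layout (one list per field)."""
    return {name: [row[name] for row in rows] for name in schema}


def column_sum(columns: Dict[str, List[Any]], name: str) -> Any:
    # A query on one field reads a single list instead of every whole record.
    return sum(columns[name])


rows = [
    {"user": "a", "age": 30, "score": 1.5},
    {"user": "b", "age": 25, "score": 2.0},
    {"user": "c", "age": 41, "score": 0.5},
]
cols = to_columns(rows, ["user", "age", "score"])
print(cols["age"])  # [30, 25, 41]
print(column_sum(cols, "score"))  # 4.0
```

Storing values of the same field together is also what makes per-column compression effective in real columnar formats.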
Hadoop versions http://metasearch.sourceforge.jp/wiki/index.php?Hadoop%A4%CE%A5%D0%A1%BC%A5%B8%A5%E7%A5%F3
