adrianp/5318138 (GitHub Gist), last active December 15, 2015 20:19
A lean list of various BigData/NoSQL-related projects
This work in progress summarizes the way-too-many BigData(tm) technologies.
It is by no means an in-depth description, just a very short summary so that
I know where to look.
1. Databases:
* DynamoDB - aws.amazon.com/dynamodb/ - Amazon AWS integration, MapReduce
* MongoDB - mongodb.org/ - JSON-style document database, SQL-like queries + MapReduce
* Riak - basho.com/riak/ - Key-Value storage, MapReduce
* CouchDB - couchdb.apache.org/ - JSON document storage, JavaScript queries + MapReduce
* Redis - redis.io/ - Key-Value storage, Pub/Sub messaging
* HBase - hbase.apache.org/ - Bigtable-like capabilities on top of Hadoop and HDFS
* Cassandra - cassandra.apache.org/ - Bigtable-like, SQL-like queries + MapReduce
* Hypertable - hypertable.org/ - Bigtable-like, SQL-like queries + MapReduce, strong commercial support
* Accumulo - accumulo.apache.org/ - Key-Value storage, Bigtable-like on top of Hadoop and HDFS
* Neo4j - neo4j.org/ - Graph database
* Couchbase - couchbase.com/ - Document-oriented, querying + MapReduce
* VoltDB - voltdb.com/ - OLTP/real-time processing database by Stonebraker, proprietary
* Scalaris - code.google.com/p/scalaris/ - Key-Value storage
* Voldemort - project-voldemort.com/ - Key-Value storage, used at LinkedIn
* MemcacheDB - memcachedb.org/ - Key-Value storage based on Memcached
* VelocityDB - velocitydb.com/ - Object and Graph DB, Key-Value support
* ElephantDB - github.com/nathanmarz/elephantdb/ - Database specialized in exporting key-value data from Hadoop
Question: Why does Apache have so many near-identical projects?
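Several entries above (HBase, Cassandra, Hypertable, Accumulo) are "Bigtable-like". Bigtable's data model is a sparse, sorted map indexed by row key, column key ("family:qualifier") and timestamp, where each cell can hold several timestamped versions. The sketch below shows only that model in memory; the class `TinyBigtable` and its methods are hypothetical and are not the API of any listed project, and distribution, persistence and column-family schemas are left out.

```python
import bisect
from collections import defaultdict


class TinyBigtable:
    """Hypothetical in-memory sketch of the Bigtable data model: a sparse
    map from (row key, column key, timestamp) to value, with row keys kept
    sorted so that contiguous row ranges can be scanned."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> [(timestamp, value), ...]
        self._rows = defaultdict(lambda: defaultdict(list))
        # All row keys, kept in sorted order for range scans.
        self._sorted_rows = []

    def put(self, row, column, value, timestamp):
        """Store one version of a cell; older versions are kept."""
        if row not in self._rows:
            bisect.insort(self._sorted_rows, row)
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        # Newest version first, so reads can take element 0.
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row, column):
        """Return the newest value of a cell, or None if it was never written."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, end_row):
        """Yield (row, {column: newest value}) for start_row <= row < end_row."""
        lo = bisect.bisect_left(self._sorted_rows, start_row)
        hi = bisect.bisect_left(self._sorted_rows, end_row)
        for row in self._sorted_rows[lo:hi]:
            yield row, {col: vs[0][1] for col, vs in self._rows[row].items()}


table = TinyBigtable()
table.put("com.example/a", "contents:html", "<p>v1</p>", timestamp=1)
table.put("com.example/a", "contents:html", "<p>v2</p>", timestamp=2)
table.put("com.example/b", "anchor:home", "Home", timestamp=1)
print(table.get("com.example/a", "contents:html"))  # <p>v2</p>
```

Only cells that were actually written take up space (the "sparse" part), and keeping row keys sorted is what makes range scans over related rows cheap.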
2. Data analysis:
* elasticsearch - elasticsearch.org/ - Distributed RESTful search and analytics on top of Lucene, Memcached, JSON
* Hadoop + HDFS - hadoop.apache.org/ - MapReduce implementation
* Hive - hive.apache.org/ - Data warehouse over Hadoop
* Mahout - mahout.apache.org/ - Scalable machine learning
* Pig - pig.apache.org/ - Uses Pig Latin to produce sequences of MapReduce jobs (for Hadoop)
* D3.js - d3js.org/ - JavaScript library for visualizing data
* R - r-project.org/ - Statistics
* Julia - julialang.org/ - Potential replacement for R
* Drill - incubator.apache.org/drill/ - Big data analysis based on Google Dremel
* Gremlin - github.com/tinkerpop/gremlin/ - Graph analysis
* Giraph - giraph.apache.org/ - Graph analysis
* InfiniteGraph - objectivity.com/infinitegraph/ - Graph analysis, commercial
* Golden Orb - goldenorbos.org/ - Graph analysis using Google Pregel on top of Hadoop
* JethroData - jethrodata.com/ - Data analysis on top of Hadoop, commercial
* Spark - spark-project.org/ - Project that aims to extend/improve Hadoop and move beyond MapReduce
* HStreaming - hstreaming.com/ - Real-time and batch processing workflow over Hadoop and HDFS, commercial
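MapReduce comes up throughout the lists above (Hadoop, Pig, and the databases that offer "+ MapReduce"). A minimal single-process sketch of the classic word-count job follows: a map phase emits (key, value) pairs, a shuffle groups the values by key, and a reduce phase combines each group. The function names are hypothetical and this is not Hadoop's API; Hadoop runs the same phases in parallel across a cluster, reading input from HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in doc.lower().split():
        yield word, 1


def reduce_phase(word: str, counts: Iterable[int]) -> tuple[str, int]:
    # Combine all values that were emitted for one key.
    return word, sum(counts)


def run_mapreduce(docs, mapper, reducer):
    # Map + shuffle: group every intermediate value by its key.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    # Reduce each key group independently (a cluster would spread these
    # groups over many reducers); keys come out in sorted order.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


docs = ["the quick brown fox", "the lazy dog", "The end"]
counts = run_mapreduce(docs, map_phase, reduce_phase)
print(counts["the"])  # 3
```

Because each key group is reduced without looking at any other group, the reduce step parallelizes naturally, which is the property the Hadoop-style systems above rely on.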
3. Real-time processing:
* DBToaster - dbtoaster.org/ - Creates processing engines from SQL queries
* Storm - storm-project.net/ - MapReduce over real-time data
* Trident - engineering.twitter.com/2012/08/trident-high-level-abstraction-for.html/ - Elegant abstraction for defining Storm topologies
* Squall - github.com/epfldata/squall/ - SQL over Storm
* SAP HANA - http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx/ - In-memory DB and stream processing, commercial
* Esper - esper.codehaus.org/ - Complex event processing (CEP), Java and .NET, commercial
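To contrast "MapReduce over real-time data" (Storm) with a batch job, here is a rough single-process sketch of a streaming word count built from Python generators. The "spout" and "bolt" names borrow Storm's terminology, but the functions are hypothetical and are not Storm's API: the point is only that counts are updated and emitted per incoming tuple rather than once after all input has been read.

```python
from collections import Counter
from typing import Iterable, Iterator


def sentence_spout(lines: Iterable[str]) -> Iterator[str]:
    # Source of the stream; a real spout would typically read from an
    # external system such as a message queue.
    for line in lines:
        yield line


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    # Turn each sentence tuple into one tuple per word.
    for sentence in sentences:
        yield from sentence.lower().split()


def count_bolt(words: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Keep running counts and emit an update after every word,
    # instead of waiting for the whole input as a batch job would.
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


for word, count in count_bolt(split_bolt(sentence_spout(["to be or", "not to be"]))):
    print(word, count)
```

Since each stage is a generator, the pipeline processes one tuple at a time and would work the same on an unbounded input stream.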
4. Infrastructure:
* ZooKeeper - zookeeper.apache.org/ - Distributed coordination
* ZeroMQ - zeromq.org/ - Message transport layer
* RabbitMQ - rabbitmq.com/ - Message transport layer
* Kafka - kafka.apache.org/ - Publish/Subscribe messaging system
* S4 - incubator.apache.org/s4/ - Real time processing infrastructure
* Kestrel - github.com/robey/kestrel/ - Message transport layer
* Ganglia - ganglia.sourceforge.net/ - Monitoring
* OpenStack - openstack.org/ - Open source software for building clouds
* Cloud Foundry - cloudfoundry.com/ - Deployment solution
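Publish/Subscribe messaging (Kafka above, Redis in the database list) decouples senders from receivers: publishers send to a named topic and every current subscriber of that topic receives the message. A minimal in-process sketch, assuming a hypothetical `Broker` class that is not the API of Kafka, Redis or any other listed system, and that ignores networking, persistence and delivery guarantees:

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Hypothetical in-process publish/subscribe broker (illustration only)."""

    def __init__(self) -> None:
        # topic name -> callbacks of its current subscribers
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        # Deliver to every current subscriber of the topic; the publisher
        # never needs to know who, if anyone, is listening.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        # Report how many subscribers received the message.
        return len(callbacks)


broker = Broker()
received = []
broker.subscribe("clicks", received.append)
broker.publish("clicks", "user42 clicked")   # returns 1
broker.publish("views", "nobody listens")    # returns 0
print(received)  # ['user42 clicked']
```

Returning the number of receivers mirrors what Redis's PUBLISH command reports; a log-based system such as Kafka instead stores messages so that consumers can read them later.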
5. Resources:
* Database comparison - http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis/
* More comprehensive NoSQL list - http://nosql-database.org/
* Big Data Right Now: Five Trendy Open Source Technologies (10.2012) - http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/?goback=%2Egde_4332669_member_225815227/
* SQL is what’s next for Hadoop: Here’s who’s doing it (02.2013) - http://gigaom.com/2013/02/21/sql-is-whats-next-for-hadoop-heres-whos-doing-it/
* Wikipedia, ofc: http://en.wikipedia.org/wiki/NoSQL
* Nathan Marz (Storm developer) on beating the CAP theorem (this is controversial, so make sure to read the comments too): http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html