-------------------------------------------------------- Edit to Enlarge ----------------------------------------------
Apache Spark - Apache Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley.[1] Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
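For anyone unfamiliar with the "two-stage MapReduce paradigm" mentioned above, here is a minimal plain-Python sketch of a word count split into a map stage and a reduce stage. It uses only the standard library and is not Spark or Hadoop code; the function names (map_stage, shuffle, reduce_stage, word_count) and the sample lines are made up for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_stage(lines):
    """Stage 1 (map): emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(groups):
    """Stage 2 (reduce): collapse each key's list of values into one count."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(lines):
    """Run the two stages back to back (hypothetical helper for this sketch)."""
    return reduce_stage(shuffle(map_stage(lines)))


if __name__ == "__main__":
    # Made-up sample input, not taken from any real dataset.
    sample = ["spark builds on hdfs", "spark is not tied to mapreduce"]
    print(word_count(sample))
```

In this model, a job that needs more steps than one map and one reduce has to be written as several such rounds, which is the restriction the paragraph above says Spark is not tied to.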
Database pipelining - http://www.tuplejump.com/img/ff08.theplatform.png
As you will notice, it's not just about processing the data; a lot of other components are involved. Collection, storage, exploration, ML and visualization are critical to the project's success.
SOLR - Solr can be used to build a highly scalable data analytics engine that lets customers engage in lightning-fast, real-time knowledge discovery.
Big data is a combination of a bunch of subjects. It mainly requires programming, analysis, NLP, MLP, and mathematics.
To see the links, go to: http://www.quora.com/What-are-some-good-sources-to-learn-big-data
Here are a bunch of courses I came across:
Introduction to CS Course
Notes: Introduction to Computer Science course that provides instruction on coding.
Online Resources:
Udacity - Intro to CS course
Coursera - Computer Science 101