Atul Kumar atulkumar2
Apache Spark - Apache Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley.[1] Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
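To make the "two-stage MapReduce paradigm" concrete, here is a minimal plain-Python sketch of the classic word-count job: a map stage that emits (word, 1) pairs and a reduce stage that sums them per word. This is not Spark or Hadoop code, and the function names (map_phase, reduce_phase, word_count) are hypothetical. In real Hadoop a separate shuffle step groups keys between the stages and intermediate results are written to disk; here the grouping simply happens inside the reduce step. Spark's ability to keep intermediate data in memory instead is a large part of where its speedups come from.

```python
from collections import defaultdict
from itertools import chain


def map_phase(lines):
    """Stage 1 (map): emit a (word, 1) pair for every word in every line."""
    return chain.from_iterable(
        ((word.lower(), 1) for word in line.split()) for line in lines
    )


def reduce_phase(pairs):
    """Stage 2 (reduce): group the pairs by word and sum their counts."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the two stages back to back (hypothetical helper)."""
    return reduce_phase(map_phase(lines))


if __name__ == "__main__":
    docs = ["Spark builds on HDFS", "Spark is fast"]
    print(word_count(docs))
    # {'spark': 2, 'builds': 1, 'on': 1, 'hdfs': 1, 'is': 1, 'fast': 1}
```

In a framework like Spark the same job is expressed as a chain of transformations, and the engine decides how to distribute the work across the cluster.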
Database pipelining - http://www.tuplejump.com/img/ff08.theplatform.png
As you will notice, it's not just about processing the data; it involves many other components. Collection, storage, exploration, ML, and visualization are all critical to the project's success.
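The stages listed above can be sketched as a chain of small functions, each feeding the next. This is a toy illustration under loud assumptions: the data source, the hard-coded records, and every function name are hypothetical, the "storage" step is just an in-memory list standing in for a database or HDFS, and the "ML" step is a simple threshold rule standing in for a real model.

```python
import statistics


def collect():
    """Collection: hypothetical raw records from some source (one is malformed)."""
    return ["12", "15", "bad", "9", "21"]


def store(raw):
    """Storage: keep only valid integer readings (stands in for a database or HDFS)."""
    return [int(r) for r in raw if r.isdigit()]


def explore(data):
    """Exploration: basic summary statistics."""
    return {"count": len(data), "mean": statistics.mean(data)}


def model(data):
    """ML stand-in: flag values more than one standard deviation above the mean."""
    mu, sd = statistics.mean(data), statistics.pstdev(data)
    return [x for x in data if x > mu + sd]


def visualize(data):
    """Visualization stand-in: a plain-text bar chart."""
    return "\n".join(f"{x:>3} {'#' * x}" for x in data)


def run_pipeline():
    """Run every stage in order and return each stage's result."""
    data = store(collect())
    return {
        "summary": explore(data),
        "flagged": model(data),
        "chart": visualize(data),
    }


if __name__ == "__main__":
    result = run_pipeline()
    print(result["summary"])
    print(result["flagged"])
    print(result["chart"])
```

Keeping each stage a separate function makes it easy to swap one out (for example, replacing the in-memory list with a real data store) without touching the others.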
Solr - Apache Solr can be used to build a highly scalable data analytics engine that lets customers engage in lightning-fast, real-time knowledge discovery.
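Solr is built on Apache Lucene, whose core data structure is an inverted index: a map from each term to the documents that contain it. The sketch below shows that idea in plain Python. It is not Solr's API; build_index and search are hypothetical names, and real Solr adds text analysis (tokenizers, stemming), relevance scoring, and distribution across shards on top of this basic structure.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """AND query: return ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "real-time analytics with Solr",
        2: "Solr search engine",
        3: "Spark analytics",
    }
    idx = build_index(docs)
    print(search(idx, "solr"))            # {1, 2}
    print(search(idx, "analytics solr"))  # {1}
```

Because each lookup goes straight to a term's posting set instead of scanning every document, queries stay fast as the collection grows, which is what makes this structure the basis of search engines like Solr.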
Big data is a combination of several subjects. It mainly requires programming, analysis, NLP, MLP, and mathematics.
For links, see: http://www.quora.com/What-are-some-good-sources-to-learn-big-data
Here are some courses I came across:
Introduction to CS Course
Notes: An introductory computer science course that teaches coding.
Online Resources:
Udacity - Intro to CS course
Coursera - Computer Science 101