Atul Kumar atulkumar2


Freelance
Bangalore


atulkumar2 / terminologies

Created August 2, 2014 10:07 — forked from karimkhanp/terminologies



	Apache Spark - Apache Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley.[1] Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
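	To make the "two-stage MapReduce paradigm" concrete, here is a minimal single-process sketch of the classic word-count job in plain Python: a map stage that emits (word, 1) pairs, then a reduce stage that groups the pairs by word and sums them. This is an illustration of the paradigm only, not Hadoop or Spark code; the function names are hypothetical. In a real Hadoop job both stages run across a cluster and results are written to disk between chained jobs, which is roughly the overhead Spark's in-memory model aims to avoid.

```python
from collections import defaultdict
from itertools import chain


def map_phase(lines):
    """Stage 1 (map): emit a (word, 1) pair for every word in every line."""
    return chain.from_iterable(
        ((word.lower(), 1) for word in line.split()) for line in lines
    )


def reduce_phase(pairs):
    """Stage 2 (reduce): group the pairs by word and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    """Run the two stages back to back."""
    return reduce_phase(map_phase(lines))


if __name__ == "__main__":
    docs = ["spark runs on hadoop", "hadoop runs mapreduce"]
    print(word_count(docs))
    # → {'spark': 1, 'runs': 2, 'on': 1, 'hadoop': 2, 'mapreduce': 1}
```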

	Database pipelining - http://www.tuplejump.com/img/ff08.theplatform.png
	As you will notice, it's not just about processing the data; it involves many other components. Collection, storage, exploration, ML and visualization are critical to the project's success.
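	The linked diagram is not reproduced here, but the five components named above can be sketched as a chain of plain Python functions, each feeding the next. Everything in this sketch is an assumption for illustration: the sample records, the in-memory list standing in for a datastore, and the trivial threshold rule standing in for a real ML model.

```python
from statistics import mean


def collect():
    """Collection: a hypothetical source yielding (key, value) records."""
    yield from [("a", 3.0), ("b", 5.0), ("a", 7.0), ("b", 1.0)]


def store(records, db):
    """Storage: append each record to an in-memory list standing in for a datastore."""
    for rec in records:
        db.append(rec)
    return db


def explore(db):
    """Exploration: compute the mean value per key."""
    groups = {}
    for key, value in db:
        groups.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in groups.items()}


def model(summary, threshold):
    """ML stand-in (hypothetical): label each key 'high' or 'low' by a fixed threshold."""
    return {key: ("high" if avg > threshold else "low") for key, avg in summary.items()}


def visualize(summary):
    """Visualization: a text bar chart with one '#' per unit of the rounded mean."""
    return "\n".join(f"{key}: {'#' * round(avg)}" for key, avg in sorted(summary.items()))


if __name__ == "__main__":
    db = store(collect(), [])
    summary = explore(db)
    print(model(summary, threshold=4.0))  # → {'a': 'high', 'b': 'low'}
    print(visualize(summary))
```

Keeping each stage a separate function mirrors the point of the diagram: any stage (for example, the list-based storage) can be swapped for a real system without touching the others.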


	SOLR - Apache Solr is an open-source search platform built on Apache Lucene. It can be used to build a highly scalable data analytics engine that enables customers to engage in fast, real-time knowledge discovery.

atulkumar2 / bigdata_resource

Last active August 29, 2015 14:06 — forked from karimkhanp/bigdata_resource

	Big data is a combination of several subjects. It mainly requires programming, analysis, NLP, MLP and mathematics.

	For links, see: http://www.quora.com/What-are-some-good-sources-to-learn-big-data
	Here are a bunch of courses I came across:

	Introduction to CS Course
	Notes: An introductory computer science course that teaches coding.
	Online Resources:
	Udacity - Intro to CS course
	Coursera - Computer Science 101