mgosk · May 31, 2016 17:55
diff --git a/gistfile1.txt b/gistfile1.txt
 Count the documents in the nasa.eva namespace by running: $ mongo, then > use nasa, then > db.eva.count(). What is the result?
 375

 Spark guarantees that your processing tasks complete in under a second
 false

 Spark is a great system for performing updates on individual records efficiently
 false

 Spark is a great system for real-time OLTP workloads
 false

 Level 1
 An HDFS deployment can be composed of multiple nodes, each containing a subset of the total data set.
 true

 Level 2
 What do the R and N stand for in YARN?
 Resource Negotiator

 Level 3
 Filtering and projections are functions you would see in which phase of a MapReduce job?
 Map phase
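
A minimal sketch of why filtering and projection belong to the map phase, written in plain Python with no Hadoop involved. The records, field names, and the map_phase helper are hypothetical and only illustrate the idea: each input record is examined on its own, dropped if it fails a predicate (filtering), and reduced to the fields later phases need (projection).

```python
# Plain-Python stand-in for a map phase; the data below is made up.
records = [
    {"crew": "Ann Lee", "country": "USA", "minutes": 36},
    {"crew": "Bo Cruz", "country": "Russia", "minutes": 12},
    {"crew": "Cy Diaz", "country": "USA", "minutes": 141},
]


def map_phase(record):
    """Emit zero or more (key, value) pairs for a single input record."""
    # Filtering: drop records that do not match the predicate.
    if record["country"] != "USA":
        return []
    # Projection: keep only the fields that later phases need.
    return [(record["crew"], record["minutes"])]


# Each record is handled independently, which is what lets a real
# framework run the map phase in parallel across nodes.
mapped = [pair for record in records for pair in map_phase(record)]
print(mapped)
```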

 Level 4
 True or false: I can process data stored in HDFS and write the results of that processing to MongoDB, or I can use MongoDB in place of HDFS
 True

 RDD operations are one of two kinds, actions and transformations
 True

 The set of operations performed on an RDD is known as the RDD's
 lineage
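
To make the idea of lineage concrete, here is a toy model in plain Python. It is not Spark's implementation; the TinyRDD class and its methods are hypothetical names chosen for illustration. Operations such as map and filter are only recorded, and the recorded chain (the lineage) is replayed when a result is actually requested through collect or count.

```python
class TinyRDD:
    """Toy model of an RDD: lazy steps are recorded, then replayed on demand."""

    def __init__(self, data, steps=()):
        self._data = list(data)
        self._steps = tuple(steps)  # recorded (name, function) pairs

    def map(self, fn):
        # Lazy: returns a new TinyRDD with one more recorded step.
        return TinyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, fn):
        # Lazy: nothing is computed here either.
        return TinyRDD(self._data, self._steps + (("filter", fn),))

    def lineage(self):
        """Names of the recorded steps, in the order they were applied."""
        return [name for name, _ in self._steps]

    def collect(self):
        """Eager: replay the recorded steps over the source data."""
        items = self._data
        for name, fn in self._steps:
            if name == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def count(self):
        """Eager: number of elements after replaying the lineage."""
        return len(self.collect())


rdd = TinyRDD([1, 2, 3, 4]).map(lambda x: x * 2).filter(lambda x: x > 4)
print(rdd.lineage())
print(rdd.collect())
```

Because the lineage is kept alongside the source data, a lost result can be recomputed by replaying the same steps, which is the property the term refers to.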

 The spark-shell binary is located in which directory?
 bin

 Which command line parameter is used to identify the spark connector jar?
 --jars

 Based on the following --conf parameters, which namespace will this Spark process read from, and which namespace will it write to?
 Read from profiles.users, write to logs.events
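
The --conf parameters the question refers to are not reproduced in these notes. As a hedged illustration only, the sketch below assumes a pair of settings using the connector's spark.mongodb.input.uri and spark.mongodb.output.uri keys, with a made-up localhost host and port, and shows how the namespace (database.collection) can be read off each URI. The namespace helper is a hypothetical name.

```python
from urllib.parse import urlparse

# Assumed settings for illustration; host and port are hypothetical.
conf = {
    "spark.mongodb.input.uri": "mongodb://localhost:27017/profiles.users",
    "spark.mongodb.output.uri": "mongodb://localhost:27017/logs.events",
}


def namespace(uri):
    """Return the '<database>.<collection>' part of a MongoDB URI."""
    # The path component holds the namespace, prefixed by a slash.
    return urlparse(uri).path.lstrip("/")


read_ns = namespace(conf["spark.mongodb.input.uri"])
write_ns = namespace(conf["spark.mongodb.output.uri"])
print("read from", read_ns, "write to", write_ns)
```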

 True or False: Since this transformation needs to split the crew field into one or more tuples, we need to use the RDD's flatMap function
 True
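
A plain-Python sketch of the difference between map and flatMap for this case. It assumes, purely for illustration, that the crew field is a comma-separated string of names; the sample documents and the flat_map and crew_pairs helpers are hypothetical. With map, each document yields one list of tuples, so the result is nested. With flatMap, the per-document lists are flattened into a single sequence of (name, minutes) tuples.

```python
from itertools import chain


def flat_map(fn, items):
    """Apply fn to each item and flatten the resulting lists by one level."""
    return list(chain.from_iterable(fn(item) for item in items))


# Made-up documents; the comma-separated crew format is an assumption.
evas = [
    {"crew": "Ann Lee, Bo Cruz", "minutes": 36},
    {"crew": "Ann Lee", "minutes": 10},
]


def crew_pairs(eva):
    """One (name, minutes) tuple per crew member in the document."""
    return [(name.strip(), eva["minutes"]) for name in eva["crew"].split(",")]


nested = [crew_pairs(eva) for eva in evas]  # map: one list per document
flat = flat_map(crew_pairs, evas)           # flatMap: one tuple per crew member
print(nested)
print(flat)
```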

 The reduceByKey method is summing values by astronaut name. What is the unit of that value?
 minutes
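
A plain-Python stand-in for what summing by key does, assuming (name, minutes) pairs like the ones the question describes. The reduce_by_key helper and the sample pairs are hypothetical; Spark's own reduceByKey runs across partitions, which this sketch does not model.

```python
def reduce_by_key(fn, pairs):
    """Combine the values that share a key using the binary function fn."""
    totals = {}
    for key, value in pairs:
        # The first value seen for a key is kept as-is; later ones are folded in.
        totals[key] = fn(totals[key], value) if key in totals else value
    return totals


# Made-up (astronaut name, minutes) pairs.
pairs = [("Ann Lee", 36), ("Bo Cruz", 36), ("Ann Lee", 10)]

minutes_by_astronaut = reduce_by_key(lambda a, b: a + b, pairs)
print(minutes_by_astronaut)
```

Since the values being added are durations in minutes, the per-astronaut total is also in minutes.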

 The Spark connector's write target can be set in the same three ways as the read configuration. Two are at the prompt (programmatically) and in the Spark defaults conf file. What is the third?
 command line parameter