##Building and using Spark Notebook for MapR

The spark-notebook is a browser-based REPL for exploring data and building visualizations. This guide covers the MapR-specific requirements for building and using the Spark Notebook on MapR clusters.

###Building

Check out the source:

$ git clone https://github.com/andypetrella/spark-notebook.git

Follow the instructions for building on MapR in the project README: https://github.com/andypetrella/spark-notebook#building-for-mapr

Add this line to ~/.sbt/repositories:

 mapr: http://repository.mapr.com/maven
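The repository entry can also be added from a script. The following is a minimal sketch, not part of the spark-notebook project: `add_mapr_repo` is a hypothetical helper name, and the `local` and `maven-central` defaults it writes into a new file are an assumption about what you want alongside the MapR resolver. It only appends the line if no `mapr:` entry exists yet.

```shell
#!/bin/sh
# Hypothetical helper (not part of spark-notebook or sbt): add_mapr_repo FILE
# Appends the MapR resolver to an sbt repositories file unless it is already there.
add_mapr_repo() {
    repo_file="$1"
    repo_line="  mapr: http://repository.mapr.com/maven"

    mkdir -p "$(dirname "$repo_file")"

    # A new or empty file gets a [repositories] section header first.
    # The local and maven-central entries are assumed defaults.
    if [ ! -s "$repo_file" ]; then
        printf '[repositories]\n  local\n  maven-central\n' > "$repo_file"
    fi

    # Append only if no "mapr:" entry exists yet, so reruns are harmless.
    if ! grep -q '^[[:space:]]*mapr:' "$repo_file"; then
        printf '%s\n' "$repo_line" >> "$repo_file"
    fi
}
```

Call it as `add_mapr_repo "$HOME/.sbt/repositories"`. If the file already exists, check by hand that the appended line ended up under the `[repositories]` header.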

For MapR 5.0, the build command that includes Hive and Parquet support is:

$ sbt -Dspark.version=1.4.1 -Dhadoop.version=2.7.0-mapr-1506 -Dwith.hive=true -Dwith.parquet=true clean dist
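If you build against more than one cluster version, it may help to keep the version strings in one place. This is a small sketch under that assumption: `mapr_build_cmd` is a hypothetical helper that only prints the sbt invocation, using the same flags as the command above, so you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper: mapr_build_cmd SPARK_VERSION HADOOP_VERSION
# Prints (does not run) the sbt build command with Hive and Parquet enabled.
mapr_build_cmd() {
    spark_version="$1"
    hadoop_version="$2"
    printf 'sbt -Dspark.version=%s -Dhadoop.version=%s -Dwith.hive=true -Dwith.parquet=true clean dist\n' \
        "$spark_version" "$hadoop_version"
}
```

For example, `mapr_build_cmd 1.4.1 2.7.0-mapr-1506` prints the MapR 5.0 command shown above.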

###Running

The following environment variables need to be set to run on MapR 5.0:

$ export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
$ export EXTRA_CLASSPATH=/opt/mapr/lib/commons-configuration-1.6.jar:/opt/mapr/lib/hadoop-auth-2.7.0.jar:/opt/mapr/lib/maprfs-5.0.0-mapr.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/zookeeper-3.4.5-mapr-1503.jar
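The jar names in `EXTRA_CLASSPATH` carry version numbers, so they can differ between MapR installs. The sketch below is one way to spot entries that do not exist on the machine before starting the server. `missing_classpath_entries` is a hypothetical helper, not something shipped with MapR or spark-notebook.

```shell
#!/bin/sh
# Hypothetical helper: missing_classpath_entries CLASSPATH
# Prints each colon-separated entry that does not exist on disk and
# returns non-zero if at least one entry is missing.
missing_classpath_entries() {
    status=0
    old_ifs="$IFS"
    IFS=':'
    set -f  # keep wildcard characters in entries from being expanded
    for entry in $1; do
        if [ -n "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s\n' "$entry"
            status=1
        fi
    done
    set +f
    IFS="$old_ifs"
    return "$status"
}
```

After exporting the variables, run `missing_classpath_entries "$EXTRA_CLASSPATH"`. No output means every listed jar was found.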

Copy the spark-assembly jar into the MapR filesystem:

$ cp /opt/mapr/spark/spark-1.4.1/lib/spark-assembly-1.4.1-hadoop2.5.1-mapr-1501.jar /mapr/<clustername>/apps/spark/spark-assembly.jar

Start the server:

$ ./bin/spark-notebook

Ensure that the following customSparkConf is set in the metadata of any notebook you run against the cluster (in the notebook's Edit menu, select Edit Notebook Metadata):

"customSparkConf": {
  "spark.app.name": "Notebook",
  "spark.master": "yarn-client",
  "spark.executor.memory": "1G",
  "spark.yarn.jar": "maprfs:///apps/spark/spark-assembly.jar"
},
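If you keep a copy of the notebook metadata in a file, a quick text check can confirm that none of the settings above were dropped. This is only a sketch: `missing_spark_conf_keys` is a hypothetical helper, the file path is whatever you pass in, and the check is a plain-text search for the quoted key names rather than a real JSON parse.

```shell
#!/bin/sh
# Hypothetical helper: missing_spark_conf_keys FILE
# Prints every expected key that does not appear as a quoted string in FILE.
# This is a text search only; it does not validate JSON structure or values.
missing_spark_conf_keys() {
    for key in customSparkConf spark.app.name spark.master \
               spark.executor.memory spark.yarn.jar; do
        if ! grep -qF "\"$key\"" "$1"; then
            printf '%s\n' "$key"
        fi
    done
}
```

No output means all five key names were found. Any name printed is missing from the file.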
