##Building and using Spark Notebook for MapR
The spark-notebook is a browser-based REPL for exploring data and building visualizations. This guide covers the MapR-specific requirements for building and running the Spark Notebook on MapR clusters.
###Building
Check out the source:
$ git clone https://github.com/andypetrella/spark-notebook.git
Follow the instructions for building on MapR here: https://github.com/andypetrella/spark-notebook#building-for-mapr
Add this line to ~/.sbt/repositories:
mapr: http://repository.mapr.com/maven
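If the file does not exist yet, it needs the `[repositories]` section header that sbt's repositories file format uses. The following is a minimal sketch of adding the line idempotently. The helper name `add_sbt_repo` is hypothetical, and seeding a new file with `local` and `maven-central` is an assumption about what you want in a fresh file, not something the Spark Notebook instructions prescribe. It also assumes `[repositories]` is the only section in the file, since the new entry is appended at the end.

```shell
# Hypothetical helper: add a named resolver to an sbt repositories file.
# Usage: add_sbt_repo FILE NAME URL
add_sbt_repo() {
  file=$1
  name=$2
  url=$3
  mkdir -p "$(dirname "$file")"
  # Create the file with the section header sbt expects if it is missing.
  # The two default entries are an assumption; adjust them to your setup.
  if [ ! -f "$file" ]; then
    printf '[repositories]\n  local\n  maven-central\n' > "$file"
  fi
  # Append the resolver only if no entry with that name is present already.
  if ! grep -q "^[[:space:]]*$name:" "$file"; then
    printf '  %s: %s\n' "$name" "$url" >> "$file"
  fi
}

# Example call (commented out so that sourcing this snippet has no side effects):
# add_sbt_repo "$HOME/.sbt/repositories" mapr http://repository.mapr.com/maven
```

Running the helper twice leaves a single `mapr:` entry, so it should be safe to keep in a setup script.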
For MapR 5.0, the build command that includes Hive and Parquet support is:
$ sbt -Dspark.version=1.4.1 -Dhadoop.version=2.7.0-mapr-1506 -Dwith.hive=true -Dwith.parquet=true clean dist
###Running
The following environment variables need to be set to run on MapR 5.0:
$ export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
$ export EXTRA_CLASSPATH=/opt/mapr/lib/commons-configuration-1.6.jar:/opt/mapr/lib/hadoop-auth-2.7.0.jar:/opt/mapr/lib/maprfs-5.0.0-mapr.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/zookeeper-3.4.5-mapr-1503.jar
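Jar names under /opt/mapr embed version strings, so a path copied from another install may not exist on yours. As a rough sanity check, the sketch below walks a colon-separated classpath and reports entries that are missing. The function name `check_classpath` is hypothetical and not part of Spark Notebook or MapR.

```shell
# Hypothetical helper: report classpath entries that do not exist on disk.
# Usage: check_classpath "path1:path2:..."
# Returns 0 if every entry exists, 1 otherwise.
check_classpath() {
  missing=0
  old_ifs=$IFS
  IFS=:
  # With IFS set to ':', the unquoted expansion splits into one entry per path.
  for entry in $1; do
    if [ ! -e "$entry" ]; then
      echo "missing: $entry" >&2
      missing=1
    fi
  done
  IFS=$old_ifs
  return "$missing"
}

# Example call (commented out so that sourcing this snippet has no side effects):
# check_classpath "$EXTRA_CLASSPATH" && echo "all classpath entries found"
```

Missing entries are printed to standard error, so the check can run before starting the server without cluttering normal output.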
Copy the spark-assembly jar into the MapR filesystem:
$ cp /opt/mapr/spark/spark-1.4.1/lib/spark-assembly-1.4.1-hadoop2.5.1-mapr-1501.jar /mapr/<clustername>/apps/spark/spark-assembly.jar
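The assembly jar's file name includes Spark, Hadoop, and MapR version strings, so it may differ from the one shown above. One way to avoid hard-coding it is to look the jar up by pattern, as in this sketch. The helper name `find_spark_assembly` is hypothetical, and the `lib/spark-assembly-*.jar` layout is assumed from the path used above.

```shell
# Hypothetical helper: print the first spark-assembly jar under SPARK_HOME/lib.
# Usage: find_spark_assembly SPARK_HOME
find_spark_assembly() {
  for jar in "$1"/lib/spark-assembly-*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -f fails.
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no spark-assembly jar under $1/lib" >&2
  return 1
}

# Example call (commented out; replace <clustername> with your cluster's name):
# jar=$(find_spark_assembly /opt/mapr/spark/spark-1.4.1) &&
#   cp "$jar" "/mapr/<clustername>/apps/spark/spark-assembly.jar"
```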
Start the server:
$ ./bin/spark-notebook
Ensure that the following customSparkConf is set in the metadata of every notebook you run against the cluster (in the notebook's Edit menu, select Edit Notebook Metadata):
"customSparkConf": {
"spark.app.name": "Notebook",
"spark.master": "yarn-client",
"spark.executor.memory": "1G",
"spark.yarn.jar": "maprfs:///apps/spark/spark-assembly.jar"
},
I guess adding http://repository.mapr.com/maven to the default repos would be interesting!