zeppelin-with-cloudera.md

Apache Zeppelin 0.6.1 with Cloudera CDH 5.7.0 on Ubuntu 14.04 LTS

Installation

resources

required packages

sudo apt-get install node nodejs npm

Maven 3.1.0 or later is required: https://launchpad.net/~andrei-pozolotin/+archive/ubuntu/maven3

# To install:
sudo apt-get purge maven maven2 maven3
sudo add-apt-repository ppa:andrei-pozolotin/maven3
sudo apt-get update
sudo apt-get install maven3
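
A quick way to confirm that the installed Maven meets the 3.1.0 minimum is to compare version strings. This is only a sketch: it assumes the binary is on the PATH as `mvn` and that GNU `sort -V` is available; the helper names `version_ge` and `maven_version` are made up for illustration.

```shell
#!/usr/bin/env bash
# Return success (0) when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) from coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from `mvn -version` output read on stdin,
# e.g. a first line such as "Apache Maven 3.3.9 (...)".
maven_version() {
  sed -n 's/^Apache Maven \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Only query Maven when it is actually installed.
if command -v mvn >/dev/null 2>&1; then
  installed="$(mvn -version 2>/dev/null | maven_version)"
  if [ -n "$installed" ] && version_ge "$installed" "3.1.0"; then
    echo "Maven $installed is new enough"
  else
    echo "Maven version '$installed' is missing or older than 3.1.0" >&2
  fi
fi
```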

download source

git clone https://github.com/apache/zeppelin.git

package zeppelin

mvn clean package -Pbuild-distr -Pspark-1.6 -Ppyspark -Phadoop-2.6 -Pyarn -Pvendor-repo -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests

set up the zeppelin configuration (substitute your own local installation folders for cdh and zeppelin)

cp /etc/hive/conf/hive-site.xml /home/ubuntu/zeppelin/zeppelin/conf/

follow the instructions at http://blog.cloudera.com/blog/2015/07/how-to-install-apache-zeppelin-on-cdh/

zeppelin-env.sh

custom settings

export MASTER=yarn-client
export ZEPPELIN_JAVA_OPTS="-Dspark.yarn.jar=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar"
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/spark

export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop
export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}

if [ -n "$HADOOP_HOME" ]; then
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
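
Because the parcel paths above are specific to one CDH build, it may help to verify that they exist before starting Zeppelin. The following is a minimal sketch; `missing_paths` is a hypothetical helper, not part of Zeppelin or CDH.

```shell
#!/usr/bin/env bash
# Print every path from the argument list that does not exist.
# Returns non-zero when at least one path is missing.
missing_paths() {
  local rc=0 path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      rc=1
    fi
  done
  return "$rc"
}

# Example call, after sourcing zeppelin-env.sh:
# missing_paths "$SPARK_HOME" "$HADOOP_HOME/lib/native" "$HADOOP_CONF_DIR"
```

A missing path reported here usually means the parcel version in the exports does not match the one installed on the machine.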

zeppelin-site.xml

<property>
  <name>zeppelin.server.port</name>
  <value>8087</value>
  <description>Server port.</description>
</property>

Interpreter Configuration

create hive interpreter

https://zeppelin.apache.org/docs/0.6.1/interpreter/hive.html

Properties
name	value
common.max_count	1000
default.driver	org.apache.hive.jdbc.HiveDriver
default.password	hdfs
default.url	jdbc:hive2://localhost:10000
default.user	hdfs
hive.driver	org.apache.hive.jdbc.HiveDriver
hive.password	hdfs
hive.url	jdbc:hive2://localhost:10000
hive.user	hdfs
zeppelin.interpreter.localRepo	/home/ubuntu/zeppelin/zeppelin/local-repo/2C1AF1ZBW
zeppelin.jdbc.auth.type	
zeppelin.jdbc.concurrent.max_connection	
zeppelin.jdbc.concurrent.use	
zeppelin.jdbc.keytab.location	
zeppelin.jdbc.principal	

Dependencies
artifact	exclude
org.apache.hive:hive-jdbc:0.14.0	
org.apache.hadoop:hadoop-common:2.6.0
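
Before debugging the interpreter itself, it can be useful to confirm that HiveServer2 accepts the same URL and credentials from the command line. This sketch assumes `beeline` (shipped with Hive) is on the PATH; `hive_query` is a hypothetical wrapper, and the values are copied from the properties listed above.

```shell
#!/usr/bin/env bash
# Connection values copied from the interpreter properties.
HIVE_URL="jdbc:hive2://localhost:10000"
HIVE_USER="hdfs"
HIVE_PASSWORD="hdfs"

# Run one statement through beeline with the interpreter's URL and credentials.
# -u, -n, -p and -e are beeline's options for URL, user, password and query.
hive_query() {
  beeline -u "$HIVE_URL" -n "$HIVE_USER" -p "$HIVE_PASSWORD" -e "$1"
}

# Example (needs a running HiveServer2):
# hive_query 'show databases;'
```

If this call fails, the problem most likely lies with HiveServer2 or the credentials rather than with the Zeppelin interpreter settings.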

Miscellaneous

notebooks saved on git

feature not supported yet

http://stackoverflow.com/questions/34566079/zeppelin-notebook-storage-in-local-git-repository

https://github.com/apache/zeppelin/blob/b8755ebb25ad793daa6873acc2e00b8d69588188/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/GitNotebookRepo.java

This impl intended to be simple and straightforward:

does not handle branches
only basic local git file repo, no remote Github push\pull yet

table from spark rdd

// Emit the %table directive and the header once, then one tab-separated row per pair.
println("%table genre\ttotal")
counts.collect().foreach { case (key, value) => println(s"$key\t$value") }

path to zeppelin logs

/PATH/TO/ZEPPELIN/logs/*.out
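
When several interpreters have written logs, the most recent `.out` file is usually the one of interest. A small sketch, assuming log file names contain no whitespace; `latest_out_log` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the most recently modified *.out file in the given logs directory.
# Prints nothing when the directory holds no .out files.
latest_out_log() {
  ls -t "$1"/*.out 2>/dev/null | head -n 1
}

# Example, using the placeholder path:
# tail -n 50 "$(latest_out_log /PATH/TO/ZEPPELIN/logs)"
```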

path to zeppelin notebooks

/PATH/TO/ZEPPELIN/notebook

Zeppelin spark context

This relates to the different contexts that Spark creates in Zeppelin, as described in http://stackoverflow.com/a/37671017/1915447

In the Spark interpreter settings, set the following property to false:

zeppelin.spark.useHiveContext = false

Tested with Zeppelin 0.6.2.

Temporary DataFrame tables are not shown by the hive/sql interpreter: https://community.hortonworks.com/questions/30874/zeppelin-hive-sql-charts-are-not-working-with-temp.html
