- http://blog.cloudera.com/blog/2015/07/how-to-install-apache-zeppelin-on-cdh/
- https://ypg-data.github.io/post/2016/02/running-zeppelin-on-cdh/
# Build prerequisites for the web UI (on Ubuntu the Node.js package is 'nodejs', not 'node'):
sudo apt-get install nodejs npm
Maven >= 3.1.0 is required; the PPA below provides it: https://launchpad.net/~andrei-pozolotin/+archive/ubuntu/maven3
# To install:
sudo apt-get purge maven maven2 maven3
sudo add-apt-repository ppa:andrei-pozolotin/maven3
sudo apt-get update
sudo apt-get install maven3
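# Verify the installed version (the Zeppelin build needs Maven >= 3.1.0):
mvn -version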
git clone https://github.com/apache/zeppelin.git
mvn clean package -Pspark-1.6 -Ppyspark -Phadoop-2.6 -Pyarn -Pvendor-repo -Pbuild-distr -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests
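With -Pbuild-distr the build should also produce a binary distribution tarball; assuming the default build layout it lands under zeppelin-distribution/target/ (the exact file name depends on the version):
ls zeppelin-distribution/target/*.tar.gz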
Set up the Zeppelin configuration (substitute your own local installation folders for CDH and Zeppelin):
cp /etc/hive/conf/hive-site.xml /home/ubuntu/zeppelin/zeppelin/conf/
Then follow the instructions at http://blog.cloudera.com/blog/2015/07/how-to-install-apache-zeppelin-on-cdh/
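The source tree ships template config files; a minimal sketch of creating the real config files before editing them (using the Zeppelin folder from above):
cd /home/ubuntu/zeppelin/zeppelin/conf
cp zeppelin-env.sh.template zeppelin-env.sh
cp zeppelin-site.xml.template zeppelin-site.xml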
Custom settings for conf/zeppelin-env.sh:
export MASTER=yarn-client
export ZEPPELIN_JAVA_OPTS="-Dspark.yarn.jar=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar"
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/spark
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop
export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}
if [ -n "$HADOOP_HOME" ]; then
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
The default port is 8080; to change it, set zeppelin.server.port in conf/zeppelin-site.xml:
<property>
  <name>zeppelin.server.port</name>
  <value>8087</value>
  <description>Server port.</description>
</property>
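Configuration changes such as the port only take effect after a restart, e.g.:
/home/ubuntu/zeppelin/zeppelin/bin/zeppelin-daemon.sh restart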
Hive interpreter settings (see https://zeppelin.apache.org/docs/0.6.1/interpreter/hive.html):
Properties:

| name | value |
|---|---|
| common.max_count | 1000 |
| default.driver | org.apache.hive.jdbc.HiveDriver |
| default.password | hdfs |
| default.url | jdbc:hive2://localhost:10000 |
| default.user | hdfs |
| hive.driver | org.apache.hive.jdbc.HiveDriver |
| hive.password | hdfs |
| hive.url | jdbc:hive2://localhost:10000 |
| hive.user | hdfs |
| zeppelin.interpreter.localRepo | /home/ubuntu/zeppelin/zeppelin/local-repo/2C1AF1ZBW |
| zeppelin.jdbc.auth.type | |
| zeppelin.jdbc.concurrent.max_connection | |
| zeppelin.jdbc.concurrent.use | |
| zeppelin.jdbc.keytab.location | |
| zeppelin.jdbc.principal | |
Dependencies:

| artifact | exclude |
|---|---|
| org.apache.hive:hive-jdbc:0.14.0 | |
| org.apache.hadoop:hadoop-common:2.6.0 | |
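To sanity-check the HiveServer2 endpoint configured above from the shell (assuming beeline is available, as on a standard CDH node):
beeline -u jdbc:hive2://localhost:10000 -n hdfs -p hdfs -e 'show databases;'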
Notebook storage in a local Git repository (remote push is not supported yet):
http://stackoverflow.com/questions/34566079/zeppelin-notebook-storage-in-local-git-repository
The current implementation is intended to be simple and straightforward:
- does not handle branches
- only a basic local Git file repo, no remote GitHub push/pull yet
Printing a pair RDD as a Zeppelin table (%table must appear only once, at the start of the output, with tab-separated columns):
println("%table genre\ttotal")
counts.collect().foreach { case (key, value) => println(s"$key\t$value") }  // counts: (genre, total) pairs
Logs: /PATH/TO/ZEPPELIN/logs/*.out
Notebooks: /PATH/TO/ZEPPELIN/notebook
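When a paragraph fails with an unclear error, tail the interpreter logs:
tail -f /PATH/TO/ZEPPELIN/logs/*.out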
For issues in Zeppelin related to the different contexts created by Spark (see http://stackoverflow.com/a/37671017/1915447), check the following setting in the Spark interpreter and set it to 'false' (tested with Zeppelin 0.6.2):
zeppelin.spark.useHiveContext = false
Temporary DataFrame tables are not shown by the hive/sql interpreter: https://community.hortonworks.com/questions/30874/zeppelin-hive-sql-charts-are-not-working-with-temp.html
Zeppelin supports running either Spark SQL or Shell commands in parallel (Scala, Python and R paragraphs are run sequentially).
How To - Spark SQL:
Open the notebook and in the top-right select Interpreters. Set zeppelin.spark.concurrentSQL to true and set a value for zeppelin.spark.sql.maxConcurrency.
How To - Shell:
In the top-right select Interpreters and set zeppelin.shell.concurrentCommands to true. Up to 5 shell commands may be run concurrently.