This procedure is for Spark running in stand-alone deployment mode.
Please follow these instructions:
- Clone the Zeppelin project from the master branch on GitHub, for example:
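  A minimal sketch, assuming the main Apache Zeppelin repository on GitHub (adjust the URL if you build from a fork or mirror):

  ```sh
  # Clone the Zeppelin sources and switch into the project directory
  git clone https://github.com/apache/zeppelin.git
  cd zeppelin
  ```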
- If you use DSE 4.8 (thus Spark 1.4), edit the file `$ZEPPELIN_HOME/spark-dependencies/pom.xml`: duplicate the Maven profile `cassandra-spark-1.3` as `cassandra-spark-1.4` and update the spark-cassandra-connector version to 1.4.0, along the lines of the sketch below.
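  A rough sketch of what the duplicated profile might look like; the property names here are illustrative, so copy the real ones from the existing `cassandra-spark-1.3` profile in your checkout:

  ```xml
  <!-- Hypothetical sketch: mirror the existing cassandra-spark-1.3 profile
       and bump the versions for Spark 1.4 / connector 1.4.0 -->
  <profile>
    <id>cassandra-spark-1.4</id>
    <properties>
      <spark.version>1.4.1</spark.version>
      <cassandra.spark.version>1.4.0</cassandra.spark.version>
    </properties>
  </profile>
  ```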
- Build it with the following Maven command. Ensure you have Maven version 3.x or later, and use `-Pcassandra-spark-1.4` instead of `-Pcassandra-spark-1.3` if you are on DSE 4.8:

  ```sh
  mvn clean package -Pcassandra-spark-1.3 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
  ```
- Duplicate the file `$ZEPPELIN_HOME/conf/zeppelin-env.sh.template` to `$ZEPPELIN_HOME/conf/zeppelin-env.sh`
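  One way to do this from a shell:

  ```sh
  # Copy the template into place so Zeppelin picks it up at startup
  cp $ZEPPELIN_HOME/conf/zeppelin-env.sh.template $ZEPPELIN_HOME/conf/zeppelin-env.sh
  ```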
- Edit the file `$ZEPPELIN_HOME/conf/zeppelin-env.sh` and add `export MASTER=spark://<spark_DSE_master_IP>:7077`
- Start Zeppelin with `$ZEPPELIN_HOME/bin/zeppelin-daemon.sh start`
- Go to `http://localhost:8080` to open Zeppelin, then go to the Interpreter menu.
- Edit the Spark interpreter properties: set the `master` property to `spark://<spark_DSE_master_IP>:7077`, and add the new property `spark.cassandra.connection.host` pointing to a comma-separated list of IP addresses of your Cassandra cluster nodes, as in the example below. Save the change and confirm with Yes when the popup asks you to.
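  For example, the two properties might end up looking like this (the IP addresses are placeholders for your own nodes):

  ```
  master                           spark://10.0.0.1:7077
  spark.cassandra.connection.host  10.0.0.1,10.0.0.2,10.0.0.3
  ```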
Restart Zeppelin with
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh restart
Now you can use Spark, Cassandra, and the Spark Cassandra connector. Do not forget to import the Scala implicits:

```scala
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
```
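To check that everything is wired up, you can run a short paragraph in a Zeppelin note. This is a minimal sketch; the keyspace `my_ks` and table `my_table` are placeholders for an existing table in your cluster:

```scala
// Read a Cassandra table as an RDD via the connector's implicit
// sc.cassandraTable enrichment, then count its rows.
val rdd = sc.cassandraTable("my_ks", "my_table") // placeholder keyspace/table
println(s"Rows in my_ks.my_table: ${rdd.count()}")
```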