Using docker:
docker run -it --rm -p 8888:8888 jupyter/all-spark-notebook
From a standalone installation (Spark 2.x also needs PYSPARK_DRIVER_PYTHON set to jupyter):
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" bin/pyspark
Install the jars needed for AWS/S3 operations in the Spark jars folder (spark/jars):
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.3
https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk
https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-core
https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3
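The jar installation above can be sketched as a small download script. This is only a sketch under stated assumptions: the helper name maven_jar_url, the use of the Maven Central mirror at repo1.maven.org, and the DOWNLOAD switch are illustrative and not part of the original notes, and the aws-java-sdk version (1.7.4, believed to be the one hadoop-aws 2.7.3 was built against) should be checked against the mvnrepository pages linked above.

```shell
#!/bin/sh
# Build a Maven Central download URL from group, artifact and version.
# (Hypothetical helper, assuming the standard Maven repository layout.)
maven_jar_url() {
    group_path=$(echo "$1" | tr '.' '/')
    echo "https://repo1.maven.org/maven2/${group_path}/$2/$3/$2-$3.jar"
}

HADOOP_AWS_VERSION=2.7.3   # version linked in the notes
AWS_SDK_VERSION=1.7.4      # assumption: verify against the hadoop-aws page

# Jars go into the Spark jars folder; falls back to ./spark/jars.
JARS_DIR="${SPARK_HOME:-spark}/jars"

# Dry run by default: only print the URLs. Set DOWNLOAD=1 to fetch them.
for url in \
    "$(maven_jar_url org.apache.hadoop hadoop-aws "$HADOOP_AWS_VERSION")" \
    "$(maven_jar_url com.amazonaws aws-java-sdk "$AWS_SDK_VERSION")"
do
    if [ "${DOWNLOAD:-0}" = "1" ]; then
        wget -P "$JARS_DIR" "$url"
    else
        echo "would download $url into $JARS_DIR"
    fi
done
```

Running it as-is only lists what would be fetched; running it with DOWNLOAD=1 places the jars in $SPARK_HOME/jars.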
Run with:
cd path/to/spark
./bin/spark-submit path/to/script
Resources:
https://issues.apache.org/jira/browse/SPARK-15965
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.3
https://issues.apache.org/jira/browse/LENS-746
https://mvnrepository.com/search?q=aws-java-sdk
http://www.sparktutorials.net/reading-and-writing-s3-data-with-apache-spark
Run a Spark standalone cluster on a single machine.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
sudo apt-get install scala
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
tar zxvf spark-2.1.0-bin-hadoop2.7.tgz
mv spark-2.1.0-bin-hadoop2.7 spark_2.1.0
Add to ~/.profile:
export SPARK_HOME=/home/raul/spark_2.1.0
export PATH=$PATH:$SPARK_HOME/bin
Then reload it:
source ~/.profile
Create the file $SPARK_HOME/conf/spark-env.sh (for example with vim) containing:
SPARK_WORKER_CORES=2
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_MEMORY=2g
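As an alternative to editing the file by hand, the same three settings can be written from a script. A minimal sketch: the function name write_spark_env is hypothetical, and it overwrites any existing spark-env.sh in the target directory.

```shell
#!/bin/sh
# Write the worker settings from the notes into <conf dir>/spark-env.sh.
# Note: this replaces an existing spark-env.sh in that directory.
write_spark_env() {
    conf_dir=$1
    mkdir -p "$conf_dir"
    cat > "$conf_dir/spark-env.sh" <<'EOF'
SPARK_WORKER_CORES=2
SPARK_WORKER_INSTANCES=2
SPARK_WORKER_MEMORY=2g
EOF
}

# Only write into a real installation when SPARK_HOME is set.
if [ -n "${SPARK_HOME:-}" ]; then
    write_spark_env "$SPARK_HOME/conf"
    echo "wrote $SPARK_HOME/conf/spark-env.sh"
fi
```

With these values, each worker start launches two worker processes on the machine, each limited to 2 cores and 2g of memory.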
Start the master and the workers:
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://data-app1-uat.east:7077
Logs are written to:
$SPARK_HOME/logs/spark-root-org.apache.spark.deploy.master.Master-1-data-app1-uat.east.out
$SPARK_HOME/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-data-app1-uat.east.out
$SPARK_HOME/logs/spark-root-org.apache.spark.deploy.worker.Worker-2-data-app1-uat.east.out
Stop the master and the workers:
$SPARK_HOME/sbin/stop-master.sh
$SPARK_HOME/sbin/stop-slave.sh
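The start commands use a fixed master host (data-app1-uat.east). On another machine the URL can be derived from the local hostname instead. A rough sketch, assuming the master runs on the same machine, advertises its hostname, and listens on the default port 7077; the helper name master_url is hypothetical.

```shell
#!/bin/sh
# Build the spark:// URL workers use to reach the master.
# The second argument is the port and defaults to 7077.
master_url() {
    echo "spark://${1}:${2:-7077}"
}

# Assumption: the master runs on this machine under its hostname.
MASTER_URL=$(master_url "$(hostname)")
echo "master URL: $MASTER_URL"

# Start the cluster only when SPARK_HOME points at a real installation.
if [ -x "${SPARK_HOME:-}/sbin/start-master.sh" ]; then
    "$SPARK_HOME/sbin/start-master.sh"
    "$SPARK_HOME/sbin/start-slave.sh" "$MASTER_URL"
fi
```

If the workers fail to register, the master log under $SPARK_HOME/logs shows the exact URL the master is bound to.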
Resources:
https://spark.apache.org/docs/latest/cluster-overview.html
Run Zeppelin using docker:
docker run -p 8080:8080 -p 4040:4040 sofianito/zeppelin