- Install Homebrew if you don't have it yet
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"The script will explain what changes it will make and prompt you before the installation begins. Once you’ve installed Homebrew, insert the Homebrew directory at the top of your PATH environment variable. You can do this by adding the following line at the bottom of your ~/.bash_profile file
export PATH=/usr/local/bin:/usr/local/sbin:$PATH
- Install Python 3:
brew install python3
sudo pip3 install jupyter
# Install Jupyter Nbextensions Configurator
sudo pip3 install jupyter_nbextensions_configurator
# Enabling the extension
jupyter nbextensions_configurator enable --user
# The list of enabled Jupyter extensions will be in ~/.jupyter/nbconfig/notebook.json
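As a quick sanity check (optional), you can confirm the Python and Jupyter installs and see which notebook extensions are enabled:
# Verify the Python and Jupyter installations
python3 --version
jupyter --version
# List the installed notebook extensions and whether they are enabled
jupyter nbextension list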
- Install Java 8
brew update
brew tap caskroom/versions
brew cask install java8
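Before moving on, it's worth double-checking that Java 8 was picked up. The version output should show 1.8.x, and java_home should print the matching JDK path:
# Confirm Java 8 is installed and locate its home directory
java -version
/usr/libexec/java_home -v 1.8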
- Install Scala, sbt, and wget
brew install scala sbt wget
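Optionally, confirm the Scala and sbt installs (the sbtVersion task may take a moment the first time while sbt fetches its dependencies):
# Print the installed Scala and sbt versions
scala -version
sbt sbtVersion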
- Download and extract Hadoop 2.7.4
wget -P ~/Downloads/ http://apache.mesi.com.ar/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz
cd && tar -xvf ~/Downloads/hadoop-2.7.4.tar.gz
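Once JAVA_HOME is exported (next line), you can optionally confirm that the unpacked distribution runs:
# Print the Hadoop version to confirm the archive unpacked correctly
$HOME/hadoop-2.7.4/bin/hadoop version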
export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
Edit the core-site.xml file located under $HADOOP_HOME/etc/hadoop/:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
- Create folders for the namenode and datanode by using the commands:
mkdir -p $HOME/hadoop2_data/hdfs/namenode
mkdir -p $HOME/hadoop2_data/hdfs/datanode
Edit the hdfs-site.xml file located under $HADOOP_HOME/etc/hadoop/ (replace hadoopuser with your own username so the paths match the folders you just created):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/Users/hadoopuser/hadoop2_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/Users/hadoopuser/hadoop2_data/hdfs/datanode</value>
</property>
</configuration>
Create mapred-site.xml from its template (run this from inside the hadoop-2.7.4 directory):
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Edit the mapred-site.xml file located under $HADOOP_HOME/etc/hadoop/:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit the yarn-site.xml file located under $HADOOP_HOME/etc/hadoop/:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
- Run the below command to format the namenode directory:
$HADOOP_HOME/bin/hdfs namenode -format
- Start HDFS (the namenode and datanode)
$HADOOP_HOME/sbin/start-dfs.sh
That is all. The namenode web UI can now be accessed at http://localhost:50070
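If you want to verify that HDFS is actually up (optional), jps should list NameNode and DataNode processes, and a simple listing of the new filesystem should succeed:
# Check that the HDFS daemons are running
jps
# List the root of the new HDFS filesystem (should return without errors)
$HADOOP_HOME/bin/hdfs dfs -ls /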
- Download and extract Spark 2.2.0
wget -P ~/Downloads/ https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz
cd && tar -xvf ~/Downloads/spark-2.2.0-bin-hadoop2.7.tgz
Add the following lines to your .bash_profile file:
export JAVA_HOME=`/usr/libexec/java_home -v 1.8`
export HADOOP_HOME=$HOME/hadoop-2.7.4
export ZEPPELIN_HOME=$HOME/zeppelin-0.7.3-bin-netinst
export SPARK_HOME=$HOME/spark-2.2.0-bin-hadoop2.7
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export SBT_HOME=/usr/local/Cellar/sbt/1.0.2
PATH=$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$SBT_HOME/bin:$ZEPPELIN_HOME/bin:$PATH
export PATH=/Library/Frameworks/Python.framework/Versions/3.6/bin:${PATH}
export PYSPARK_PYTHON=python3
alias snotebook='$SPARK_HOME/bin/pyspark --master local[2]'
Notes: The PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS parameters are used to launch the PySpark shell in Jupyter Notebook. The --master parameter sets the master node address; here we launch Spark locally on 2 cores for local testing.
source ~/.bash_profile
Now you can test Spark by running pyspark
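Since PYSPARK_DRIVER_PYTHON points at Jupyter, pyspark will open a notebook in your browser. If you also want a quick, browser-free check that Spark itself works, the SparkPi example bundled with the distribution is handy:
# Run the bundled SparkPi example (prints an approximation of Pi)
$SPARK_HOME/bin/run-example SparkPi 10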
# Download Zeppelin with the Spark interpreter
wget -P ~/Downloads/ http://apache.lauf-forum.at/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-netinst.tgz
cd && tar xzvf ~/Downloads/zeppelin-0.7.3-bin-netinst.tgz
cd zeppelin-0.7.3-bin-netinst/conf/
cp zeppelin-env.sh.template zeppelin-env.sh
Edit the zeppelin-env.sh file by adding this line at the very top of the file:
export SPARK_HOME=$HOME/spark-2.2.0-bin-hadoop2.7
- Start Zeppelin
zeppelin-daemon.sh start
Zeppelin should now be running at http://localhost:8080. In order to stop Zeppelin, run zeppelin-daemon.sh stop
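If the UI does not come up, the daemon script can report its state (the status sub-command is available in Zeppelin 0.7.x, as far as I know):
# Check whether the Zeppelin daemon is running
zeppelin-daemon.sh status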
I'm still new and learning Spark.
I followed every step of your guide on how to install Spark on macOS, but when I try to run spark-shell I get:
192:~ aryadarsono$ spark-shell
/usr/local/Cellar/apache-spark/2.2.0/libexec/bin/spark-shell: line 57: /Users/aryadarsono/spark-2.2.0-bin-hadoop2.7/bin/spark-submit: No such file or directory
and when I run pyspark I get:
/usr/local/Cellar/apache-spark/2.2.0/libexec/bin/pyspark: line 24: /Users/aryadarsono/spark-2.2.0-bin-hadoop2.7/bin/load-spark-env.sh: No such file or directory
/usr/local/Cellar/apache-spark/2.2.0/libexec/bin/pyspark: line 77: /Users/aryadarsono/spark-2.2.0-bin-hadoop2.7/bin/spark-submit: No such file or directory
/usr/local/Cellar/apache-spark/2.2.0/libexec/bin/pyspark: line 77: exec: /Users/aryadarsono/spark-2.2.0-bin-hadoop2.7/bin/spark-submit: cannot execute: No such file or directory
Is there anything that I have to revise?
Thank you