@TomLous
Created March 25, 2019 08:58
Install system
{
"display_name": "PySpark",
"language": "python",
"argv": [
"[python bin]",
"-m",
"ipykernel",
"-f",
"{connection_file}"
],
"env": {
"HADOOP_CONF_DIR": "/etc/hadoop/conf",
"HADOOP_USER_NAME": "[username]",
"HADOOP_CONF_LIB_NATIVE_DIR": "/var/lib/cloudera/parcels/CDH/lib/hadoop/lib/native",
"YARN_CONF_DIR": "/etc/hadoop/conf",
"SPARK_YARN_QUEUE": "queue",
"SPARK_HOME": "/var/lib/cloudera/parcels/SPARK2/lib/spark2/",
"PYTHONPATH": "/usr/local/anaconda-py2/bin/python:/usr/local/anaconda-py2/lib/python2.7/site-packages:/media/home/username/libs/sparkling-water-2.1.20/py/build/dist/h2o_pysparkling_2.1-2.1.20.zip:/var/lib/cloudera/parcels/SPARK2/lib/spark2/python:/var/lib/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip",
"PYTHONSTARTUP": "/var/lib/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
"PYSPARK_SUBMIT_ARGS": "--py-files /media/home/user/libs/sparkling-water-2.1.20/py/build/dist/h2o_pysparkling_2.1-2.1.20.zip --queue cdata --conf 'spark.driver.extraJavaOptions=-Dhttp.proxyHost=proxy.ams1.cloud.ecg.so -Dhttp.proxyPort=3128 -Dhttps.proxyHost=proxy.ams1.cloud.ecg.so -Dhttps.proxyPort=3128' --packages com.databricks:spark-avro_2.11:3.2.0 --conf spark.dynamicAllocation.enabled=false --conf spark.scheduler.minRegisteredResourcesRatio=1 --conf spark.sql.autoBroadcastJoinThreshold=-1 --master yarn --num-executors 5 --driver-memory 2g --executor-memory 20g --executor-cores 3 pyspark-shell"
}
}
## If you add --packages, you also need this:
##`--conf 'spark.driver.extraJavaOptions=-Dhttp.proxyHost=proxy.ams1.cloud.ecg.so -Dhttp.proxyPort=3128 -Dhttps.proxyHost=proxy.ams1.cloud.ecg.so -Dhttps.proxyPort=3128'`
## Without it, the driver can't get through the proxy to download the packages from the internet.
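The proxy flags are easy to mistype when written by hand, so here is a minimal sketch of assembling the `PYSPARK_SUBMIT_ARGS` value in Python. The helper name `proxy_submit_args` and the example host are hypothetical and not part of the original kernel spec. The sketch assumes the value is split shell-style, so the whole `spark.driver.extraJavaOptions=...` setting is wrapped in single quotes.

```python
def proxy_submit_args(host, port, extra_args=""):
    """Build a PYSPARK_SUBMIT_ARGS string with JVM proxy options for the driver.

    Hypothetical helper: host, port and extra_args are illustrative inputs.
    """
    # One -D option per scheme/key pair, in the same order as the kernel spec:
    # http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort
    java_opts = " ".join(
        "-D{0}.proxy{1}={2}".format(scheme, key, value)
        for scheme in ("http", "https")
        for key, value in (("Host", host), ("Port", port))
    )
    # Single quotes keep the space-separated JVM options together as one --conf value
    conf = "--conf 'spark.driver.extraJavaOptions={0}'".format(java_opts)
    # "pyspark-shell" goes last, as in the kernel spec above
    return " ".join(part for part in (conf, extra_args, "pyspark-shell") if part)


if __name__ == "__main__":
    print(proxy_submit_args(
        "proxy.example.com", 3128,
        "--packages com.databricks:spark-avro_2.11:3.2.0 --master yarn",
    ))
```

The printed string can be pasted into the `PYSPARK_SUBMIT_ARGS` entry of the kernel's `env` block, or exported before starting the kernel.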
# Mac Command line utilities
xcode-select --install
# Brew
# The Ruby installer is deprecated; Homebrew now ships a Bash installer
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Git
brew install git
# Restart shell
exec bash
brew tap caskroom/cask
brew tap caskroom/versions
brew cask install java8
brew install bash-completion jq scala sbt tig
brew install apache-spark avro-tools gawk glances grep htop
brew install httpie iftop imagemagick jvmtop maven
brew install node openssl python wget mongodb
brew cask install slack iterm2 intellij-idea caffeine clipy docker
brew cask install bbedit keepassxc jd-gui
brew cask install adobe-acrobat-reader virtualbox
# SBT
mkdir -p ~/.sbt/0.13/plugins
echo 'addSbtPlugin("io.get-coursier" % "sbt-coursier" % "1.0.3")' > ~/.sbt/0.13/plugins/coursier.sbt
echo 'addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")' > ~/.sbt/0.13/plugins/sbt-dependency.sbt
echo 'addSbtPlugin("com.timushev.sbt" % "sbt-updates" % "0.3.4")' > ~/.sbt/0.13/plugins/sbt-updates.sbt
cat > ~/.sbt/0.13/coursier.sbt <<-END
import coursier.Keys._
classpathTypes += "maven-plugin"
END
mkdir -p ~/.sbt/1.0/plugins
cp ~/.sbt/0.13/plugins/* ~/.sbt/1.0/plugins