@kanekv
Last active June 20, 2018 05:17
@kanekv
Last active June 20, 2018 05:17
Spark config
My initial configuration is:
conf.set("spark.cores.max", "16") // 16 map workers, that is 2 workers per machine (see my cluster config below)
conf.set("spark.akka.frameSize", "100000")
conf.set("spark.executor.memory", "120g")
conf.set("spark.reducer.maxMbInFlight", "100000")
conf.set("spark.storage.memoryFraction", "0.9")
conf.set("spark.shuffle.file.buffer.kb", "1000")
conf.set("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")
conf.set("spark.driver.maxResultSize", "120g")
val sc = new SparkContext(conf)
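
For context, here is a minimal sketch of the kind of job this configuration targets: a wide shuffle followed by collecting a large result back to the driver, which is what the raised spark.driver.maxResultSize and spark.reducer.maxMbInFlight are for. The input path and key extraction below are hypothetical placeholders, not part of the original gist.

// Hypothetical job; the path and parsing are illustrative only.
val records = sc.textFile("hdfs:///data/events/*.tsv")
val counts = records
  .map(line => (line.split("\t")(0), 1L)) // key on the first column
  .reduceByKey(_ + _)                     // wide shuffle; maxMbInFlight bounds concurrent fetches
  .collect()                              // pulls every partition's result to the driver,
                                          // limited by spark.driver.maxResultSize
println(s"distinct keys: ${counts.length}")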
I am running this on a cluster of 8 machines; each machine has 16 cores and 130 GB of RAM.
My spark-env.sh contains:
# Raise the open-file limit; shuffle-heavy jobs open many files at once.
ulimit -n 200000
# 120 GB fixed heap; compressed oops must be disabled for heaps above ~32 GB.
SPARK_JAVA_OPTS="-Xms120G -Xmx120G -XX:-UseGCOverheadLimit -XX:-UseCompressedOops"
SPARK_DRIVER_MEMORY=120G
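
Since settings can come from several places (SparkConf, spark-submit flags, spark-defaults.conf, spark-env.sh), a quick sanity check is to read the live configuration back from the running context. A sketch only:

// Inspect the effective configuration the application is actually using.
val effective = sc.getConf.getAll.toMap
println(effective.getOrElse("spark.executor.memory", "<unset>"))
println(effective.getOrElse("spark.driver.maxResultSize", "<unset>"))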