@kanekv
Last active June 20, 2018 05:17
@kanekv
Last active June 20, 2018 05:17
Spark config
My initial configuration is:
conf.set("spark.cores.max", "16") // 16 map workers, that is 2 workers per machine (see my cluster config below)
conf.set("spark.akka.frameSize", "100000")
conf.set("spark.executor.memory", "120g")
conf.set("spark.reducer.maxMbInFlight", "100000")
conf.set("spark.storage.memoryFraction", "0.9")
conf.set("spark.shuffle.file.buffer.kb", "1000")
conf.set("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")
conf.set("spark.driver.maxResultSize", "120g")
val sc = new SparkContext(conf)
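
For context, here is a minimal sketch of the kind of job this configuration targets: a wide shuffle followed by collecting a large result back to the driver, which is what the raised spark.driver.maxResultSize and spark.reducer.maxMbInFlight are for. The input path and key extraction below are hypothetical placeholders, not part of the original gist.

// Hypothetical job; the path and parsing are illustrative only.
val records = sc.textFile("hdfs:///data/events/*.tsv")
val counts = records
  .map(line => (line.split("\t")(0), 1L)) // key on the first column
  .reduceByKey(_ + _)                     // wide shuffle; maxMbInFlight bounds concurrent fetches
  .collect()                              // pulls every partition's result to the driver,
                                          // limited by spark.driver.maxResultSize
println(s"distinct keys: ${counts.length}")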
I am running this on a cluster of 8 machines; each machine has 16 cores and 130 GB of RAM.
My spark-env.sh contains:
# Raise the open-file limit; shuffle-heavy jobs open many files at once.
ulimit -n 200000
# 120 GB fixed heap; compressed oops must be disabled for heaps above ~32 GB.
SPARK_JAVA_OPTS="-Xms120G -Xmx120G -XX:-UseGCOverheadLimit -XX:-UseCompressedOops"
SPARK_DRIVER_MEMORY=120G
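
Since settings can come from several places (SparkConf, spark-submit flags, spark-defaults.conf, spark-env.sh), a quick sanity check is to read the live configuration back from the running context. A sketch only:

// Inspect the effective configuration the application is actually using.
val effective = sc.getConf.getAll.toMap
println(effective.getOrElse("spark.executor.memory", "<unset>"))
println(effective.getOrElse("spark.driver.maxResultSize", "<unset>"))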