@mavencode01
Last active October 13, 2016 12:46
Spark tips
1. Issue with Spark scratch space growing too large and eventually running out of disk space, leading to failed jobs
Solution:
Removing "org.apache.spark.serializer.KryoSerializer" seems to solve the problem
//.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
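In context, that line would simply be dropped from the SparkConf. A minimal sketch, assuming a local setup (the app name, master URL, and scratch directory path are illustrative, not from the gist):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Build the conf without the Kryo serializer setting, per the tip above.
// App name, master, and the spark.local.dir path are placeholders.
val conf = new SparkConf()
  .setAppName("my-app")
  .setMaster("local[*]")
  // .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // removed per the tip
  // Pointing scratch space at a larger volume can also help with disk pressure:
  .set("spark.local.dir", "/mnt/bigdisk/spark-tmp")

val sc = new SparkContext(conf)
```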
## Your executor needs memory

One of the reasons you get an OOM exception is that the partition data an executor needs to process is larger than the memory you have provided. So launching more executors may not solve the problem; you've got to make sure each executor has enough memory as well. Typically, I reduce the number of my executors so I can increase the memory per executor. Also, if you are broadcasting a large variable to all your executors, you have to factor that into the memory you give each executor.
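As a sketch of that trade-off (the instance counts and sizes below are illustrative assumptions, not recommendations from the gist):

```scala
import org.apache.spark.SparkConf

// Trade executor count for per-executor memory.
// All values here are assumptions for illustration only.
val conf = new SparkConf()
  .set("spark.executor.instances", "4")            // fewer executors...
  .set("spark.executor.memory", "8g")              // ...each with more memory
  // Broadcast variables occupy memory in every executor, so on YARN
  // you may also need to raise the off-heap overhead allowance:
  .set("spark.yarn.executor.memoryOverhead", "1024")
```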
2. Persisting parent RDDs causes the RDD to be cached twice, and the UI shows caching over 100%
Solution:
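A common mitigation, sketched with illustrative names (the input path and transform are placeholders): persist only the RDD you actually reuse, and unpersist the parent once the derived RDD has been materialized, so its blocks are not counted twice.

```scala
// Sketch: avoid keeping both parent and child cached at once.
val parent = sc.textFile("hdfs:///data/input").cache() // cached for earlier steps
val child  = parent.map(transform).cache()             // transform is a placeholder

child.count()       // materializes child into the cache
parent.unpersist()  // drop the parent's blocks so storage isn't double-counted
```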
3. Spark connection reset errors
Solution:
https://issues.apache.org/jira/browse/SPARK-5085
sudo ethtool -K eth0 tso off
sudo ethtool -K eth0 sg off