Last active
October 13, 2016 12:46
Spark tips
1. Spark scratch space keeps growing until the disk fills up, which eventually causes the job to fail
Solution:
Removing the "org.apache.spark.serializer.KryoSerializer" setting (so Spark falls back to its default serializer) seems to solve the problem:
//.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
## Your executor needs memory
One of the reasons you get an OOM exception is that the partition data your executor needs to process is
larger than the memory you have given it.
So launching more executors may not solve the problem.
You also have to make sure each executor has enough memory.
Typically, I reduce the number of executors so I can increase the memory per executor.
Also, if you are broadcasting a large variable to all your executors, you have to take that
into account when deciding how much memory to give each executor.
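The trade-off above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only an illustration, not a Spark API: the function name `plan_executors`, the 64 GB node, the 2 GB broadcast variable and the 10% off-heap overhead are all assumed values chosen for the example. It shows how, for a fixed amount of memory on a node, running fewer executors leaves each one with more heap, and how a broadcast variable takes its share out of every executor.

```python
def plan_executors(node_memory_gb, executors_per_node,
                   broadcast_gb=0.0, overhead_fraction=0.10):
    """Rough per-executor memory budget for one worker node.

    All parameters are illustrative assumptions, not Spark defaults:
    - node_memory_gb: memory on the node that you hand to Spark
    - executors_per_node: how many executors share that memory
    - broadcast_gb: size of a broadcast variable (each executor holds a copy)
    - overhead_fraction: assumed off-heap overhead on top of the heap
    """
    if executors_per_node < 1:
        raise ValueError("executors_per_node must be at least 1")
    # Each executor gets an equal slice of the node's memory.
    slice_gb = node_memory_gb / executors_per_node
    # Part of that slice is reserved for off-heap overhead.
    heap_gb = slice_gb / (1.0 + overhead_fraction)
    # Every executor keeps its own copy of the broadcast variable.
    left_gb = heap_gb - broadcast_gb
    return {
        "executors": executors_per_node,
        "heap_gb": round(heap_gb, 2),
        "left_after_broadcast_gb": round(left_gb, 2),
    }


if __name__ == "__main__":
    # Same node, fewer executors: more memory for each one.
    for n in (8, 4, 2):
        print(plan_executors(64, n, broadcast_gb=2.0))
```

If the per-executor figure still looks smaller than your largest partition, adding executors will not help. Lowering the executor count or repartitioning into smaller partitions will.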
2. Persisting parent RDDs causes the RDD to be cached twice, and the UI shows caching over 100%
Solution:
3. Spark connection reset errors
Solution:
See https://issues.apache.org/jira/browse/SPARK-5085
Disable TCP segmentation offload (tso) and scatter-gather (sg) on the network interface:
sudo ethtool -K eth0 tso off
sudo ethtool -K eth0 sg off