There are many instances where the Spark event log grows very large, especially for streaming jobs, and it becomes difficult to transfer such a big file to another, smaller cluster for offline analysis. The following shell script helps reduce the Spark event log size by excluding old jobs from the event log file, so that you can still analyze issues with recent jobs.
After running this shell script in a Linux/Mac terminal, the trimmed output is saved in the input folder with the suffix _trimmed, and you should use that file for further analysis.
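To make the idea concrete before the full script: a Spark event log is typically a plain-text file with one JSON event per line, so trimming amounts to keeping the application/environment setup events at the top of the file plus every event from a sufficiently recent job-start event onward. The sketch below only illustrates that idea under those assumptions; the KEEP_JOBS parameter, its default of 10 jobs, and the exact set of preserved setup events are illustrative choices, not part of the actual script provided in the usage instructions.

```sh
#!/usr/bin/env bash
# Minimal sketch of the trimming idea, not the trimsparkeventlog.sh script that
# follows. Assumes an uncompressed Spark event log with one JSON event per line;
# KEEP_JOBS and the list of preserved setup events are illustrative assumptions.

INPUT="$1"              # path to the Spark event log file
KEEP_JOBS="${2:-10}"    # how many of the most recent jobs to keep (assumed default)

# Find the line number of the oldest job-start event we still want to keep.
START_LINE=$(grep -n 'SparkListenerJobStart' "$INPUT" \
             | tail -n "$KEEP_JOBS" | head -n 1 | cut -d: -f1)
START_LINE="${START_LINE:-1}"   # keep everything if no job-start events were found

{
  # Preserve application/environment setup events that appear before the cut point.
  head -n "$((START_LINE - 1))" "$INPUT" \
    | grep -E 'SparkListener(LogStart|ApplicationStart|EnvironmentUpdate|ExecutorAdded|BlockManagerAdded)'
  # Keep every event from the chosen job start to the end of the log.
  tail -n "+$START_LINE" "$INPUT"
} > "${INPUT}_trimmed"

echo "Trimmed log written to ${INPUT}_trimmed"
```

For example, running this sketch as "bash sketch.sh eventlog_file 20" (file and script names here are hypothetical) would keep roughly the last 20 jobs and write the result next to the input with the _trimmed suffix, mirroring the behavior described above.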
Usage instructions:
- Copy & paste below code snippet into a file
trimsparkeventlog.sh