@KoopaKing
Created November 26, 2019 02:04
Dataproc -- Disable Noisy Timeline Server Logs

Disable Noisy Timeline Server Logs

Overview of Issue

Dataproc images in the 1.3 and 1.4 minor version tracks are affected by an issue that causes the YARN Timeline Server to log Exceptions when the YARN Resource Manager attempts to use the POST Timeline Entities REST API.

The excessive logging is made worse by the lack of log rotation for these logs. The older version of Jersey in use logs through the java.util.logging (JUL) system, which ignores the log4j properties that govern the rest of the system. A similar problem has been reported in a JIRA issue for the HDFS NameNode.

Between the extremely noisy logging and the lack of rotation, long-running Dataproc clusters can run into larger problems: the master node's disk fills up, HDFS enters safe mode, and running workloads can be impacted.
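To gauge whether a master node is already affected, it can help to total the size of the Timeline Server `.out` logs. The sketch below is a minimal example and not part of the original fix: the helper name `timeline_out_log_bytes` is hypothetical, and the default directory and file-name pattern are assumptions taken from the path that the script in this gist cleans up.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the combined size, in bytes, of the
# Timeline Server .out logs (current and numbered) found in a directory.
# The default directory is an assumption based on the path used by the script.
timeline_out_log_bytes() {
  local log_dir="${1:-/var/log/hadoop-yarn}"
  local total=0 size f
  for f in "${log_dir}"/yarn-yarn-timelineserver-*.out*; do
    # An unmatched glob is left as a literal string; skip anything that is not a file.
    [[ -f "${f}" ]] || continue
    size=$(wc -c < "${f}")
    total=$((total + size))
  done
  echo "${total}"
}
```

Calling `timeline_out_log_bytes` on the master node prints the total in bytes. Alongside it, `df -h /` shows how full the disk is, and `hdfs dfsadmin -safemode get` reports whether HDFS is currently in safe mode.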

Fixing the Issue

The provided script can be used as an initialization action or run on live clusters. Restarting Hadoop services on a running cluster always carries some risk, but the script has been tested and is generally safe to run on the master nodes of running clusters that may be affected.
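For a live cluster, one possible approach is to copy the script to the master node and run it there as root. The sketch below assumes a single-master cluster, whose master VM is named `<CLUSTER_NAME>-m`, and a local copy of the script. The helper name `run_fix_on_master`, the `/tmp` destination, and the `GCLOUD` override are illustrative choices, not part of the original gist.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: copy the script to a master node and run it as root.
# Arguments: master VM name, zone, local path to the script.
# Set GCLOUD=echo to preview the commands instead of running them.
run_fix_on_master() {
  local master="$1" zone="$2" script="$3"
  local gcloud="${GCLOUD:-gcloud}"
  local remote=/tmp/disable-noisy-timeline-server-logs.sh

  # Copy the script onto the master VM.
  "${gcloud}" compute scp "${script}" "${master}:${remote}" --zone="${zone}"
  # Run it with root privileges; it edits /etc/hadoop/conf and restarts a service.
  "${gcloud}" compute ssh "${master}" --zone="${zone}" \
    --command="sudo bash ${remote}"
}
```

A preview of the commands, without touching the cluster, would look like:

    GCLOUD=echo run_fix_on_master <CLUSTER_NAME>-m <ZONE> ./disable-noisy-timeline-server-logs.sh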

Using this Initialization Action

You can use this initialization action to create a new Dataproc Cluster that is not impacted by this issue:

  1. Stage this initialization action in a GCS bucket.

    git clone https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.git
    gsutil cp \
      dataproc-initialization-actions/hotfix/disable-noisy-timeline-server-logs/disable-noisy-timeline-server-logs.sh \
      gs://<YOUR_GCS_BUCKET>/disable-noisy-timeline-server-logs.sh
  2. Create a cluster using the staged initialization action.

    gcloud dataproc clusters create <CLUSTER_NAME> \
      --initialization-actions=gs://<YOUR_GCS_BUCKET>/disable-noisy-timeline-server-logs.sh
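
Once the cluster is up, one way to confirm that the initialization action took effect is to check the master node for the two changes the script makes: the JUL properties file and the extra JVM option in `yarn-env.sh`. This is a minimal sketch; the function name `verify_timeline_fix` is hypothetical, and the paths are the ones used by the script in this gist.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical check: succeed only if both changes made by the script are present.
# The optional argument is the Hadoop conf directory (defaults to the script's path).
verify_timeline_fix() {
  local conf_dir="${1:-/etc/hadoop/conf}"
  local props="${conf_dir}/disable-noisy-timeline-logger.logging.properties"
  local logger='com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.level=OFF'

  # The properties file must exist and contain the logger override as a whole line.
  if ! grep -qxF -- "${logger}" "${props}" 2>/dev/null; then
    echo "logger override not found in ${props}" >&2
    return 1
  fi
  # yarn-env.sh must point the Timeline Server JVM at that properties file.
  if ! grep -qF -- "-Djava.util.logging.config.file=${props}" "${conf_dir}/yarn-env.sh" 2>/dev/null; then
    echo "JVM option not found in ${conf_dir}/yarn-env.sh" >&2
    return 1
  fi
  echo "fix applied"
}
```

Run `verify_timeline_fix` with no arguments on the master node. It prints `fix applied` on success and returns a non-zero status with a short message otherwise.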
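#!/bin/bash
# Silences the noisy Jersey JUL logger in the YARN Timeline Server on master nodes.
set -euxo pipefail

ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
NAME=$(hostname)

if [[ "${ROLE}" == 'Master' ]]; then
  # JUL properties file that turns off the logger producing the exceptions.
  CUSTOM_LOGGING_PROPS_FILE=/etc/hadoop/conf/disable-noisy-timeline-logger.logging.properties
  cat << EOF > "${CUSTOM_LOGGING_PROPS_FILE}"
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.level=OFF
EOF

  # Point the Timeline Server JVM at that file. The escaped \$ defers expansion
  # of YARN_TIMELINESERVER_OPTS until yarn-env.sh is sourced.
  cat << EOF >> /etc/hadoop/conf/yarn-env.sh
export YARN_TIMELINESERVER_OPTS="\${YARN_TIMELINESERVER_OPTS} -Djava.util.logging.config.file=${CUSTOM_LOGGING_PROPS_FILE}"
EOF

  # Restart so the new JVM option takes effect, then print the service status.
  systemctl restart hadoop-yarn-timelineserver
  systemctl status --no-pager hadoop-yarn-timelineserver

  # Delete the older numbered .out logs; -f avoids aborting under set -e if none exist.
  rm -f /var/log/hadoop-yarn/yarn-yarn-timelineserver-"${NAME}".out.*
fi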
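
Because the script appends to `yarn-env.sh` with `>>`, running it a second time on the same node adds a second copy of the export line. One way to make that step safe to repeat is a guard like the one below. This is a sketch only: the helper `append_line_once` is hypothetical and not part of the original script.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: append a line to a file only if that exact line
# is not already present. Creates the file if it does not exist.
append_line_once() {
  local file="$1" line="$2"
  # -x matches whole lines only, -F treats the line as a fixed string.
  if ! grep -qxF -- "${line}" "${file}" 2>/dev/null; then
    printf '%s\n' "${line}" >> "${file}"
  fi
}
```

In the script, the second heredoc could then be replaced with a call such as:

    append_line_once /etc/hadoop/conf/yarn-env.sh \
      "export YARN_TIMELINESERVER_OPTS=\"\${YARN_TIMELINESERVER_OPTS} -Djava.util.logging.config.file=${CUSTOM_LOGGING_PROPS_FILE}\""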