
Jeff Cantrill jcantrill

# Plan: Add Optional Label Provider to FileLineTooBigError
## Context
When the kubernetes_logs source encounters log lines that exceed `max_line_bytes` or `max_merged_line_bytes`, it emits a `component_errors_total` metric. Currently, this metric only includes generic labels (`error_code`, `error_type`, `stage`, and component labels added automatically via tracing context).
The problem is that when troubleshooting issues with oversized log lines, operators cannot identify **which pod, namespace, or container** is generating the problematic logs without correlating timestamps with verbose error logs.
The Kubernetes log file path already contains all the needed metadata in its structure:
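As a rough illustration of how that metadata could be recovered from a path, here is a minimal bash sketch. It assumes the common kubelet layout `/var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log`; the function name `parse_pod_log_path` and the sample path are hypothetical and not part of the plan itself.

```shell
#!/bin/bash
# Hypothetical helper: split a kubelet pod log path into its metadata parts.
# Assumed layout: /var/log/pods/<namespace>_<pod>_<uid>/<container>/<n>.log
parse_pod_log_path() {
  local path=$1
  local container_dir pod_dir container namespace rest pod
  container_dir=$(dirname "$path")                    # .../<ns>_<pod>_<uid>/<container>
  pod_dir=$(basename "$(dirname "$container_dir")")   # <ns>_<pod>_<uid>
  container=$(basename "$container_dir")
  namespace=${pod_dir%%_*}   # text before the first underscore
  rest=${pod_dir#*_}         # <pod>_<uid>
  pod=${rest%_*}             # drop the trailing _<uid>
  echo "namespace=${namespace} pod=${pod} container=${container}"
}

# Sample path (made up for illustration).
parse_pod_log_path "/var/log/pods/openshift-logging_collector-abc12_0f1e2d3c/collector/0.log"
```

Splitting on underscores works here because namespace and pod names cannot contain underscores, so the first and last underscores in the directory name delimit the three fields.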
@jcantrill
jcantrill / gist:99d913f0def1b87719b402078e258a4c
Created April 28, 2023 18:01
fluentd positionfile.conf
<system>
log_level info
</system>
<source>
@type tail
path /loopfs/in/*.log
pos_file /loopfs/in/my.pos
<parse>
@type csv
require 'file-tail'
source_dir = ARGV.length > 0 ? ARGV[0] : '/tmp/loopfs/test'
no_of_sources = ARGV.length > 1 ? ARGV[1].to_i : 1
msg_size = ARGV.length > 2 ? ARGV[2].to_i : 1
pos_file = "#{source_dir}/my.pos"
running = true
unwatched = "".rjust(16,'f')
++ oc -n openshift-logging exec -c elasticsearch elasticsearch-cdm-4tng2d4i-1-b9b8878d7-mchvt -- curl -ks '"https://elasticsearch-metrics.openshift-logging.svc:60000/_prometheus/metrics"' '-H"Authorization:' Bearer 'eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbG9nZ2luZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJ1bmF1dGhvcml6ZWQtc2EtMTI2OTEtdG9rZW4ta2h2Z20iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoidW5hdXRob3JpemVkLXNhLTEyNjkxIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMjU5MWI0OTgtM2M1OS0xMWVhLWIxNDMtMDJmNzEyYTc2YTc2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1sb2dnaW5nOnVuYXV0aG9yaXplZC1zYS0xMjY5MSJ9.L6vjQdB-CTaK2bOXVeXl-6ObRBa5BqTGJytB_BSeMqqEXteA8RHkCq0ke4wj57j3jTtAFbTHKMpfT1oELlSuVE7Agz8XWg2TlGKbXfyxYvUxqs1GhCX5yyqskjLy4D8Iz2eLMAaY7gQZNne-9pAegQTA_iS36rHeQxHOJgOvmjiBSfANNk43jhsamRzoVsmubT9_xMlCDAXN-_qqfIZPFucM0Qn7pHr_CH
CN=system.logging.rsyslog,OU=OpenShift,O=Logging
|- indices:
|-*/*
|-CRUD
|-CREATE_INDEX
|-CRUD
|--- cluster:
|-CREATE_INDEX
|-CRUD
|-cluster:monitor/*
jcantrill / images
Created September 5, 2018 18:17
get the images
#!/bin/bash
pod=$1
echo "DCs"
echo "----"
oc get dc -n logging -o yaml | grep image: | sort | uniq
echo "DSs"
echo "----"
oc get ds -n logging -o yaml | grep image: | sort | uniq
jcantrill / gist:eca85af2057b84642510cc086f1e5b97
Created August 28, 2018 13:22
Standing up Openshift using 'oc cluster up' and ansible
Overview
At the time of writing, 'oc cluster up --logging' (or its 3.11 equivalent) is broken. The instructions below use 'oc cluster up' followed by Ansible to install logging. They are generally valid for any OpenShift release from 3.5 to 3.11.
Environment
These instructions are based on using:
Host: CentOS 7 on libvirt
Mem: 8G
jcantrill / fluent-logs
Created July 25, 2018 13:08
Get the logs of the fluent pods
#!/bin/bash
pod=${1:-}
if [ -z "${pod}" ]; then
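  # Quote the jsonpath expression so the shell does not try to glob-expand it.
  pod=$(oc get pods -l component=fluentd -o jsonpath='{.items[*].metadata.name}')
fi
# Word splitting on ${pod} is intentional: it holds a space-separated list of pod names.
for p in ${pod}; do
  echo ">>>>>>>><<<<<<<<<<<<<"
  echo " ${p}"
  echo ">>>>>>>><<<<<<<<<<<<<"
  oc logs "$p"
done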
jcantrill / delete-index-patterns
Last active August 27, 2018 17:00
This script finds the old index patterns that match 'project.*.*.*.*' and removes them from the .kibana index
#!/bin/bash -e
POD=$1
SIZE="${SIZE:-1000}"
index=".kibana"
oc exec -n logging -c elasticsearch "$POD" -- es_util --query="$index/index-pattern/_search?pretty&stored_fields=_id&size=$SIZE" | grep id | grep 'project\..*' | cut -d ':' -f 2 | cut -d '"' -f 2 | paste -sd " " > patterns
echo '' > payload
for p in $(cat patterns); do
echo "{\"delete\":{\"_index\":\"${index}\", \"_type\":\"index-pattern\", \"_id\":\"$p\"}}" >> payload
done
cat payload
jcantrill / check-fluent-connectivity-to-es
Last active July 25, 2018 13:00
This script checks the ability of the fluent pods to connect to Elasticsearch
#!/bin/sh
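pods=${1:-"--all"}
# Drop the pod argument only if one was given; a bare `shift` with no arguments aborts some POSIX shells.
[ $# -gt 0 ] && shift
# Use `=` rather than `==`: the script runs under /bin/sh, where `==` is not portable.
if [ "${pods}" = "--all" ]; then
pods=$(oc get pods -l component=fluentd -o jsonpath='{.items[*].metadata.name}')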
fi
for p in $pods; do
output=$(oc exec $p -- curl --silent -q https://logging-es:9200/ --key /etc/fluent/keys/key --cacert /etc/fluent/keys/ca --cert /etc/fluent/keys/cert "$@")