@sixeyed
Last active April 21, 2025 09:41
Runbook for Fluentd indices filling

Runbook - Elasticsearch Fluentd Indices

Fluentd indices filling in Elasticsearch.

Summary

Centralized log collection from application services goes through Fluentd to Elasticsearch.

Fluentd creates an index for each hour of each day's logs - we do not automatically clear the indices.

🧰 Large quantities of logs can fill the disks so we need to delete old indices.

Alerts

Three alert levels trigger, depending on the amount of disk used:

Alert              Disk Usage  Severity
LogIndicesMaxing   > 50 MB     SEV-1
LogIndicesFilling  > 25 MB     SEV-2
LogIndicesGrowing  > 10 MB     SEV-3
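
The thresholds can be read as a lookup from total index size to alert name. A minimal Python sketch, assuming the sizes are megabytes and that each alert fires strictly above its threshold; the function name `classify_index_size` is hypothetical and is not part of the alerting setup:

```python
from typing import Optional, Tuple

# Thresholds from the alert table, highest first (assumed to be megabytes).
ALERT_LEVELS = [
    (50.0, "LogIndicesMaxing", "SEV-1"),
    (25.0, "LogIndicesFilling", "SEV-2"),
    (10.0, "LogIndicesGrowing", "SEV-3"),
]


def classify_index_size(total_mb: float) -> Optional[Tuple[str, str]]:
    """Return (alert, severity) for the highest threshold exceeded, or None."""
    for threshold, alert, severity in ALERT_LEVELS:
        if total_mb > threshold:
            return alert, severity
    return None


# 40.7 is above 25 but not above 50, so this reports the SEV-2 alert.
print(classify_index_size(40.7))
```

Checking the highest threshold first means a size that crosses several thresholds reports only the most severe alert.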

Triage

1. Confirm the issue

Open the Elasticsearch dashboard in Grafana.

Check the color-coded graph "Fluentd indices size".

This shows the combined size of all the log indices.

2. Find the oldest indices

Clone the SRE tools repo.

First check Elasticsearch is healthy:

./Get-ElasticStatus.ps1

Sample output:

***
Elasticsearch status: yellow
---
This is OK: True
***

If the status is yellow that is OK for the log indices. It means they are not fully replicated, but we don't need the extra redundancy for this data.
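
The source of Get-ElasticStatus.ps1 is not shown here, so the following is only an illustration of the rule. Elasticsearch reports cluster health as green, yellow or red in the status field of the GET /_cluster/health response. This Python sketch applies the "yellow is OK" rule to a response that has already been parsed; the function name and the sample cluster name are hypothetical:

```python
def status_is_ok(health: dict) -> bool:
    """Accept green and yellow; red (or a missing status) is not OK."""
    return health.get("status") in ("green", "yellow")


# Trimmed, made-up example of a parsed GET /_cluster/health response.
sample = {"cluster_name": "logging", "status": "yellow"}

print(f"Elasticsearch status: {sample['status']}")
print(f"This is OK: {status_is_ok(sample)}")
```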

Now list the log indices - this lists them oldest first and prints the sizes:

./Get-LogIndices.ps1

Sample output:

***
Log indices: 4
---
Index: fluentd-2025.04.21-04; size: 10.2mb
---
Index: fluentd-2025.04.21-05; size: 10.1mb
---
Index: fluentd-2025.04.21-06; size: 10.2mb
---
Index: fluentd-2025.04.21-07; size: 10.2mb
---
Log indices total size: 40.7mb
***
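
For reference, a listing like this can be built from the Elasticsearch cat indices API. The Python sketch below is not the source of Get-LogIndices.ps1; it assumes the rows come from GET /_cat/indices/fluentd-*?format=json&bytes=b&h=index,store.size, and the function name and sample byte counts are made up for illustration:

```python
def summarize_indices(rows):
    """Sort fluentd- indices oldest first and total their sizes in bytes.

    rows is a list of dicts with "index" and "store.size" keys, as
    returned by the cat indices API with format=json and bytes=b.
    """
    logs = [r for r in rows if r["index"].startswith("fluentd-")]
    # Names embed a zero-padded date and hour, so name order is age order.
    logs.sort(key=lambda r: r["index"])
    total = sum(int(r["store.size"]) for r in logs)
    return [r["index"] for r in logs], total


# Illustrative rows; the sizes are invented, not real measurements.
rows = [
    {"index": "fluentd-2025.04.21-05", "store.size": "2000"},
    {"index": "fluentd-2025.04.21-04", "store.size": "1000"},
    {"index": ".kibana", "store.size": "500"},
]
names, total = summarize_indices(rows)
print(names, total)
```

Sorting by name only works because the date and hour in the index name are zero-padded.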

Identify the oldest fluentd- indices - we don't need to keep logs for more than 3 days.

At high levels of logging there may be multiple indices for one day, split by hour. In that case we can delete the oldest for the day.
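
The 3-day rule can also be checked mechanically from the index names. A minimal Python sketch, assuming names follow the fluentd-YYYY.MM.DD-HH pattern seen in the sample output; the function name `expired_indices` and the `keep_days` parameter are hypothetical:

```python
from datetime import datetime, timedelta


def expired_indices(names, now, keep_days=3):
    """Return the index names older than keep_days, oldest first."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in sorted(names):
        try:
            stamp = datetime.strptime(name, "fluentd-%Y.%m.%d-%H")
        except ValueError:
            continue  # not an hourly fluentd- index, leave it alone
        if stamp < cutoff:
            expired.append(name)
    return expired


now = datetime(2025, 4, 21, 8)
names = ["fluentd-2025.04.21-04", "fluentd-2025.04.17-23", "fluentd-2025.04.18-09"]
print(expired_indices(names, now))
```

Names that do not match the pattern are skipped instead of raising, so other indices in the cluster are never reported as expired.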

Mitigation

Run the script to delete the oldest indices. Parameters:

  • -Count - number of indices to delete, defaults to 2
  • -Yes - confirm deletion; without it the script only does a dry run

./Remove-LogIndices.ps1

Sample output:

***
Deleting: 2 oldest indices; with Yes: False
---
DRY RUN: Would delete index: fluentd-2025.04.21-04
---
DRY RUN: Would delete index: fluentd-2025.04.21-05
***

Confirm the dry run lists the indices you want to remove, then run again with -Yes:

./Remove-LogIndices.ps1 -Yes

Sample output:

***
Deleting: 2 oldest indices; with Yes: True
---
Deleting index: fluentd-2025.04.21-04
{"acknowledged":true}
---
Deleting index: fluentd-2025.04.21-05
{"acknowledged":true}
***
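
The dry-run-by-default pattern in these two runs can be sketched as follows. This is an illustration in Python, not the source of Remove-LogIndices.ps1; the function name and the injected `delete` callable are hypothetical, and a real implementation would issue an HTTP DELETE /<index> request to Elasticsearch at that point:

```python
def remove_oldest(names, delete, count=2, yes=False):
    """Delete the count oldest indices, or only report them when yes is False."""
    lines = []
    for name in sorted(names)[:count]:
        if yes:
            delete(name)  # stand-in for the HTTP DELETE /<index> call
            lines.append(f"Deleting index: {name}")
        else:
            lines.append(f"DRY RUN: Would delete index: {name}")
    return lines


deleted = []  # records what the fake delete function was asked to remove
names = ["fluentd-2025.04.21-06", "fluentd-2025.04.21-04", "fluentd-2025.04.21-05"]

dry_run = remove_oldest(names, deleted.append)
real_run = remove_oldest(names, deleted.append, yes=True)
print("\n".join(dry_run + real_run))
```

Making deletion opt-in means a mistyped or forgotten flag results in a harmless listing, not lost data.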

List the indices again and check total sizes:

./Get-LogIndices.ps1

Sample output:

***
Log indices: 2
---
Index: fluentd-2025.04.21-06; size: 10.2mb
---
Index: fluentd-2025.04.21-07; size: 10.2mb
---
Log indices total size: 20.4mb
***

Verification

Refresh the Grafana dashboard - within a minute or two the graph should show the reduced index size.

The goal is to get the total index size below the SEV-3 threshold. If it is still above that, repeat the mitigation steps.
