At Intercom, we have nearly a thousand SQS dead letter queues (DLQs) with various paging priority levels. The challenge we set out to solve was working out which of these queues actually need to page our on-call engineers, particularly for lower-priority issues.
As the set of queues grows, it becomes very difficult to maintain the right signal-to-noise ratio, and getting it wrong has a real negative impact on those engineers when they're woken in the middle of the night for something that isn't that important. The manual review process was becoming unsustainable: engineers had to gather data from multiple sources (Terraform infrastructure, Honeycomb observability datasets, production metrics), analyze each queue's health and business impact, decide on an appropriate paging tier, and then implement the approved changes across our infrastructure.
This is exactly the kind of