Created December 9, 2024 02:56
Let’s assume you’re monitoring the Route 53 DNS query volume metric, DNSQueries, in the AWS/Route53 namespace, with the HostedZoneId dimension. Route 53 publishes its metrics in us-east-1, so the CLI examples specify that region. Below are detailed examples for each case:
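1. Tweak Anomaly Detection Parameters
Adjust the Band Width
The sensitivity of CloudWatch anomaly detection is set by the width of the expected band, which is the second argument of ANOMALY_DETECTION_BAND in the alarm or graph expression. It is measured in standard deviations and defaults to 2; raise it to reduce false positives. Create the detector first:
aws cloudwatch put-anomaly-detector \
  --region us-east-1 \
  --namespace "AWS/Route53" \
  --metric-name "DNSQueries" \
  --dimensions Name=HostedZoneId,Value=Z123456ABCDEFG \
  --stat "Sum"
Then reference the metric with a wider band, for example ANOMALY_DETECTION_BAND(m1, 3), where m1 is the Id of the DNSQueries metric in the alarm.
• A larger value reduces sensitivity, allowing more variability before a data point counts as an anomaly.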
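To illustrate why a wider band flags fewer points, here is a minimal sketch that treats the band as the mean plus or minus a number of standard deviations. This is a simplification: CloudWatch trains a model that also accounts for trend and seasonality, and the sample values and function names below are hypothetical.

```python
from statistics import mean, stdev


def band(values, width):
    """Return (lower, upper) as mean -/+ `width` sample standard deviations."""
    centre = mean(values)
    spread = stdev(values)
    return centre - width * spread, centre + width * spread


def count_outside(values, width):
    """Count how many values fall outside the band of the given width."""
    lower, upper = band(values, width)
    return sum(1 for v in values if v < lower or v > upper)


# Hypothetical per-minute query counts with one mild bump (130) and one spike (180).
samples = [100, 102, 98, 101, 99, 103, 97, 100,
           130, 100, 101, 99, 180, 100, 98, 102]

narrow = count_outside(samples, 1)  # tighter band, more points flagged
wide = count_outside(samples, 2)    # wider band, fewer points flagged
print(f"width=1 flags {narrow} point(s), width=2 flags {wide} point(s)")
```

With these samples the mild bump is only flagged by the narrower band, which is the effect a larger band width is meant to have.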
2. Combine Metrics or Dimensions
Add More Context
Query DNSQueries per HostedZoneId so that an anomaly can be traced to a specific hosted zone instead of being hidden in an account-wide total. The start and end times below are an example range; get-metric-data requires both.
aws cloudwatch get-metric-data \
  --region us-east-1 \
  --start-time 2024-12-08T00:00:00Z \
  --end-time 2024-12-09T00:00:00Z \
  --metric-data-queries '[{
    "Id": "dnsqueries_zone1",
    "MetricStat": {
      "Metric": {
        "Namespace": "AWS/Route53",
        "MetricName": "DNSQueries",
        "Dimensions": [{"Name": "HostedZoneId", "Value": "Z123456ABCDEFG"}]
      },
      "Period": 60,
      "Stat": "Sum"
    },
    "ReturnData": true
  }]'
• Add one query per hosted zone so that anomaly detection can run on each zone rather than on a combined total.
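When there are several hosted zones, the query list can be generated instead of written by hand. The sketch below builds one query per zone in the same shape as the JSON above; the helper names and the second zone ID are hypothetical.

```python
def zone_query(zone_id, index, period=60):
    """Build one MetricDataQuery dict for a hosted zone's DNSQueries metric."""
    return {
        "Id": f"dnsqueries_zone{index}",  # Ids must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Route53",
                "MetricName": "DNSQueries",
                "Dimensions": [{"Name": "HostedZoneId", "Value": zone_id}],
            },
            "Period": period,
            "Stat": "Sum",
        },
        "ReturnData": True,
    }


def build_zone_queries(zone_ids, period=60):
    """Return one query per zone, numbered from 1."""
    return [zone_query(z, i, period) for i, z in enumerate(zone_ids, start=1)]


# "Z0000000EXAMPLE" is a placeholder zone ID for illustration only.
queries = build_zone_queries(["Z123456ABCDEFG", "Z0000000EXAMPLE"])
print([q["Id"] for q in queries])
```

The resulting list can be passed as MetricDataQueries to boto3's get_metric_data, together with StartTime and EndTime, on a CloudWatch client created for us-east-1.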
Aggregate Metrics Smartly
Monitor the total query volume across multiple hosted zones to detect trends affecting your entire setup. METRICS() filters on the Id of the metric queries in the same request, not on the metric name, so this expression sums every metric whose Id contains "dnsqueries":
{
  "Expression": "SUM(METRICS(\"dnsqueries\"))",
  "Label": "Total DNS Queries Across Hosted Zones",
  "Id": "total_dnsqueries",
  "ReturnData": true
}
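As a rough picture of what summing across zones produces, the sketch below adds per-zone series point by point. The data is made up, and treating a timestamp missing from one zone as zero is an assumption of this sketch, not a statement about how CloudWatch handles gaps.

```python
from collections import defaultdict


def sum_across_zones(series_by_zone):
    """Add the value of every zone at each timestamp.

    `series_by_zone` maps a zone ID to a {timestamp: value} dict.
    A timestamp absent from a zone contributes nothing (assumed zero).
    """
    totals = defaultdict(float)
    for points in series_by_zone.values():
        for timestamp, value in points.items():
            totals[timestamp] += value
    return dict(sorted(totals.items()))


# Hypothetical data: timestamps are seconds from the start of the window.
series = {
    "Z1": {0: 10, 60: 20},
    "Z2": {0: 5, 60: 7, 120: 1},
}
total = sum_across_zones(series)
print(total)
```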
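3. Set Smarter Alarms
Use Multiple Consecutive Data Points
Create a metric alarm that triggers only after three consecutive data points fall above the anomaly detection band. The band is referenced through --threshold-metric-id, and the consecutive-point rule comes from --evaluation-periods and --datapoints-to-alarm:
aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "Route53_QueryVolume_Anomaly" \
  --comparison-operator GreaterThanUpperThreshold \
  --evaluation-periods 3 --datapoints-to-alarm 3 \
  --threshold-metric-id ad1 \
  --metrics '[
    {"Id": "m1", "ReturnData": true, "MetricStat": {"Metric": {"Namespace": "AWS/Route53", "MetricName": "DNSQueries", "Dimensions": [{"Name": "HostedZoneId", "Value": "Z123456ABCDEFG"}]}, "Period": 60, "Stat": "Sum"}},
    {"Id": "ad1", "ReturnData": true, "Expression": "ANOMALY_DETECTION_BAND(m1, 2)"}
  ]' \
  --actions-enabled --alarm-actions arn:aws:sns:us-east-1:123456789012:MySNSTopic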
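The "consecutive data points" rule corresponds to CloudWatch's EvaluationPeriods and DatapointsToAlarm alarm settings. The sketch below is a simplified model of that M-out-of-N check; it ignores missing-data handling, and the function name is hypothetical.

```python
def alarm_fires(breaches, evaluation_periods=3, datapoints_to_alarm=3):
    """Return True if enough of the most recent periods were breaching.

    `breaches` is a list of booleans, oldest first, one per period.
    With both settings at 3, all of the last three points must breach.
    """
    if len(breaches) < evaluation_periods:
        return False  # not enough history to evaluate yet
    window = breaches[-evaluation_periods:]
    return sum(window) >= datapoints_to_alarm


print(alarm_fires([False, True, True, True]))           # three in a row
print(alarm_fires([True, True, False]))                 # streak broken
print(alarm_fires([True, False, True], 3, 2))           # 2 out of 3
```

Lowering datapoints_to_alarm below evaluation_periods tolerates an occasional normal point inside an otherwise anomalous stretch.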
Add a Baseline Threshold
Combine anomaly detection with a static threshold. Create a second, ordinary metric alarm that fires when DNSQueries exceeds 1,000,000 in a minute (called Route53_QueryVolume_High here, a hypothetical name), then require both alarms in a composite alarm rule:
{
  "AlarmRule": "ALARM(\"Route53_QueryVolume_Anomaly\") AND ALARM(\"Route53_QueryVolume_High\")"
}
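Composite alarm rules refer to other alarms by name through functions such as ALARM("name"), joined with AND, OR and NOT. A small helper, sketched below, can assemble such a rule string; the helper names and the second alarm name are hypothetical.

```python
def alarm_ref(name):
    """Return the rule term that is true while the named alarm is in ALARM."""
    if '"' in name:
        raise ValueError("alarm names containing double quotes are not handled here")
    return f'ALARM("{name}")'


def all_in_alarm(names):
    """Build a rule that requires every named alarm to be in ALARM state."""
    if not names:
        raise ValueError("at least one alarm name is required")
    return " AND ".join(alarm_ref(n) for n in names)


rule = all_in_alarm(["Route53_QueryVolume_Anomaly", "Route53_QueryVolume_High"])
print(rule)
```

The string can be supplied as the AlarmRule when creating the composite alarm, for example through the --alarm-rule option of aws cloudwatch put-composite-alarm.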
4. Pre-Process Data
Smooth Your Data
Metric math has no rolling-window average (AVG applied to a single time series returns one scalar for the whole range), so smooth DNSQueries by aggregating over a longer period instead. With m1 defined as DNSQueries, Stat Sum, Period 300, this expression gives the average queries per minute in each 5-minute bucket:
{
  "Id": "rolling_avg",
  "Expression": "m1 / 5",
  "Label": "Smoothed DNS Queries",
  "ReturnData": true
}
Use rolling_avg as the input to the anomaly detection band, for example ANOMALY_DETECTION_BAND(rolling_avg, 2).
Example Workflow in the Console
1. Go to CloudWatch Console > Metrics > Select DNSQueries.
2. On the graphed metric, set the statistic to Sum and the period to 5 minutes.
3. Add Metric Math:
• Expression: m1 / 5
• Label: “Smoothed DNS Queries”.
4. Enable anomaly detection on this smoothed expression.
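If the data is exported and processed outside CloudWatch, a true rolling average is straightforward. The sketch below is plain Python, not a CloudWatch feature, and averages over the points available so far until the window fills; the function name and sample values are hypothetical.

```python
from collections import deque


def rolling_average(values, window=5):
    """Return the mean of the last `window` values at each position.

    Early positions average over however many values have been seen.
    """
    recent = deque(maxlen=window)
    running_total = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            running_total -= recent[0]  # value about to be pushed out
        recent.append(value)
        running_total += value
        averages.append(running_total / len(recent))
    return averages


# Hypothetical per-minute counts with a single spike.
smoothed = rolling_average([100, 100, 100, 100, 600, 100, 100], window=5)
print(smoothed)
```

The spike is spread across the following window positions, which is why smoothing lowers the chance of a one-minute burst being flagged.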
5. Validate Data Quality
Filter Noise from Metrics
If certain hosted zones generate unreliable metrics, exclude them or apply anomaly detection only on zones with stable data. First confirm that CloudWatch has the metric for a given zone:
aws cloudwatch list-metrics \
  --region us-east-1 \
  --namespace "AWS/Route53" \
  --metric-name "DNSQueries" \
  --dimensions Name=HostedZoneId,Value=Z123456ABCDEFG
Exclude noisy zones by leaving them out of the metric data queries, or focus on key zones only.
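One way to decide which zones count as "stable" is to compare the spread of each zone's values with its mean. The sketch below uses the coefficient of variation with a cutoff of 0.5 and a minimum of 10 data points; both thresholds, the zone labels, and the function names are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (inf if mean is 0)."""
    average = mean(values)
    return float("inf") if average == 0 else pstdev(values) / average


def stable_zones(series_by_zone, max_cv=0.5, min_points=10):
    """Return zone IDs with enough data points and a low relative spread."""
    keep = []
    for zone_id, values in series_by_zone.items():
        if len(values) < min_points:
            continue  # too little data to judge
        if coefficient_of_variation(values) <= max_cv:
            keep.append(zone_id)
    return keep


# Hypothetical per-minute counts for three zones.
data = {
    "ZONE_A": [100, 102, 98, 101, 99, 100, 103, 97, 100, 101],  # steady
    "ZONE_B": [0, 1000] * 5,                                     # erratic
    "ZONE_C": [100, 100, 100],                                   # too short
}
print(stable_zones(data))
```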
6. Alternative Notification Channels
Suppression with EventBridge
CloudWatch publishes alarm state changes to EventBridge; match them with a rule and suppress repetitive alerts downstream.
EventBridge rule example (event pattern keys are lowercase, and the state is nested under state.value):
{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "detail": {
    "alarmName": ["Route53_QueryVolume_Anomaly"],
    "state": {
      "value": ["ALARM"]
    }
  }
}
Suppress duplicate alerts with a Lambda function that implements cooldown logic.
7. Evaluate Alternative Tools
Amazon Lookout for Metrics
Lookout for Metrics can handle more complex anomaly patterns in Route 53 query volume data. AWS has announced end of support for the service, so check its current status before adopting it.
Steps:
1. Ingest Route 53 query volume metrics into Lookout for Metrics.
2. Set up ML-based detectors that account for seasonal trends (e.g., high traffic during the day).
3. Apply fine-tuned alerting thresholds using Lookout’s UI.
Combine with CloudWatch
Use Lookout for Metrics for advanced detection and send the output to CloudWatch for centralized monitoring.
Example CloudFormation Template for Automation
Here’s an example combining anomaly detection, smoothed data (a 5-minute period), and smarter alarms (a wider band and three consecutive data points). Deploy it in us-east-1, where Route 53 metrics are published:
Resources:
  Route53AnomalyDetector:
    Type: AWS::CloudWatch::AnomalyDetector
    Properties:
      MetricName: "DNSQueries"
      Namespace: "AWS/Route53"
      Dimensions:
        - Name: "HostedZoneId"
          Value: "Z123456ABCDEFG"
      Stat: "Sum"
  Route53Alarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: "Route53_QueryVolume_Anomaly"
      # Alarm when the metric rises above the upper edge of the band.
      ComparisonOperator: GreaterThanUpperThreshold
      EvaluationPeriods: 3
      DatapointsToAlarm: 3
      # The band expression replaces a static Threshold value.
      ThresholdMetricId: "ad1"
      Metrics:
        - Id: "m1"
          MetricStat:
            Metric:
              MetricName: "DNSQueries"
              Namespace: "AWS/Route53"
              Dimensions:
                - Name: "HostedZoneId"
                  Value: "Z123456ABCDEFG"
            Period: 300
            Stat: "Sum"
        - Id: "ad1"
          # Band width of 3 standard deviations (default is 2).
          Expression: "ANOMALY_DETECTION_BAND(m1, 3)"
      ActionsEnabled: true
      AlarmActions:
        - arn:aws:sns:us-east-1:123456789012:MySNSTopic