Skip to content

Instantly share code, notes, and snippets.

@philschmid
Last active February 25, 2020 07:22
Show Gist options
  • Select an option

  • Save philschmid/2ea2a2b71b3df3b28c8336d02480e1ca to your computer and use it in GitHub Desktop.

Select an option

Save philschmid/2ea2a2b71b3df3b28c8336d02480e1ca to your computer and use it in GitHub Desktop.
Resources:
LambdaExecutionAnomalyDetector:
Type: AWS::CloudWatch::AnomalyDetector
Properties:
MetricName: Duration
Namespace: AWS/Lambda
Stat: Sum
LambdaExecutionAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmDescription: Lambda invocations
AlarmName: hier-steht-der-name-wie-in-code-build
AlarmActions:
- !Ref "Topic"
ComparisonOperator: LessThanLowerOrGreaterThanUpperThreshold
EvaluationPeriods: 1
Metrics:
- Expression: ANOMALY_DETECTION_BAND(m1, 2)
Id: ad1
- Id: m1
MetricStat:
Metric:
MetricName: Duration
Namespace: AWS/Lambda
Period: !!int 86400
Stat: Sum
ThresholdMetricId: ad1
TreatMissingData: breaching
AlarmTopic:
Type: "AWS::SNS::Topic"
Properties:
DisplayName: "Alert"
Subscription:
-
Endpoint:
!GetAtt "Lambda.Arn"
Protocol: "lambda"
---
# The alert template is intended to be used as a nested template.
# Assuming you want to add an alert for an EC2 instance and given
# resource ${Host} and params ${Stage}, ${Bucket} and ${Key},
# you incorporate it in the parent template as follows:
#
# Resources:
# AlertStack:
# Type: "AWS::CloudFormation::Stack"
# Properties:
# Parameters:
# "Source": !Ref "Host"
# "EC2InstanceId": !Ref "Host"
# "Stage": !Ref "Stage"
# TemplateURL:
# !Sub "https://s3-${AWS::Region}.amazonaws.com/${Bucket}/${Key}"
# TimeoutInMinutes: 10
#
# Note that most fields have workable defaults
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
# optional params - omit any that don't apply
EC2InstanceId:
Description: ID of the monitored host
Type: String
Default: ""
ECSClusterName:
Description: ECS cluster name
Type: String
Default: ""
ECSServiceName:
Description: ECS service name
Type: String
Default: ""
ASGroupName:
Description: Auto scaling group name
Type: String
Default: ""
CWLogGroupName:
Description: CloudWatch log group name
Type: String
Default: ""
# required params
Source:
Description: alarm source's name
Type: String
LogErrorFilter:
Description: search term identifying errors in logs
Type: String
Default: "ERROR"
# Slack integration: stage, channels and URLs
Stage:
Description: Stage name
Type: String
Default: "test"
SlackChannel:
Description: Notification channel
Type: String
Default: alert-template
SlackHookUrlBase64:
Description: Predefined hook URL (base64-encoded)
Type: String
S3Bucket:
Description: Name of S3 Bucket holding alert template and lambda
Type: String
Default: "alert-template"
S3Key:
Description: Lambda key
Type: String
Default: "post-slack-alert-0.1.1.zip"
# shared alarm configuration
AlarmPeriod:
Description: Data evaluation period in seconds
Type: Number
Default: 60
AlarmEvaluationPeriods:
Description: Group size of data evaluation periods
Type: Number
Default: 1
AlarmThreshold:
Description: Limit in percent above which alarms are raised
Type: Number
Default: 80
Conditions:
# only create defined resources
EC2InstanceIdDefined:
!Not [!Equals [!Ref EC2InstanceId, '']]
# only set up cluster alert if no service ID given
ECSClusterDefined:
!And [!Not [!Equals [!Ref ECSClusterName, '']], !Equals [!Ref ECSServiceName, '']]
# only set up service alert if cluster and service IDs given
ECSServiceDefined:
!Not [!Or [!Equals [!Ref ECSServiceName, ''], !Equals [!Ref ECSClusterName, '']]]
ASGroupDefined:
!Not [!Equals [!Ref ASGroupName, '']]
CWLogGroupDefined:
!Not [!Equals [!Ref CWLogGroupName, '']]
Resources:
LambdaExecutionRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service:
- lambda.amazonaws.com
Action:
- sts:AssumeRole
Path: "/"
Policies:
- PolicyName: root
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- logs:*
Resource: arn:aws:logs:*:*:*
- PolicyName: s3-read-only
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- s3:GetObject
Resource:
!Sub "arn:aws:s3:::${S3Bucket}/*"
Lambda:
Type: AWS::Lambda::Function
Properties:
Description: alert function
Handler: index.handler
Role: !GetAtt LambdaExecutionRole.Arn
Code:
S3Bucket: !Ref "S3Bucket"
S3Key: !Ref "S3Key"
Runtime: nodejs6.10
Environment:
Variables:
# override dryRun default for unit testing (e.g. using dotenv)
dryRun: false
slackChannel: !Ref "SlackChannel"
base64HookUrl: !Ref "SlackHookUrlBase64"
Topic:
Type: "AWS::SNS::Topic"
Properties:
DisplayName: "Alert"
Subscription:
-
Endpoint:
!GetAtt "Lambda.Arn"
Protocol: "lambda"
PermissionLambdaInvocation:
Type: "AWS::Lambda::Permission"
Properties:
FunctionName:
!GetAtt "Lambda.Arn"
Action: "lambda:InvokeFunction"
Principal: "sns.amazonaws.com"
SourceArn:
!Ref "Topic"
# Standalone VMs (e.g. bastion hosts, single-instance DBs, etc.)
InstanceCPUAlarm:
Type: AWS::CloudWatch::Alarm
Condition: EC2InstanceIdDefined
Properties:
AlarmName: !Sub "[${Stage}] EC2 instance CPU utilization (${Source})"
AlarmDescription: !Sub "Raise alarm if CPU utilization > ${AlarmThreshold}% for ${AlarmPeriod}s"
MetricName: CPUUtilization
Namespace: AWS/EC2
Statistic: Average
AlarmActions:
- !Ref "Topic"
Period: !Ref "AlarmPeriod"
EvaluationPeriods: !Ref "AlarmEvaluationPeriods"
Threshold: !Ref "AlarmThreshold"
Dimensions:
- Name: "InstanceId"
Value:
!Ref EC2InstanceId
ComparisonOperator: GreaterThanThreshold
# custom metrics
ErrorMetricFilter:
Type: AWS::Logs::MetricFilter
Condition: CWLogGroupDefined
Properties:
LogGroupName:
!Ref "CWLogGroupName"
FilterPattern: !Ref "LogErrorFilter"
MetricTransformations:
- MetricValue: '1'
MetricNamespace: !Sub "${Source}/error"
MetricName: count
# custom metric alarm
LogErrorAlarm:
Type: AWS::CloudWatch::Alarm
Condition: CWLogGroupDefined
Properties:
AlarmName: !Sub "[${Stage}] '${LogErrorFilter}' in CloudWatch Logs (${Source})"
AlarmDescription: !Sub "An error has been raised in ${CWLogGroupName}"
MetricName: count
Namespace: !Sub "${Source}/error"
Statistic: Sum
Period: '60'
EvaluationPeriods: '1'
Threshold: '0'
AlarmActions:
- !Ref "Topic"
ComparisonOperator: GreaterThanThreshold
# ECS alarms: 2x CPU, 2x memory; supply both cluster and service for services
ECSClusterCPUAlarm:
Type: AWS::CloudWatch::Alarm
Condition: ECSClusterDefined
Properties:
AlarmName: !Sub "[${Stage}] ECS cluster CPU utilization (${Source})"
AlarmDescription: !Sub "Raise alarm if CPU utilization > ${AlarmThreshold}% for ${AlarmPeriod}s"
MetricName: CPUUtilization
Namespace: AWS/EC2
Statistic: Average
AlarmActions:
- !Ref "Topic"
Period: !Ref "AlarmPeriod"
EvaluationPeriods: !Ref "AlarmEvaluationPeriods"
Threshold: !Ref "AlarmThreshold"
Dimensions:
- Name: "ClusterName"
Value:
!Ref "ECSClusterName"
ComparisonOperator: GreaterThanThreshold
ECSClusterMemoryAlarm:
Type: AWS::CloudWatch::Alarm
Condition: ECSClusterDefined
Properties:
AlarmName: !Sub "[${Stage}] ECS cluster memory utilization (${Source})"
AlarmDescription: !Sub "Raise alarm if memory utilization > ${AlarmThreshold}% for ${AlarmPeriod}s"
MetricName: MemoryUtilization
Namespace: AWS/EC2
Statistic: Average
AlarmActions:
- !Ref "Topic"
Period: !Ref "AlarmPeriod"
EvaluationPeriods: !Ref "AlarmEvaluationPeriods"
Threshold: !Ref "AlarmThreshold"
Dimensions:
- Name: "ClusterName"
Value:
!Ref "ECSClusterName"
ComparisonOperator: GreaterThanThreshold
ECSServiceCPUAlarm:
Type: AWS::CloudWatch::Alarm
Condition: ECSServiceDefined
Properties:
AlarmName: !Sub "[${Stage}] ECS service CPU utilization (${Source})"
AlarmDescription: !Sub "Raise alarm if CPU utilization > ${AlarmThreshold}% for ${AlarmPeriod}s"
MetricName: CPUUtilization
Namespace: AWS/EC2
Statistic: Average
AlarmActions:
- !Ref "Topic"
Period: !Ref "AlarmPeriod"
EvaluationPeriods: !Ref "AlarmEvaluationPeriods"
Threshold: !Ref "AlarmThreshold"
Dimensions:
- Name: "ClusterName"
Value:
!Ref "ECSClusterName"
- Name: "ServiceName"
Value:
!Ref "ECSServiceName"
ComparisonOperator: GreaterThanThreshold
ECSServiceMemoryAlarm:
Type: AWS::CloudWatch::Alarm
Condition: ECSServiceDefined
Properties:
AlarmName: !Sub "[${Stage}] ECS service memory utilization (${Source})"
AlarmDescription: !Sub "Raise alarm if memory utilization > ${AlarmThreshold}% for ${AlarmPeriod}s"
MetricName: MemoryUtilization
Namespace: AWS/EC2
Statistic: Average
AlarmActions:
- !Ref "Topic"
Period: !Ref "AlarmPeriod"
EvaluationPeriods: !Ref "AlarmEvaluationPeriods"
Threshold: !Ref "AlarmThreshold"
Dimensions:
- Name: "ClusterName"
Value:
!Ref "ECSClusterName"
- Name: "ServiceName"
Value:
!Ref "ECSServiceName"
ComparisonOperator: GreaterThanThreshold
# auto scaling groups (e.g. ECS application nodes)
ASGroupCPUAlarm:
Type: AWS::CloudWatch::Alarm
Condition: ASGroupDefined
Properties:
AlarmName: !Sub "[${Stage}] Auto scaling CPU utilization (${Source})"
AlarmDescription: !Sub "Raise alarm if CPU utilization > ${AlarmThreshold}% for ${AlarmPeriod}s"
MetricName: CPUUtilization
Namespace: AWS/EC2
Statistic: Average
AlarmActions:
- !Ref "Topic"
Period: !Ref "AlarmPeriod"
EvaluationPeriods: !Ref "AlarmEvaluationPeriods"
Threshold: !Ref "AlarmThreshold"
Dimensions:
- Name: "AutoScalingGroupName"
Value:
!Ref ASGroupName
ComparisonOperator: GreaterThanThreshold
Description: Generic alert template
# echo inputs
Outputs:
Stage:
Description: Stage descriptor
Value: !Ref "Stage"
SlackChannel:
Description: Notification channel
Value: !Ref "SlackChannel"
SlackHookUrlBase64:
Description: Predefined hook URL (base64-encoded)
Value: !Ref "SlackHookUrlBase64"
S3Bucket:
Description: Name of S3 Bucket holding alert template and lambda
Value: !Ref "S3Bucket"
S3Key:
Description: Lambda key
Value: !Ref "S3Key"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment