https://docs.aws.amazon.com/de_de/lambda/latest/dg/monitoring-functions-metrics.html
Last active
February 25, 2020 07:22
-
-
Save philschmid/2ea2a2b71b3df3b28c8336d02480e1ca to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Resources: | |
| LambdaExecutionAnomalyDetector: | |
| Type: AWS::CloudWatch::AnomalyDetector | |
| Properties: | |
| MetricName: Duration | |
| Namespace: AWS/Lambda | |
| Stat: Sum | |
| LambdaExecutionAlarm: | |
| Type: AWS::CloudWatch::Alarm | |
| Properties: | |
| AlarmDescription: Lambda invocations | |
| AlarmName: hier-steht-der-name-wie-in-code-build | |
| AlarmActions: | |
| - !Ref "Topic" | |
| ComparisonOperator: LessThanLowerOrGreaterThanUpperThreshold | |
| EvaluationPeriods: 1 | |
| Metrics: | |
| - Expression: ANOMALY_DETECTION_BAND(m1, 2) | |
| Id: ad1 | |
| - Id: m1 | |
| MetricStat: | |
| Metric: | |
| MetricName: Duration | |
| Namespace: AWS/Lambda | |
| Period: !!int 86400 | |
| Stat: Sum | |
| ThresholdMetricId: ad1 | |
| TreatMissingData: breaching | |
| AlarmTopic: | |
| Type: "AWS::SNS::Topic" | |
| Properties: | |
| DisplayName: "Alert" | |
| Subscription: | |
| - | |
| Endpoint: | |
| !GetAtt "Lambda.Arn" | |
| Protocol: "lambda" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| --- | |
| # The alert template is intended to be used as a nested template. | |
| # Assuming you want to add an alert for an EC2 instance and given | |
| # resource ${Host} and params ${Stage}, ${Bucket} and ${Key}, | |
| # you incorporate it in the parent template as follows: | |
| # | |
| # Resources: | |
| # AlertStack: | |
| # Type: "AWS::CloudFormation::Stack" | |
| # Properties: | |
| # Parameters: | |
| # "Source": !Ref "Host" | |
| # "EC2InstanceId": !Ref "Host" | |
| # "Stage": !Ref "Stage" | |
| # TemplateURL: | |
| # !Sub "https://s3-${AWS::Region}.amazonaws.com/${Bucket}/${Key}" | |
| # TimeoutInMinutes: 10 | |
| # | |
| # Note that most fields have workable defaults | |
| AWSTemplateFormatVersion: '2010-09-09' | |
| Parameters: | |
| # optional params - omit any that don't apply | |
| EC2InstanceId: | |
| Description: ID of the monitored host | |
| Type: String | |
| Default: "" | |
| ECSClusterName: | |
| Description: ECS cluster name | |
| Type: String | |
| Default: "" | |
| ECSServiceName: | |
| Description: ECS service name | |
| Type: String | |
| Default: "" | |
| ASGroupName: | |
| Description: Auto scaling group name | |
| Type: String | |
| Default: "" | |
| CWLogGroupName: | |
| Description: CloudWatch log group name | |
| Type: String | |
| Default: "" | |
| # required params | |
| Source: | |
| Description: alarm source's name | |
| Type: String | |
| LogErrorFilter: | |
| Description: search term identifying errors in logs | |
| Type: String | |
| Default: "ERROR" | |
| # Slack integration: stage, channels and URLs | |
| Stage: | |
| Description: Stage name | |
| Type: String | |
| Default: "test" | |
| SlackChannel: | |
| Description: Notification channel | |
| Type: String | |
| Default: alert-template | |
| SlackHookUrlBase64: | |
| Description: Predefined hook URL (base64-encoded) | |
| Type: String | |
| S3Bucket: | |
| Description: Name of S3 Bucket holding alert template and lambda | |
| Type: String | |
| Default: "alert-template" | |
| S3Key: | |
| Description: Lambda key | |
| Type: String | |
| Default: "post-slack-alert-0.1.1.zip" | |
| # shared alarm configuration | |
| AlarmPeriod: | |
| Description: Data evaluation period in seconds | |
| Type: Number | |
| Default: 60 | |
| AlarmEvaluationPeriods: | |
| Description: Group size of data evaluation periods | |
| Type: Number | |
| Default: 1 | |
| AlarmThreshold: | |
| Description: Limit in percent above which alarms are raised | |
| Type: Number | |
| Default: 80 | |
| Conditions: | |
| # only create defined resources | |
| EC2InstanceIdDefined: | |
| !Not [!Equals [!Ref EC2InstanceId, '']] | |
| # only set up cluster alert if no service ID given | |
| ECSClusterDefined: | |
| !And [!Not [!Equals [!Ref ECSClusterName, '']], !Equals [!Ref ECSServiceName, '']] | |
| # only set up service alert if cluster and service IDs given | |
| ECSServiceDefined: | |
| !Not [!Or [!Equals [!Ref ECSServiceName, ''], !Equals [!Ref ECSClusterName, '']]] | |
| ASGroupDefined: | |
| !Not [!Equals [!Ref ASGroupName, '']] | |
| CWLogGroupDefined: | |
| !Not [!Equals [!Ref CWLogGroupName, '']] | |
| Resources: | |
| LambdaExecutionRole: | |
| Type: AWS::IAM::Role | |
| Properties: | |
| AssumeRolePolicyDocument: | |
| Version: "2012-10-17" | |
| Statement: | |
| - Effect: Allow | |
| Principal: | |
| Service: | |
| - lambda.amazonaws.com | |
| Action: | |
| - sts:AssumeRole | |
| Path: "/" | |
| Policies: | |
| - PolicyName: root | |
| PolicyDocument: | |
| Version: "2012-10-17" | |
| Statement: | |
| - Effect: Allow | |
| Action: | |
| - logs:* | |
| Resource: arn:aws:logs:*:*:* | |
| - PolicyName: s3-read-only | |
| PolicyDocument: | |
| Version: "2012-10-17" | |
| Statement: | |
| - Effect: Allow | |
| Action: | |
| - s3:GetObject | |
| Resource: | |
| !Sub "arn:aws:s3:::${S3Bucket}/*" | |
| Lambda: | |
| Type: AWS::Lambda::Function | |
| Properties: | |
| Description: alert function | |
| Handler: index.handler | |
| Role: !GetAtt LambdaExecutionRole.Arn | |
| Code: | |
| S3Bucket: !Ref "S3Bucket" | |
| S3Key: !Ref "S3Key" | |
| Runtime: nodejs6.10 | |
| Environment: | |
| Variables: | |
| # override dryRun default for unit testing (e.g. using dotenv) | |
| dryRun: false | |
| slackChannel: !Ref "SlackChannel" | |
| base64HookUrl: !Ref "SlackHookUrlBase64" | |
| Topic: | |
| Type: "AWS::SNS::Topic" | |
| Properties: | |
| DisplayName: "Alert" | |
| Subscription: | |
| - | |
| Endpoint: | |
| !GetAtt "Lambda.Arn" | |
| Protocol: "lambda" | |
| PermissionLambdaInvocation: | |
| Type: "AWS::Lambda::Permission" | |
| Properties: | |
| FunctionName: | |
| !GetAtt "Lambda.Arn" | |
| Action: "lambda:InvokeFunction" | |
| Principal: "sns.amazonaws.com" | |
| SourceArn: | |
| !Ref "Topic" | |
| # Standalone VMs (e.g. bastion hosts, single-instance DBs, etc.) | |
| InstanceCPUAlarm: | |
| Type: AWS::CloudWatch::Alarm | |
| Condition: EC2InstanceIdDefined | |
| Properties: | |
| AlarmName: !Sub "[${Stage}] EC2 instance CPU utilization (${Source})" | |
| AlarmDescription: !Sub "Raise alarm if CPU utilization > ${AlarmThreshold}% for ${AlarmPeriod}s" | |
| MetricName: CPUUtilization | |
| Namespace: AWS/EC2 | |
| Statistic: Average | |
| AlarmActions: | |
| - !Ref "Topic" | |
| Period: !Ref "AlarmPeriod" | |
| EvaluationPeriods: !Ref "AlarmEvaluationPeriods" | |
| Threshold: !Ref "AlarmThreshold" | |
| Dimensions: | |
| - Name: "InstanceId" | |
| Value: | |
| !Ref EC2InstanceId | |
| ComparisonOperator: GreaterThanThreshold | |
| # custom metrics | |
| ErrorMetricFilter: | |
| Type: AWS::Logs::MetricFilter | |
| Condition: CWLogGroupDefined | |
| Properties: | |
| LogGroupName: | |
| !Ref "CWLogGroupName" | |
| FilterPattern: !Ref "LogErrorFilter" | |
| MetricTransformations: | |
| - MetricValue: '1' | |
| MetricNamespace: !Sub "${Source}/error" | |
| MetricName: count | |
| # custom metric alarm | |
| LogErrorAlarm: | |
| Type: AWS::CloudWatch::Alarm | |
| Condition: CWLogGroupDefined | |
| Properties: | |
| AlarmName: !Sub "[${Stage}] '${LogErrorFilter}' in CloudWatch Logs (${Source})" | |
| AlarmDescription: !Sub "An error has been raised in ${CWLogGroupName}" | |
| MetricName: count | |
| Namespace: !Sub "${Source}/error" | |
| Statistic: Sum | |
| Period: '60' | |
| EvaluationPeriods: '1' | |
| Threshold: '0' | |
| AlarmActions: | |
| - !Ref "Topic" | |
| ComparisonOperator: GreaterThanThreshold | |
| # ECS alarms: 2x CPU, 2x memory; supply both cluster and service for services | |
| ECSClusterCPUAlarm: | |
| Type: AWS::CloudWatch::Alarm | |
| Condition: ECSClusterDefined | |
| Properties: | |
| AlarmName: !Sub "[${Stage}] ECS cluster CPU utilization (${Source})" | |
| AlarmDescription: !Sub "Raise alarm if CPU utilization > ${AlarmThreshold}% for ${AlarmPeriod}s" | |
| MetricName: CPUUtilization | |
| Namespace: AWS/EC2 | |
| Statistic: Average | |
| AlarmActions: | |
| - !Ref "Topic" | |
| Period: !Ref "AlarmPeriod" | |
| EvaluationPeriods: !Ref "AlarmEvaluationPeriods" | |
| Threshold: !Ref "AlarmThreshold" | |
| Dimensions: | |
| - Name: "ClusterName" | |
| Value: | |
| !Ref "ECSClusterName" | |
| ComparisonOperator: GreaterThanThreshold | |
| ECSClusterMemoryAlarm: | |
| Type: AWS::CloudWatch::Alarm | |
| Condition: ECSClusterDefined | |
| Properties: | |
| AlarmName: !Sub "[${Stage}] ECS cluster memory utilization (${Source})" | |
| AlarmDescription: !Sub "Raise alarm if memory utilization > ${AlarmThreshold}% for ${AlarmPeriod}s" | |
| MetricName: MemoryUtilization | |
| Namespace: AWS/EC2 | |
| Statistic: Average | |
| AlarmActions: | |
| - !Ref "Topic" | |
| Period: !Ref "AlarmPeriod" | |
| EvaluationPeriods: !Ref "AlarmEvaluationPeriods" | |
| Threshold: !Ref "AlarmThreshold" | |
| Dimensions: | |
| - Name: "ClusterName" | |
| Value: | |
| !Ref "ECSClusterName" | |
| ComparisonOperator: GreaterThanThreshold | |
| ECSServiceCPUAlarm: | |
| Type: AWS::CloudWatch::Alarm | |
| Condition: ECSServiceDefined | |
| Properties: | |
| AlarmName: !Sub "[${Stage}] ECS service CPU utilization (${Source})" | |
| AlarmDescription: !Sub "Raise alarm if CPU utilization > ${AlarmThreshold}% for ${AlarmPeriod}s" | |
| MetricName: CPUUtilization | |
| Namespace: AWS/EC2 | |
| Statistic: Average | |
| AlarmActions: | |
| - !Ref "Topic" | |
| Period: !Ref "AlarmPeriod" | |
| EvaluationPeriods: !Ref "AlarmEvaluationPeriods" | |
| Threshold: !Ref "AlarmThreshold" | |
| Dimensions: | |
| - Name: "ClusterName" | |
| Value: | |
| !Ref "ECSClusterName" | |
| - Name: "ServiceName" | |
| Value: | |
| !Ref "ECSServiceName" | |
| ComparisonOperator: GreaterThanThreshold | |
| ECSServiceMemoryAlarm: | |
| Type: AWS::CloudWatch::Alarm | |
| Condition: ECSServiceDefined | |
| Properties: | |
| AlarmName: !Sub "[${Stage}] ECS service memory utilization (${Source})" | |
| AlarmDescription: !Sub "Raise alarm if memory utilization > ${AlarmThreshold}% for ${AlarmPeriod}s" | |
| MetricName: MemoryUtilization | |
| Namespace: AWS/EC2 | |
| Statistic: Average | |
| AlarmActions: | |
| - !Ref "Topic" | |
| Period: !Ref "AlarmPeriod" | |
| EvaluationPeriods: !Ref "AlarmEvaluationPeriods" | |
| Threshold: !Ref "AlarmThreshold" | |
| Dimensions: | |
| - Name: "ClusterName" | |
| Value: | |
| !Ref "ECSClusterName" | |
| - Name: "ServiceName" | |
| Value: | |
| !Ref "ECSServiceName" | |
| ComparisonOperator: GreaterThanThreshold | |
| # auto scaling groups (e.g. ECS application nodes) | |
| ASGroupCPUAlarm: | |
| Type: AWS::CloudWatch::Alarm | |
| Condition: ASGroupDefined | |
| Properties: | |
| AlarmName: !Sub "[${Stage}] Auto scaling CPU utilization (${Source})" | |
| AlarmDescription: !Sub "Raise alarm if CPU utilization > ${AlarmThreshold}% for ${AlarmPeriod}s" | |
| MetricName: CPUUtilization | |
| Namespace: AWS/EC2 | |
| Statistic: Average | |
| AlarmActions: | |
| - !Ref "Topic" | |
| Period: !Ref "AlarmPeriod" | |
| EvaluationPeriods: !Ref "AlarmEvaluationPeriods" | |
| Threshold: !Ref "AlarmThreshold" | |
| Dimensions: | |
| - Name: "AutoScalingGroupName" | |
| Value: | |
| !Ref ASGroupName | |
| ComparisonOperator: GreaterThanThreshold | |
| Description: Generic alert template | |
| # echo inputs | |
| Outputs: | |
| Stage: | |
| Description: Stage descriptor | |
| Value: !Ref "Stage" | |
| SlackChannel: | |
| Description: Notification channel | |
| Value: !Ref "SlackChannel" | |
| SlackHookUrlBase64: | |
| Description: Predefined hook URL (base64-encoded) | |
| Value: !Ref "SlackHookUrlBase64" | |
| S3Bucket: | |
| Description: Name of S3 Bucket holding alert template and lambda | |
| Value: !Ref "S3Bucket" | |
| S3Key: | |
| Description: Lambda key | |
| Value: !Ref "S3Key" |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment