# aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DB=arn:aws:rds:us-east-1:XXX:db:XXX
Description: Automatically stop RDS instance every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
  DB:
    Description: ARN of database that needs to be stopped
    Type: String
    AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:db:[^:]*
  MaxStartupTime:
    Description: Maximum number of minutes to wait between database is automatically started and the time it's ready to be shut down. Extend this limit if your database takes a long time to boot up.
    Type: Number
    MinValue: 10
    Default: 25
Resources:
  DatabaseStopperFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt DatabaseStopperRole.Arn
      Runtime: python3.6
      Handler: index.handler
      Timeout: 20
      Code:
        ZipFile:
          Fn::Sub: |
            import boto3
            import time

            def handler(event, context):
                print("got", event)
                db = event["detail"]["SourceArn"]
                id = event["detail"]["SourceIdentifier"]
                message = event["detail"]["Message"]
                region = event["region"]
                rds = boto3.client("rds", region_name=region)

                if message == "DB instance is being started due to it exceeding the maximum allowed time being stopped.":
                    print("database turned on automatically, setting last seen tag...")
                    last_seen = int(time.time())
                    rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])

                elif message == "DB instance started":
                    print("database started (and sort of available?)")

                    last_seen = 0
                    for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
                        if t["Key"] == "DbStopperLastSeen":
                            last_seen = int(t["Value"])

                    if time.time() < last_seen + (60 * ${MaxStartupTime}):
                        print("database was automatically started in the last ${MaxStartupTime} minutes, turning off...")
                        time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
                        rds.stop_db_instance(DBInstanceIdentifier=id)

                        print("success! removing auto-start tag...")
                        rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])

                    else:
                        print("ignoring manual database start")

                else:
                    print("error: unknown database event!")
  DatabaseStopperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: Notify
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - rds:StopDBInstance
                Effect: Allow
                Resource: !Ref DB
              - Action:
                  - rds:AddTagsToResource
                  - rds:ListTagsForResource
                  - rds:RemoveTagsFromResource
                Effect: Allow
                Resource: !Ref DB
                Condition:
                  ForAllValues:StringEquals:
                    aws:TagKeys:
                      - DbStopperLastSeen
  DatabaseStopperPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt DatabaseStopperFunction.Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DatabaseStopperRule.Arn
  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS DB Instance Event"
        resources:
          - !Ref DB
        detail:
          Message:
            - "DB instance is being started due to it exceeding the maximum allowed time being stopped."
            - "DB instance started"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda
@Klohto doh, of course! Thanks, updated.
Hi, I have successfully deployed this with CloudFormation, but unfortunately it did not stop my RDS instance after 7 days of being stopped. I only changed the stack-name and RDS ARN parameters; the rest is the same.
I have checked the logs. The db stopper Lambda executed 3 times between 6:18pm and 6:22pm, but the database only finished auto-starting at 6:24pm, which is 2 minutes later. As a result, the Lambda logs show:
An error occurred (InvalidDBInstanceState) when calling the StopDBInstance operation: Instance my-tmp-db is not in available state.: InvalidDBInstanceStateFault
The Lambda was executed right before the RDS instance had fully started and entered the "available" state. Have you ever encountered such a problem? Is there any way to execute this Lambda right after the RDS state changes to "available"? And do you know why this Lambda is executed 3 times?
@cagriar when Lambdas fail, they try again 2 more times for a total of 3 times. I'm afraid you've hit the TODO on line 24. One very naive solution would be setting Timeout: 900 on the Lambda and adding time.sleep(600) in the handler.
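As an alternative to a fixed time.sleep(600), the wait could poll the instance status instead. This is only a minimal sketch, assuming the Lambda's Timeout is raised above the polling window and that the role is additionally allowed rds:DescribeDBInstances (the template above does not grant it). wait_until_available and its parameters are hypothetical names; describe_db_instances and the DBInstanceStatus field come from boto3's RDS client.

```python
import time


def wait_until_available(rds, db_id, timeout=600, poll=15,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll the instance until it reports 'available'.

    Hypothetical helper: returns True once the status is 'available',
    or False if roughly `timeout` seconds pass first.
    `sleep` and `clock` are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout
    while True:
        reply = rds.describe_db_instances(DBInstanceIdentifier=db_id)
        status = reply["DBInstances"][0]["DBInstanceStatus"]
        if status == "available":
            return True
        if clock() >= deadline:
            return False
        sleep(poll)
```

In the handler this would replace the sleep, for example by calling rds.stop_db_instance only when wait_until_available(rds, id) returns True. Keep timeout below the Lambda Timeout, which cannot exceed 900 seconds.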
But maybe I can do one better. Can you paste the recent events of your RDS? Should be AWS Console -> RDS -> Databases -> [your database] -> Logs & events -> Recent events.
@kichik I have modified the DatabaseStopperRule as follows:
DatabaseStopperRule:
  Type: AWS::Events::Rule
  Properties:
    EventPattern:
      source:
        - aws.rds
      detail-type:
        - "RDS DB Instance Event"
      resources:
        - !Ref DB
      detail:
        EventCategories:
          - "availability"
          - "notification"
    Targets:
      - Arn: !GetAtt DatabaseStopperFunction.Arn
        Id: DatabaseStopperLambda
When I manually start my database, this successfully stops it. Now I'm waiting 7 days so that I can test it with the AWS auto-start mechanism. I have one other question: where did you configure the Lambda to be executed 2 more times if it fails? I could not find that in your script. Is it the general behavior?
I want it to shut down the database only if it was turned on automatically. This way users can still turn it on manually. So I want to catch both events and respond to the second event (the database was started) only if the first event said the start was due to the 7-day limit. For that it would be really useful if you can paste the whole event log once the 7 days pass.
Lambda executing three times is an internal Lambda feature you can't turn off.
OK, I have changed the DatabaseStopperRule as follows:
DatabaseStopperRule:
  Type: AWS::Events::Rule
  Properties:
    EventPattern:
      source:
        - aws.rds
      detail-type:
        - "RDS DB Instance Event"
      resources:
        - !Ref DB
      detail:
        EventCategories:
          - "notification"
    Targets:
      - Arn: !GetAtt DatabaseStopperFunction.Arn
        Id: DatabaseStopperLambda
Now it only listens to the notification events, so the database will only be stopped if it was started automatically. I don't know why, but my RDS logs are empty. I have placed print(event) inside my Lambda, so I will send you the logs on Friday, once the DB is auto-started after 7 days.
@kichik The modification I made in the last message (adding notification as EventCategories) seems to be working. My Lambda function was called 3*3=9 times with 3 separate events, which are:
- 'Message': 'DB instance is being started due to it exceeding the maximum allowed time being stopped.'
- 'Message': 'DB instance started' (lambda stops the started database in the 3rd try)
- 'Message': 'DB instance stopped'
So I guess it works. Thanks.
Thanks. I updated it to wait for the "maximum allowed" message, set a tag, wait for the "started" message, see if the tag was set, and only then stop it after waiting 10 seconds. Hopefully this should be enough to cover all the cases.
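The flow described above can be sketched as a pure function that is easy to test without AWS. This is only an illustration under stated assumptions: decide, its return values and max_startup_minutes are hypothetical names, while the two message strings and the DbStopperLastSeen tag key are the ones used in the template.

```python
AUTO_START = "DB instance is being started due to it exceeding the maximum allowed time being stopped."
STARTED = "DB instance started"
TAG_KEY = "DbStopperLastSeen"


def decide(message, tags, now, max_startup_minutes=25):
    """Return "tag", "stop" or "ignore" for one RDS event.

    message: the event's detail.Message string
    tags:    list of {"Key": ..., "Value": ...} dicts, as in a TagList
    now:     current UNIX time in seconds
    """
    if message == AUTO_START:
        # First event: remember when AWS auto-started the database.
        return "tag"
    if message == STARTED:
        last_seen = 0
        for tag in tags:
            if tag["Key"] == TAG_KEY:
                last_seen = int(tag["Value"])
        # Second event: stop only if the auto-start was seen recently.
        if now < last_seen + 60 * max_startup_minutes:
            return "stop"
    return "ignore"
```

A manual start never sets the tag, so last_seen stays at 0 and the function returns "ignore", which is what lets people turn the database on by hand.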
@kichik Thanks for the update. I have 2 questions:
- Why do you need to check whether the database was automatically started in the last 20 minutes or not? Did you do that to separate the logic between auto-started database and manually-started database?
- Are you sure 10 seconds sleep is enough? If it fails, the lambda function will be executed 2 more times again as usual, right?
Yes, this is to allow people to manually turn on the database when they do need it without deleting this stack.
10 seconds worked in my limited tests. It will try 3 times so technically a bit more than 30 seconds total. We can add a configurable value if people report it doesn't work.
@kichik Thanks for that! I will try it and let you know with the results next week.
@kichik Thanks, your updates worked successfully!
Hi, cool script! Just some thoughts
@kichik @cagriar Have you guys ever come across an RDS instance taking longer than 20 minutes to start (after AWS auto starts it)? Say 25 minutes.
In that case, this script will falsely exclude it from being stopped, right?
I was thinking maybe these lines:
https://gist.github.com/kichik/7a2ecb0d36358c50c7b878ad9fd982bc#file-keepdbstopped-yml-L43-L52
can be modified to:
tags = rds.list_tags_for_resource(ResourceName=db)["TagList"]
if is_started_by_aws(tags):
    print("database was automatically started, turning off...")
    time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
    rds.stop_db_instance(DBInstanceIdentifier=id)
    print("success! removing auto-start tag...")
    rds.remove_tags_from_resource(ResourceName=db, TagKeys=["DbStopperLastSeen"])
else:
    print("ignoring manual database start")

def is_started_by_aws(tags):
    """
    Checks if a RDS instance was auto started by AWS
    :param tags: List of Tags configured for RDS instance
    :return: True if the resource has "DbStopperLastSeen" tag, False otherwise
    """
    for tag in tags:
        if tag["Key"].lower() == "DbStopperLastSeen".lower():
            return True
    return False
This way:
- It won't matter how long the RDS instance takes to be available.
- We don't need to use the time.time() UNIX timestamp value stored in the last seen tag. It will just be an identifier added by the earlier Lambda run.
Thoughts?
I wanted to put a time limit on it in case we ever miss the message, the user turns off the DB before we do, the tag fails to set, or basically any unforeseen issue. I'll turn it into a stack parameter, but I still want to keep it in place.
Please note that this will not work for Aurora; there you need to operate at the cluster level.
@kichik thoughts?
Here is the modified (not yet tested) version:
# aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DBCluster=arn:aws:rds:us-east-1:XXX:cluster:XXX
Description: Automatically stop RDS Aurora cluster every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
  DBCluster:
    Description: ARN of database cluster that needs to be stopped
    Type: String
    AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:cluster:[^:]*
  MaxStartupTime:
    Description: Maximum number of minutes to wait between database is automatically started and the time it's ready to be shut down. Extend this limit if your database takes a long time to boot up.
    Type: Number
    MinValue: 10
    Default: 25
Resources:
  DatabaseStopperFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt DatabaseStopperRole.Arn
      Runtime: python3.6
      Handler: index.handler
      Timeout: 20
      Code:
        ZipFile:
          Fn::Sub: |
            import boto3
            import time

            def handler(event, context):
                print("got", event)
                db = event["detail"]["SourceArn"]
                id = event["detail"]["SourceIdentifier"]
                message = event["detail"]["Message"]
                region = event["region"]
                rds = boto3.client("rds", region_name=region)

                if message == "Cluster instance is being started due to it exceeding the maximum allowed time being stopped.":
                    print("database turned on automatically, setting last seen tag...")
                    last_seen = int(time.time())
                    rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])

                elif message == "Cluster instance started":
                    print("database started (and sort of available?)")

                    last_seen = 0
                    for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
                        if t["Key"] == "DbStopperLastSeen":
                            last_seen = int(t["Value"])

                    if time.time() < last_seen + (60 * ${MaxStartupTime}):
                        print("database was automatically started in the last ${MaxStartupTime} minutes, turning off...")
                        time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
                        rds.stop_db_cluster(DBClusterIdentifier=id)

                        print("success! removing auto-start tag...")
                        rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])

                    else:
                        print("ignoring manual database start")

                else:
                    print("error: unknown database event!")
  DatabaseStopperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: Notify
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - rds:StopDBCluster
                Effect: Allow
                Resource: !Ref DBCluster
              - Action:
                  - rds:AddTagsToResource
                  - rds:ListTagsForResource
                  - rds:RemoveTagsFromResource
                Effect: Allow
                Resource: !Ref DBCluster
                Condition:
                  ForAllValues:StringEquals:
                    aws:TagKeys:
                      - DbStopperLastSeen
  DatabaseStopperPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt DatabaseStopperFunction.Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DatabaseStopperRule.Arn
  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS Cluster Instance Event"
        resources:
          - !Ref DBCluster
        detail:
          Message:
            - "Cluster instance is being started due to it exceeding the maximum allowed time being stopped."
            - "Cluster instance started"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda
@regevbr your code looks like it would work. Thanks! I think using Aurora Serverless might work too.
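The instance and cluster templates above differ mainly in which stop call is made. As a rough sketch, one helper could choose the call from the ARN's resource type (db or cluster, as in the two AllowedPattern values). stop_target is a hypothetical name; stop_db_instance and stop_db_cluster are the boto3 RDS client calls already used in the templates. The event messages for clusters are not covered here and would still need to be confirmed against real events.

```python
def stop_target(rds, arn):
    """Stop an RDS instance or cluster depending on the ARN's resource type.

    Hypothetical helper. Expected ARN layout:
    arn:aws:rds:<region>:<account>:<type>:<name>
    """
    parts = arn.split(":", 6)
    if len(parts) != 7 or parts[2] != "rds":
        raise ValueError("not an RDS ARN: " + arn)
    resource_type, name = parts[5], parts[6]
    if resource_type == "db":
        return rds.stop_db_instance(DBInstanceIdentifier=name)
    if resource_type == "cluster":
        return rds.stop_db_cluster(DBClusterIdentifier=name)
    raise ValueError("unsupported RDS resource type: " + resource_type)
```

The role would then need both rds:StopDBInstance and rds:StopDBCluster on the matching resource.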
@kichik it doesn't work for me when I run a test event in Lambda. I get this error:
17 Feb 2022 14:09 [INFO] (/var/runtime/bootstrap.py) main started at epoch 1645106978694
17 Feb 2022 14:09 [INFO] (/var/runtime/bootstrap.py) init completed at epoch 1645106978694
got {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
'SourceArn' : KeyError
Traceback (most recent call last):
File "/var/task/index.py", line 6, in handler
db = event["SourceArn"]
KeyError: 'SourceArn'
This also happens for ["detail"] in db = event["detail"]["SourceArn"]
I have only run this through Lambda as a 'Test' I configured on the Lambda. I have not tested this yet by using the Event Rule that listens for the message.
In the AWS CLI, if I run the rds describe-events command against my RDS cluster, I can see the following fields under 'Events': SourceIdentifier, SourceType, SourceArn and Message.
Run it against actual event test data.
Do you see {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'} containing SourceArn?
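A test event needs the same shape the handler reads: a top-level region plus detail.SourceArn, detail.SourceIdentifier and detail.Message. Here is a minimal sketch for generating one to paste into the Lambda console test dialog. make_test_event is a hypothetical helper, the account number and database name are placeholders, and the remaining fields mirror the instance rule's pattern rather than a captured real event.

```python
import json


def make_test_event(arn, message, region="us-east-1"):
    """Build a dict with the fields the handler and the rule look at."""
    return {
        "source": "aws.rds",
        "detail-type": "RDS DB Instance Event",
        "region": region,
        "resources": [arn],
        "detail": {
            "SourceArn": arn,
            # The identifier is the last ARN segment.
            "SourceIdentifier": arn.rsplit(":", 1)[-1],
            "Message": message,
        },
    }


if __name__ == "__main__":
    event = make_test_event(
        "arn:aws:rds:us-east-1:123456789012:db:my-tmp-db",  # placeholder ARN
        "DB instance started",
    )
    print(json.dumps(event, indent=2))
```

The default hello-world test event ({'key1': ...}) has none of these keys, which is why the handler raises KeyError on it.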
detail:
  Message:
    - "The DB instance is being started due to it exceeding the maximum allowed time being stopped." # TODO something is off about the pattern as this never gets triggered
It's due to a wrong pattern. Omit "The" and it will work. The correct pattern is:
DB instance is being started due to it exceeding the maximum allowed time being stopped.
Works for me :)
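The reason one extra word matters is that string values in the rule's pattern are compared exactly. A simplified stand-in for that comparison (not EventBridge itself; message_matches is a hypothetical name) shows the difference:

```python
def message_matches(pattern_messages, event):
    """Exact, case-sensitive check of detail.Message against allowed values."""
    return event.get("detail", {}).get("Message") in pattern_messages


# Message text as it appears in the working template.
actual = "DB instance is being started due to it exceeding the maximum allowed time being stopped."
event = {"detail": {"Message": actual}}

# The pattern with a leading "The" never matches the real message.
wrong_pattern = ["The " + actual]
right_pattern = [actual, "DB instance started"]
```

With these values, message_matches(wrong_pattern, event) is False and message_matches(right_pattern, event) is True.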