-
-
Save kichik/7a2ecb0d36358c50c7b878ad9fd982bc to your computer and use it in GitHub Desktop.
# aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DB=arn:aws:rds:us-east-1:XXX:db:XXX | |
Description: Automatically stop RDS instance every time it turns on due to exceeding the maximum allowed time being stopped | |
Parameters: | |
DB: | |
Description: ARN of database that needs to be stopped | |
Type: String | |
AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:db:[^:]* | |
MaxStartupTime: | |
Description: Maximum number of minutes to wait between database is automatically started and the time it's ready to be shut down. Extend this limit if your database takes a long time to boot up. | |
Type: Number | |
MinValue: 10 | |
Default: 25 | |
Resources: | |
DatabaseStopperFunction: | |
Type: AWS::Lambda::Function | |
Properties: | |
Role: !GetAtt DatabaseStopperRole.Arn | |
Runtime: python3.6 | |
Handler: index.handler | |
Timeout: 20 | |
Code: | |
ZipFile: | |
Fn::Sub: | | |
import boto3 | |
import time | |
def handler(event, context): | |
print("got", event) | |
db = event["detail"]["SourceArn"] | |
id = event["detail"]["SourceIdentifier"] | |
message = event["detail"]["Message"] | |
region = event["region"] | |
rds = boto3.client("rds", region_name=region) | |
if message == "DB instance is being started due to it exceeding the maximum allowed time being stopped.": | |
print("database turned on automatically, setting last seen tag...") | |
last_seen = int(time.time()) | |
rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}]) | |
elif message == "DB instance started": | |
print("database started (and sort of available?)") | |
last_seen = 0 | |
for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]: | |
if t["Key"] == "DbStopperLastSeen": | |
last_seen = int(t["Value"]) | |
if time.time() < last_seen + (60 * ${MaxStartupTime}): | |
print("database was automatically started in the last ${MaxStartupTime} minutes, turning off...") | |
time.sleep(10) # even waiting for the "started" event is not enough, so add some wait | |
rds.stop_db_instance(DBInstanceIdentifier=id) | |
print("success! removing auto-start tag...") | |
rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}]) | |
else: | |
print("ignoring manual database start") | |
else: | |
print("error: unknown database event!") | |
DatabaseStopperRole: | |
Type: AWS::IAM::Role | |
Properties: | |
AssumeRolePolicyDocument: | |
Version: '2012-10-17' | |
Statement: | |
- Action: | |
- sts:AssumeRole | |
Effect: Allow | |
Principal: | |
Service: | |
- lambda.amazonaws.com | |
ManagedPolicyArns: | |
- arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole | |
Policies: | |
- PolicyName: Notify | |
PolicyDocument: | |
Version: '2012-10-17' | |
Statement: | |
- Action: | |
- rds:StopDBInstance | |
Effect: Allow | |
Resource: !Ref DB | |
- Action: | |
- rds:AddTagsToResource | |
- rds:ListTagsForResource | |
- rds:RemoveTagsFromResource | |
Effect: Allow | |
Resource: !Ref DB | |
Condition: | |
ForAllValues:StringEquals: | |
aws:TagKeys: | |
- DbStopperLastSeen | |
DatabaseStopperPermission: | |
Type: AWS::Lambda::Permission | |
Properties: | |
Action: lambda:InvokeFunction | |
FunctionName: !GetAtt DatabaseStopperFunction.Arn | |
Principal: events.amazonaws.com | |
SourceArn: !GetAtt DatabaseStopperRule.Arn | |
DatabaseStopperRule: | |
Type: AWS::Events::Rule | |
Properties: | |
EventPattern: | |
source: | |
- aws.rds | |
detail-type: | |
- "RDS DB Instance Event" | |
resources: | |
- !Ref DB | |
detail: | |
Message: | |
- "DB instance is being started due to it exceeding the maximum allowed time being stopped." | |
- "DB instance started" | |
Targets: | |
- Arn: !GetAtt DatabaseStopperFunction.Arn | |
Id: DatabaseStopperLambda |
@kichik Thanks for that! I will try it and let you know with the results next week.
@kichik Thanks, your updates worked successfully!
Hi, cool script! Just some thoughts
@kichik @cagriar Have you guys ever come across an RDS instance taking longer than 20 minutes to start (after AWS auto starts it)? Say 25 minutes.
In that case, this script will falsely exclude it from being stopped, right?
I was thinking maybe these lines:
https://gist.github.com/kichik/7a2ecb0d36358c50c7b878ad9fd982bc#file-keepdbstopped-yml-L43-L52
can be modified to:
tags = rds.list_tags_for_resource(ResourceName=db)["TagList"]
if is_started_by_aws(tags):
print("database was automatically started, turning off...")
time.sleep(10)
# even waiting for the "started" event is not enough, so add some wait
rds.stop_db_instance(DBInstanceIdentifier=id)
print("success! removing auto-start tag...")
rds.remove_tags_from_resource(ResourceName=db, TagKeys=["DbStopperLastSeen"])
else:
print("ignoring manual database start")
def is_started_by_aws(tags):
"""
Checks if a RDS instance was auto started by AWS
:param tags: List of Tags configured for RDS instance
:return: False if the resource has "DbStopperLastSeen" tag, True otherwise
"""
for tag in tags:
if tag["Key"].lower() == "DbStopperLastSeen".lower():
return True
return False
This way:
- It won't matter how long the RDS instance takes to be available.
- We don't need to use the
time.time()
UNIX timestamp value stored in last seen tag. It will just be an identifier added by the earlier lambda run.
Thoughts?
I wanted to put a time limit on it in case we ever miss the message, the user turns off the DB before we do, the tag fails to set, or basically any unforeseen issue. I'll turn it into a stack parameter, but I still want to keep it in place.
Please notice that for Aurora this will not work and you need to operate on the cluster level.
@kichik thoughts?
Here is the modified (not yet tested) version:
# aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DBCluster=arn:aws:rds:us-east-1:XXX:cluster:XXX
Description: Automatically stop RDS Aurora cluster every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
DBCluster:
Description: ARN of database cluster that needs to be stopped
Type: String
AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:cluster:[^:]*
MaxStartupTime:
Description: Maximum number of minutes to wait between database is automatically started and the time it's ready to be shut down. Extend this limit if your database takes a long time to boot up.
Type: Number
MinValue: 10
Default: 25
Resources:
DatabaseStopperFunction:
Type: AWS::Lambda::Function
Properties:
Role: !GetAtt DatabaseStopperRole.Arn
Runtime: python3.6
Handler: index.handler
Timeout: 20
Code:
ZipFile:
Fn::Sub: |
import boto3
import time
def handler(event, context):
print("got", event)
db = event["detail"]["SourceArn"]
id = event["detail"]["SourceIdentifier"]
message = event["detail"]["Message"]
region = event["region"]
rds = boto3.client("rds", region_name=region)
if message == "Cluster instance is being started due to it exceeding the maximum allowed time being stopped.":
print("database turned on automatically, setting last seen tag...")
last_seen = int(time.time())
rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])
elif message == "Cluster instance started":
print("database started (and sort of available?)")
last_seen = 0
for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
if t["Key"] == "DbStopperLastSeen":
last_seen = int(t["Value"])
if time.time() < last_seen + (60 * ${MaxStartupTime}):
print("database was automatically started in the last ${MaxStartupTime} minutes, turning off...")
time.sleep(10) # even waiting for the "started" event is not enough, so add some wait
rds.stop_db_cluster(DBClusterIdentifier=id)
print("success! removing auto-start tag...")
rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])
else:
print("ignoring manual database start")
else:
print("error: unknown database event!")
DatabaseStopperRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Action:
- sts:AssumeRole
Effect: Allow
Principal:
Service:
- lambda.amazonaws.com
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
Policies:
- PolicyName: Notify
PolicyDocument:
Version: '2012-10-17'
Statement:
- Action:
- rds:StopDBCluster
Effect: Allow
Resource: !Ref DBCluster
- Action:
- rds:AddTagsToResource
- rds:ListTagsForResource
- rds:RemoveTagsFromResource
Effect: Allow
Resource: !Ref DBCluster
Condition:
ForAllValues:StringEquals:
aws:TagKeys:
- DbStopperLastSeen
DatabaseStopperPermission:
Type: AWS::Lambda::Permission
Properties:
Action: lambda:InvokeFunction
FunctionName: !GetAtt DatabaseStopperFunction.Arn
Principal: events.amazonaws.com
SourceArn: !GetAtt DatabaseStopperRule.Arn
DatabaseStopperRule:
Type: AWS::Events::Rule
Properties:
EventPattern:
source:
- aws.rds
detail-type:
- "RDS Cluster Instance Event"
resources:
- !Ref DBCluster
detail:
Message:
- "Cluster instance is being started due to it exceeding the maximum allowed time being stopped."
- "Cluster instance started"
Targets:
- Arn: !GetAtt DatabaseStopperFunction.Arn
Id: DatabaseStopperLambda
@regevbr your code looks like it would work. Thanks! I think using Aurora Serverless might work too.
@kichik it doesnt work for me when I run a test event in Lambda. I get this error
17 Feb 2022 14:09 [INFO] (/var/runtime/bootstrap.py) main started at epoch 1645106978694
17 Feb 2022 14:09 [INFO] (/var/runtime/bootstrap.py) init completed at epoch 1645106978694
got {'key1: 'value1', 'key2': 'value2', 'key3': 'value3'}
'SourceArn' : KeyError
Traceback (most recent call last):
File "/var/task/index.py", line 6, in handler
db = event["SourceArn"]
KeyError: 'SourceArn'
This also happens for ["detail"] in db = event["detail"]["SourceArn"]
I have only run this through Lambda as a 'Test' I configured on the Lambda. I have not tested this yet by using the Event Rule that listens for the message.
In the AWS cli if I run the command rds describe-events against my RDS Cluster I can see the following under 'Events' SourceIdentifier, SourceType, SourceArn and Message
Run it against actual event test data.
Do you see {'key1: 'value1', 'key2': 'value2', 'key3': 'value3'}
containing SourceArn
?
Yes, this is to allow people to manually turn on the database when they do need it without deleting this stack.
10 seconds worked in my limited tests. It will try 3 times so technically a bit more than 30 seconds total. We can add a configurable value if people report it doesn't work.