AWSTemplateFormatVersion: "2010-09-09"
Transform: "AWS::Serverless-2016-10-31"
Description: >
  CloudFormation in Action: An example with seven AWS services (SNS, SQS, Lambda, Kinesis, S3, Glue and Athena)
  Use case: How to set up an environment in AWS to perform data analysis on the daily numbers of Covid-19?
  Author: Muttalip Kucuk
  Date: 01-11-2020
Globals:
  Function:
    Runtime: java11
    Timeout: 30
    MemorySize: 256
Resources:
  UpdateTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: "update-topic"
  UpdateSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt "UpdateQueue.Arn"
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref "UpdateTopic"
  UpdateQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "update-queue"
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt UpdateDLQ.Arn
        maxReceiveCount: 5
  UpdateDLQ:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "update-dlq"
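  SnsToSqsPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - Ref: UpdateQueue
      PolicyDocument:
        Id: QueuePolicy
        Version: "2012-10-17"
        Statement:
          - Action:
              - "sqs:SendMessage"
            Effect: Allow
            Resource: !GetAtt UpdateQueue.Arn
            Principal:
              Service: sns.amazonaws.com
            # Only accept messages delivered by the update topic
            Condition:
              ArnEquals:
                "aws:SourceArn": !Ref UpdateTopic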
  UpdateProcessor:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Description: Function that first transforms the daily update to the data model specified in the Glue table definition and then sends it to Firehose
      Environment:
        Variables:
          UPDATE_DELIVERY_STREAM: !Ref UpdateDeliveryStream
      Events:
        SqsEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt UpdateQueue.Arn
            BatchSize: 10
            Enabled: true
      FunctionName: "update-processor"
      Handler: nl.kucuktechnology.covid19.UpdateProcessor::handleRequest
      Policies:
        - AWSLambdaBasicExecutionRole
        - Statement:
            - Action:
                - "sqs:*"
              Effect: Allow
              Resource:
                - !GetAtt UpdateQueue.Arn
                - !GetAtt UpdateDLQ.Arn
            - Action:
                - "firehose:PutRecordBatch"
              Effect: Allow
              Resource: !GetAtt UpdateDeliveryStream.Arn
  UpdateDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      DeliveryStreamName: "update-stream"
      DeliveryStreamType: DirectPut
      ExtendedS3DestinationConfiguration:
        BucketARN: !GetAtt UpdateBucket.Arn
        BufferingHints:
          SizeInMBs: 128
          IntervalInSeconds: 60
        CompressionFormat: UNCOMPRESSED
        DataFormatConversionConfiguration:
          SchemaConfiguration:
            CatalogId: !Ref AWS::AccountId
            RoleARN: !GetAtt FirehoseToS3Role.Arn
            DatabaseName: !Ref UpdateDatabase
            TableName: !Ref UpdateTable
            Region: !Ref AWS::Region
            VersionId: LATEST
          InputFormatConfiguration:
            Deserializer:
              OpenXJsonSerDe: { }
          OutputFormatConfiguration:
            Serializer:
              ParquetSerDe: { }
          Enabled: True
        Prefix: "!{timestamp:yyyy}/!{timestamp:MM}/!{timestamp:dd}/"
        RoleARN: !GetAtt FirehoseToS3Role.Arn
        S3BackupMode: Disabled
  UpdateBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Delete
    Properties:
      # NOTE: S3 bucket names are globally unique; this literal name may need to be changed before deploying.
      BucketName: "update"
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  FirehoseToS3Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: firehose.amazonaws.com
            Action: 'sts:AssumeRole'
      Policies:
        - PolicyName: firehose_delivery_policy_new
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Action:
                  - 's3:AbortMultipartUpload'
                  - 's3:GetBucketLocation'
                  - 's3:GetObject'
                  - 's3:ListBucket'
                  - 's3:ListBucketMultipartUploads'
                  - 's3:PutObject'
                Effect: Allow
                Resource: '*'
              - Action: 'glue:GetTableVersions'
                Effect: Allow
                Resource: '*'
  FirehoseToS3Policy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: firehose_delivery_policy
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Action:
              - 's3:AbortMultipartUpload'
              - 's3:GetBucketLocation'
              - 's3:GetObject'
              - 's3:ListBucket'
              - 's3:ListBucketMultipartUploads'
              - 's3:PutObject'
            Effect: Allow
            Resource:
              # The bucket itself (for ListBucket and GetBucketLocation)
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref UpdateBucket
              # Every object in the bucket; '/*' avoids matching other buckets that share the name prefix
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref UpdateBucket
                  - '/*'
      Roles:
        - !Ref FirehoseToS3Role
  CrawlerInS3Role:
    Type: AWS::IAM::Role
    Properties:
      RoleName: "crawler-role"
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - glue.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
      Policies:
        - PolicyDocument:
            Statement:
              - Action:
                  - s3:GetObject*
                  - s3:PutObject*
                Effect: Allow
                Resource: !Join
                  - ''
                  - - !GetAtt UpdateBucket.Arn
                    - '/*'
          PolicyName: "crawler-policy"
  GlueCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"}}}"
      DatabaseName: !Ref UpdateDatabase
      Name: "event-crawler"
      Role: !GetAtt CrawlerInS3Role.Arn
      Schedule:
        ScheduleExpression: cron(0 * * * ? *)
      SchemaChangePolicy:
        UpdateBehavior: "LOG"
        DeleteBehavior: "LOG"
      Targets:
        CatalogTargets:
          - DatabaseName: !Ref UpdateDatabase
            Tables:
              - !Ref UpdateTable
  UpdateDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: "update-db"
  UpdateTable:
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref UpdateDatabase
      TableInput:
        Description: Glue table containing all updates, with the S3 bucket as its data source
        Owner: owner
        Retention: 0
        Name: "updates"
        StorageDescriptor:
          Location: !Sub 's3://${UpdateBucket}/'
          Columns:
            - Name: date
              Type: date
            - Name: country
              Type: string
            - Name: cases
              Type: int
            - Name: deaths
              Type: int
          InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
          Compressed: false
          NumberOfBuckets: -1
          SerdeInfo:
            SerializationLibrary: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
            Parameters:
              serialization.format: '1'
          BucketColumns: [ ]
          SortColumns: [ ]
          StoredAsSubDirectories: false
        PartitionKeys:
          - Name: year
            Type: string
          - Name: month
            Type: string
          - Name: day
            Type: string
        TableType: EXTERNAL_TABLE
  QueryResults:
    Type: AWS::S3::Bucket
    DeletionPolicy: Delete
    Properties:
      # NOTE: S3 bucket names are globally unique; this literal name may need to be changed before deploying.
      BucketName: "query-results"
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
Outputs:
  UpdateTopicArn:
    Description: "Update topic ARN"
    Value: !Ref UpdateTopic
  UpdateBucketName:
    Description: "Update bucket name"
    Value: !Ref UpdateBucket
  GlueCrawler:
    Description: "Glue crawler name"
    Value: !Ref GlueCrawler
  UpdateDatabaseName:
    Description: "Update database name"
    Value: !Ref UpdateDatabase
  UpdateTableName:
    Description: "Update table name"
    Value: !Ref UpdateTable
  QueryResultsName:
    Description: "Athena query results bucket name"
    Value: !Ref QueryResults
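
# Usage sketch (illustrative only, not part of the original template).
# It is written as comments so the template stays valid YAML.
#
# A record that the update-processor function might send to Firehose, assuming
# it mirrors the columns of the "updates" table above. The field values are
# hypothetical, and the exact date format accepted depends on the deserializer:
#
#   {"date": "2020-11-01", "country": "NL", "cases": 100, "deaths": 1}
#
# Once data has landed and the table's partitions are registered, a query along
# these lines could be run in Athena. The database name is double-quoted because
# it contains a hyphen:
#
#   SELECT country, SUM(cases) AS total_cases, SUM(deaths) AS total_deaths
#   FROM "update-db"."updates"
#   GROUP BY country;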