Created
May 11, 2021 08:29
AWSTemplateFormatVersion: "2010-09-09"
Transform: "AWS::Serverless-2016-10-31"
Description: >
  CloudFormation in Action: An example with seven AWS services (SNS, SQS, Lambda, Kinesis, S3, Glue and Athena)
  Use case: How to set up an environment in AWS to perform data analysis on the daily numbers of Covid-19?
  Author: Muttalip Kucuk
  Date: 01-11-2020

Globals:
  Function:
    Runtime: java11
    Timeout: 30
    MemorySize: 256

Resources:
  # Ingestion: SNS topic -> SQS queue (with dead-letter queue)
  UpdateTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: "update-topic"
  UpdateSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt UpdateQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref UpdateTopic
  UpdateQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "update-queue"
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt UpdateDLQ.Arn
        maxReceiveCount: 5
  UpdateDLQ:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: "update-dlq"
  SnsToSqsPolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      Queues:
        - Ref: UpdateQueue
      PolicyDocument:
        Id: QueuePolicy
        Version: "2012-10-17"
        Statement:
          - Action:
              - "sqs:SendMessage"
            Effect: Allow
            Resource: !GetAtt UpdateQueue.Arn
            Principal:
              AWS: "*"
            # Restrict senders to the update topic; without this condition
            # any principal could send messages to the queue.
            Condition:
              ArnEquals:
                "aws:SourceArn": !Ref UpdateTopic

  # Processing: Lambda function triggered by the queue
  UpdateProcessor:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: .
      Description: Function which first transforms the daily update to the data model specified in the Glue table definition and then sends it to Firehose
      Environment:
        Variables:
          UPDATE_DELIVERY_STREAM: !Ref UpdateDeliveryStream
      Events:
        SqsEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt UpdateQueue.Arn
            BatchSize: 10
            Enabled: true
      FunctionName: "update-processor"
      Handler: nl.kucuktechnology.covid19.UpdateProcessor::handleRequest
      Policies:
        - AWSLambdaBasicExecutionRole
        - Statement:
            - Action:
                - "sqs:*"
              Effect: Allow
              Resource:
                - !GetAtt UpdateQueue.Arn
                - !GetAtt UpdateDLQ.Arn
            - Action:
                - "firehose:PutRecordBatch"
              Effect: Allow
              Resource: !GetAtt UpdateDeliveryStream.Arn

  # Delivery: Firehose converts JSON records to Parquet and writes them to S3
  UpdateDeliveryStream:
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      DeliveryStreamName: "update-stream"
      DeliveryStreamType: DirectPut
      ExtendedS3DestinationConfiguration:
        BucketARN: !GetAtt UpdateBucket.Arn
        BufferingHints:
          SizeInMBs: 128
          IntervalInSeconds: 60
        CompressionFormat: UNCOMPRESSED
        DataFormatConversionConfiguration:
          SchemaConfiguration:
            CatalogId: !Ref AWS::AccountId
            RoleARN: !GetAtt FirehoseToS3Role.Arn
            DatabaseName: !Ref UpdateDatabase
            TableName: !Ref UpdateTable
            Region: !Ref AWS::Region
            VersionId: LATEST
          InputFormatConfiguration:
            Deserializer:
              OpenXJsonSerDe: { }
          OutputFormatConfiguration:
            Serializer:
              ParquetSerDe: { }
          Enabled: true
        Prefix: "!{timestamp:yyyy}/!{timestamp:MM}/!{timestamp:dd}/"
        RoleARN: !GetAtt FirehoseToS3Role.Arn
        S3BackupMode: Disabled
  UpdateBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Delete
    Properties:
      BucketName: "update"
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
  FirehoseToS3Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: firehose.amazonaws.com
            Action: 'sts:AssumeRole'
      Policies:
        - PolicyName: firehose_delivery_policy_new
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Action:
                  - 's3:AbortMultipartUpload'
                  - 's3:GetBucketLocation'
                  - 's3:GetObject'
                  - 's3:ListBucket'
                  - 's3:ListBucketMultipartUploads'
                  - 's3:PutObject'
                Effect: Allow
                Resource: '*'
              - Action: 'glue:GetTableVersions'
                Effect: Allow
                Resource: '*'
  FirehoseToS3Policy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: firehose_delivery_policy
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Action:
              - 's3:AbortMultipartUpload'
              - 's3:GetBucketLocation'
              - 's3:GetObject'
              - 's3:ListBucket'
              - 's3:ListBucketMultipartUploads'
              - 's3:PutObject'
            Effect: Allow
            Resource:
              # The bucket itself
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref UpdateBucket
              # All objects in the bucket ('/*' rather than '*', so that other
              # buckets sharing the same name prefix are not matched)
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref UpdateBucket
                  - '/*'
      Roles:
        - !Ref FirehoseToS3Role

  # Catalog: Glue crawler, database and table
  CrawlerInS3Role:
    Type: AWS::IAM::Role
    Properties:
      RoleName: "crawler-role"
      AssumeRolePolicyDocument:
        Statement:
          - Action: sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - glue.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
      Policies:
        - PolicyDocument:
            Statement:
              - Action:
                  - s3:GetObject*
                  - s3:PutObject*
                Effect: Allow
                Resource: !Join
                  - ''
                  - - !GetAtt UpdateBucket.Arn
                    - '/*'
          PolicyName: "crawler-policy"
  GlueCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"}}}"
      DatabaseName: !Ref UpdateDatabase
      Name: "event-crawler"
      Role: !GetAtt CrawlerInS3Role.Arn
      Schedule:
        # Runs at the start of every hour
        ScheduleExpression: "cron(0 * * * ? *)"
      SchemaChangePolicy:
        UpdateBehavior: "LOG"
        DeleteBehavior: "LOG"
      Targets:
        CatalogTargets:
          - DatabaseName: !Ref UpdateDatabase
            Tables:
              - !Ref UpdateTable
  UpdateDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: "update-db"
  UpdateTable:
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref UpdateDatabase
      TableInput:
        Description: Glue table containing all updates, with the S3 bucket as its data source
        Owner: owner
        Retention: 0
        Name: "updates"
        StorageDescriptor:
          Location: !Sub 's3://${UpdateBucket}/'
          Columns:
            - Name: date
              Type: date
            - Name: country
              Type: string
            - Name: cases
              Type: int
            - Name: deaths
              Type: int
          InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
          Compressed: false
          NumberOfBuckets: -1
          SerdeInfo:
            SerializationLibrary: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
            Parameters:
              serialization.format: '1'
          BucketColumns: [ ]
          SortColumns: [ ]
          StoredAsSubDirectories: false
        PartitionKeys:
          - Name: year
            Type: string
          - Name: month
            Type: string
          - Name: day
            Type: string
        TableType: EXTERNAL_TABLE

  # Analysis: bucket for Athena query results
  QueryResults:
    Type: AWS::S3::Bucket
    DeletionPolicy: Delete
    Properties:
      BucketName: "query-results"
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true

Outputs:
  UpdateTopicArn:
    Description: "Update topic ARN"
    Value: !Ref UpdateTopic
  UpdateBucketName:
    Description: "Update bucket name"
    Value: !Ref UpdateBucket
  GlueCrawler:
    Description: "Glue crawler name"
    Value: !Ref GlueCrawler
  UpdateDatabaseName:
    Description: "Update database name"
    Value: !Ref UpdateDatabase
  UpdateTableName:
    Description: "Update table name"
    Value: !Ref UpdateTable
  QueryResultsName:
    Description: "Athena query results bucket name"
    Value: !Ref QueryResults