@guillaumesmo
Last active June 20, 2021 14:14
CloudFormation Custom Task Definition POC
# Sources:
# https://cloudonaut.io/how-to-create-a-customized-cloudwatch-dashboard-with-cloudformation/
# https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources.html
# https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/ECS.html
Resources:
  CustomTaskDefinition:
    Type: 'Custom::TaskDefinition'
    Version: '1.0'
    Properties:
      ServiceToken: !GetAtt 'CustomResourceFunction.Arn'
      TaskDefinition: |
        {
          containerDefinitions: [
            {
              name: "sleep",
              image: "busybox",
              command: [
                "sleep",
                "360"
              ],
              mountPoints: [
                {sourceVolume: "efs", containerPath: "/efs"}
              ]
            }
          ],
          family: "sleep360",
          taskRoleArn: "", // required for EFS permissions
          cpu: "256",
          memory: "512",
          networkMode: "awsvpc",
          volumes: [
            {
              name: "efs",
              efsVolumeConfiguration: {
                fileSystemId: "" // required for EFS
              }
            }
          ]
        }
  CustomResourceFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Code:
        ZipFile: |
          const aws = require('aws-sdk')
          const response = require('cfn-response')
          const ecs = new aws.ECS({apiVersion: '2014-11-13'})
          exports.handler = function(event, context) {
            console.log(`AWS SDK Version: ${aws.VERSION}`)
            console.log("REQUEST RECEIVED:\n" + JSON.stringify(event))
            if (event.RequestType === 'Create' || event.RequestType === 'Update') {
              ecs.registerTaskDefinition(eval(`(${event.ResourceProperties.TaskDefinition})`))
                .promise()
                .then(data => {
                  console.log(`Created/Updated task definition ${data.taskDefinition.taskDefinitionArn}`)
                  response.send(event, context, response.SUCCESS, {}, data.taskDefinition.taskDefinitionArn)
                })
                .catch(err => {
                  console.error(err)
                  response.send(event, context, response.FAILED)
                })
            } else if (event.RequestType === 'Delete') {
              ecs.deregisterTaskDefinition({taskDefinition: event.PhysicalResourceId})
                .promise()
                .then(data => {
                  console.log(`Removed task definition ${event.PhysicalResourceId}`)
                  response.send(event, context, response.SUCCESS)
                })
                .catch(err => {
                  if (err.code === 'InvalidParameterException') {
                    console.log(`Task definition: ${event.PhysicalResourceId} does not exist. Skipping deletion.`)
                    response.send(event, context, response.SUCCESS)
                  } else {
                    console.error(err)
                    response.send(event, context, response.FAILED)
                  }
                })
            } else {
              console.error(`Unsupported request type: ${event.RequestType}`)
              response.send(event, context, response.FAILED)
            }
          }
      Handler: 'index.handler'
      MemorySize: 128
      Role: !GetAtt 'CustomResourceRole.Arn'
      Runtime: 'nodejs12.x'
      Timeout: 30
  CustomResourceRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: 'lambda.amazonaws.com'
            Action: 'sts:AssumeRole'
      Policies:
        - PolicyName: 'customresource'
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - 'ecs:DeregisterTaskDefinition'
                  - 'ecs:RegisterTaskDefinition'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'iam:PassRole'
                Resource: '*' # replace with value of taskRoleArn
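
Not part of the original gist: a minimal sketch of how a service could consume this custom resource (the cluster and subnet resources are hypothetical placeholders). Because the Lambda returns the new task definition ARN as the physical resource ID, !Ref CustomTaskDefinition resolves to that ARN:

  ExampleService:
    Type: 'AWS::ECS::Service'
    Properties:
      Cluster: !Ref ExampleCluster                 # hypothetical AWS::ECS::Cluster resource
      LaunchType: FARGATE
      PlatformVersion: '1.4.0'                     # needed for EFS on Fargate, see the comments below
      TaskDefinition: !Ref CustomTaskDefinition    # the custom resource returns the task definition ARN
      DesiredCount: 1
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - !Ref ExampleSubnet                   # hypothetical subnet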
@sukrit007

Thanks for this. I enhanced it a little bit to ignore deletion of task definitions that do not exist and added iam:PassRole permissions:
https://github.com/intraedge-services/aws-ms-wordpress-ha/blob/master/infrastructure/cloudformation/wp-ecs-taskdefinition-function.yaml

@guillaumesmo
Author

Thank you @sukrit007, it makes sense! I updated my POC.

@tfarmer00

tfarmer00 commented May 14, 2020

@guillaumesmo @sukrit007 I am running into this error with the Lambda. Any ideas? The CloudFormation deploy then gets stuck unless I cancel it manually.

2020-05-14T18:16:31.340Z	ERROR	Invoke Error	
{
    "errorType": "TypeError",
    "errorMessage": "Cannot read property 'forEach' of undefined",
    "stack": [
        "TypeError: Cannot read property 'forEach' of undefined",
        "    at Runtime.exports.handler (/var/task/index.js:7:66)",
        "    at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"
    ]
}

@sukrit007

@tfarmer00 if you are using my version (https://github.com/intraedge-services/aws-ms-wordpress-ha/blob/master/infrastructure/cloudformation/wp-ecs-taskdefinition-function.yaml#L18), most likely you do not have containerDefinitions defined: https://gist.github.com/guillaumesmo/4782e26500a3ac768888daab3c55b139#file-custom-task-definition-yml-L13. Note it is case sensitive: you should use "containerDefinitions" instead of "ContainerDefinitions".

@mapoulos

mapoulos commented May 20, 2020

This was extremely helpful: thanks!

In case anyone else gets the same error, this is what I faced initially:

The specified platform does not satisfy the task definition's required capabilities. (Service: AmazonECS; Status Code: 400; Error Code: PlatformTaskDefinitionIncompatibilityException; Request ID: 0fe56ac3-f32b-42d0-813e-c71e5ab978cd)

I fixed this by explicitly specifying the platform version in the ECS Service definition:

# ...
ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref EcsCluster
      PlatformVersion: "1.4.0"
#...

@mokit

mokit commented May 29, 2020

Great custom resource!
Anyone else having problems with permissions ("permission denied") when the container wants to access file(s) on EFS?

@alazyzombie

Great custom resource!
Anyone else having problems with permissions ("permission denied") when the container wants to access file(s) on EFS?

yes

@ericklau

Anyone else having problems when adding encrypted volumes? The Lambda is saying that the task definition has unexpected keys, such as:

UnexpectedParameter: Unexpected key 'transitEncryptionPort' found in params.volumes[0].efsVolumeConfiguration
UnexpectedParameter: Unexpected key 'authorizationConfig' found in params.volumes[0].efsVolumeConfiguration

using https://docs.aws.amazon.com/AmazonECS/latest/userguide/efs-volumes.html for syntax

custom-task-definition.yml (snippet)

      "volumes": [
        {
            "name": "myEfsVolume",
            "efsVolumeConfiguration": {
                "fileSystemId": "fs-1234",
                "rootDirectory": "/path/to/my/data",
                "transitEncryption": "ENABLED",
                "transitEncryptionPort": 1,
                "authorizationConfig": {
                    "accessPointId": "fsap-1234",
                    "iam": "ENABLED"
                }
            }
        }
    ],

@guillaumesmo
Author

@ericklau that is an interesting issue:
It's a shame, but the Lambda built-in AWS SDK for JavaScript is not up to date (at the time of writing: version 2.631.0), which means your configuration is correct but the SDK in the Lambda function is too old to support it.

The solution is to embed a more recent version of the SDK in a layer attached to your lambda function. There are 2 ways:

The hard way: see https://aws.amazon.com/premiumsupport/knowledge-center/lambda-layer-aws-sdk-latest-version/

An alternative is to deploy the following stack in your account:
https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:903779448426:applications~lambda-layer-aws-sdk-js
This will create a Lambda layer with the latest AWS SDK for JavaScript, which you add to your Lambda function properties as follows:
(you can also use Fn::ImportValue if you prefer, since the other stack exports the layer ARN; see the sketch after the snippet)

      Layers:
        - arn:aws:lambda:eu-west-1:000000000000:layer:my-layer-name:1
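
Alternatively, with a hypothetical export name (the actual export name comes from the layer stack's outputs):

      Layers:
        - !ImportValue 'aws-sdk-js-layer-arn'  # hypothetical export name; check the layer stack's outputs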

I have just tested and I got SDK 2.694.0 which accepted the values for transitEncryption etc.

The laziest solution is to wait for AWS to update their Lambda JS platform :)

Hope it helps. I'll update my gist to output the SDK version.

@jaska120

Fantastic job!

I got the EFS to mount without a problem, but the image can't access the files on EFS. For example, with the nginx image everything works fine before I mount EFS, but when I deploy my stack with efsVolumeConfiguration pointing at the created EFS, the container can't access those files and I get this error:

directory index of "/usr/share/nginx/html/" is forbidden

Anybody else experiencing the same kind of problem, or have a solution? I've been trying to solve this for 2 days now...

@mapoulos

@jaska120 I imagine you need to change the file/directory permissions so that the nginx user can read them. More info here.

Probably executing something like

chown -R nginx:nginx /usr/share/nginx/ 

in the entrypoint (or in the docker build if you're building a subimage) will do the trick (may also need to do a chmod, see the link above).

@nickaustin13

nickaustin13 commented Jun 17, 2020

This deploys fine when using busybox or any other image from Docker Hub, but fails with this error when using an image hosted in ECR:
Task definition does not support launch_type FARGATE. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException;)
Any ideas?

Update:
To make it work with private ECR images you need to add these 2 properties to the custom task definition:
      executionRoleArn: { "Ref" : "TaskExecutionRoleArnParameter" },
      requiresCompatibilities: [
        "FARGATE"
      ]

Your TaskExecutionRoleArnParameter that you pass in as a parameter should have the permissions explained here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html
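
In case it helps, a rough sketch of such an execution role, using the AWS managed policy for ECS task execution (adjust to your needs and pass its ARN in as the parameter):

  TaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy  # ECR pull + CloudWatch Logs permissions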

@jaska120

jaska120 commented Jun 18, 2020

@nickaustin13 I have just tested with ECR and it works. You have some unsupported parameter defined in your task definition for the FARGATE launch type. See my working TaskRole below:

    TaskRole:
      Type: AWS::IAM::Role
      Properties:
        AssumeRolePolicyDocument:
          Statement:
            - Effect: Allow
              Principal:
                Service: ecs-tasks.amazonaws.com
              Action: "sts:AssumeRole"

EDIT: I read your comment again and saw it was updated

@jaska120

jaska120 commented Jun 18, 2020

@mapoulos Thanks for your suggestion. I am new to Docker and still trying to figure this out. Permissions are OK while using local Docker but not when deploying to Fargate. I did try to build a subimage with a simple Dockerfile:

FROM nginx:latest

RUN chown -R nginx:nginx /usr/share/nginx
RUN chmod 777 /usr/share/nginx

I tried that with 3 combinations: only chown, only chmod, and both of them. I deployed my image to ECR, but every time I start the task I get permission errors. (And I didn't forget to change the version while trying different builds.) This must have something to do with how EFS mounts with Docker and what kind of permissions are granted. I even have the arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess managed policy attached to my TaskRole.

No luck so far. The funny part is that according to this AWS guide https://docs.aws.amazon.com/AmazonECS/latest/developerguide/tutorial-efs-volumes.html everything works even without any custom entrypoint or subimage, but things are not going well when I try the same.

My ultimate goal is to use the odoo image, but since nginx is more common I am trying with it first (same permission problems with the odoo image).

Any idea?

@mapoulos

@jaska120
That is frustrating. The only other thing I can think of is IAM permissions: are you using IAM auth at all? I imagine not, but that could potentially cause permission errors.

The only other thing I can think to try is creating an EC2 instance and mounting the EFS file system there, seeing what the perms are, etc. My CloudFormation looks like this, if it helps:

  SearchallFileSystem:
    Type: AWS::EFS::FileSystem
    Properties:
      Encrypted: true
      PerformanceMode: generalPurpose
      ThroughputMode: bursting
  SearchallEFSMountTarget:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref SearchallFileSystem
      SecurityGroups:
        - !ImportValue SearchallSecurityGroup
      SubnetId: !ImportValue SearchallPublicSubnet
  CustomTaskDefinition:
    Type: 'Custom::TaskDefinition'
    Version: '1.0'
    Properties: 
      ServiceToken: !GetAtt 'CustomResourceFunction.Arn'
      TaskDefinition: {
        containerDefinitions: [
          {
            name: "sonic",
            image: {"Ref" : "Image"},
            logConfiguration: {
              logDriver: "awslogs",
              options: {
                "awslogs-group": {"Ref" : "SearchallLogGroup" },
                "awslogs-region": {"Fn::Sub" :  "${AWS::Region}"}, 
                "awslogs-stream-prefix": "searchall-ecs"
              },
            },
            portMappings: [{"containerPort" : {"Ref" : "Port"}}],
            mountPoints: [
              {sourceVolume: "sonic-efs", containerPath: "/var/lib/sonic/store/"}
            ]
          }
        ], 
        family: "searchall-ecs",
        cpu: "256",
        memory: "512",
        networkMode: "awsvpc",
        executionRoleArn: {"Fn::GetAtt" : "SearchallExecutionRole.Arn"},
        requiresCompatibilities: ["FARGATE"],
        
        volumes: [
          {
            name: "sonic-efs",
            efsVolumeConfiguration: {
              fileSystemId: {"Ref" : "SearchallFileSystem"} # required for EFS
            }
          }
        ]
      }

@jaska120

@mapoulos Thanks for sharing your CF file. I am not using IAM access since the Lambda JS aws-sdk layer doesn't support it yet, because the SDK version on the layer is too old. The only difference I can see is that you haven't provided taskRoleArn and your FileSystem is encrypted. Trying those now.

Do you mind sharing your SearchallSecurityGroup, just in case I have misconfigured my security group?

@JoanBelder

@ericklau I ran into the same problem, but I didn't want the hassle of extra layers. I figured that the Python runtime in AWS Lambda has a more up-to-date version of the AWS SDK, so I simply ported the code to Python, which worked for me. (I haven't actually run through all the possibilities yet, so it could also be very buggy.)

CustomResourceFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Code:
        ZipFile: |
          import json
          import logging
          import boto3
          import cfnresponse

          logger = logging.getLogger()
          logger.setLevel(logging.INFO)
          ecs = boto3.client('ecs')


          def handler(event, context):
              logger.info('got event {}'.format(event))
              if event['RequestType'] == 'Create' or event['RequestType'] == 'Update':
                  try:
                      data = ecs.register_task_definition(**json.loads(event['ResourceProperties']['TaskDefinition']))
                      logger.info(f"Created/Updated task definition ${data['taskDefinition']['taskDefinitionArn']}")
                      cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, data['taskDefinition']['taskDefinitionArn'])
                  except BaseException as error:
                      logger.error(error)
                      cfnresponse.send(event, context, cfnresponse.FAILED, {})
              elif event['RequestType'] == 'Delete':
                  try:
                      ecs.deregister_task_definition(taskDefinition=event['PhysicalResourceId'])
                      logger.info(f"Removed task definition ${event['PhysicalResourceId']}")
                      cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
                  except ecs.exceptions.InvalidParameterException:
                      logger.info(f"Task definition: ${event['PhysicalResourceId']} does not exist. Skipping deletion.")
                      cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
                  except BaseException as error:
                      logger.error(error)
                      cfnresponse.send(event, context, cfnresponse.FAILED, {})
              else:
                  logger.error(f"Unsupported request type: ${event['RequestType']}")
                  cfnresponse.send(event, context, cfnresponse.FAILED, {})
      Handler: 'index.handler'
      MemorySize: 128
      Role: !GetAtt 'CustomResourceRole.Arn'
      Runtime: 'python3.7' # python3.8 does not support ZipFile :(
      Timeout: 30

@mapoulos

@jaska120

Sure thing. Here are the bits that should be relevant (I'm not being as careful with the Egress as I might be):

  SearchallSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: A security group for the lambdas, the ecs cluster (for sonic), and the private endpoints
      VpcId: !Ref SearchallVPC
      Tags:
        - Key: project
          Value: searchall-prod
        - Key: type
          Value: searchall-network
  SearchallSecurityGroupEgress:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref SearchallSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
      CidrIp: 0.0.0.0/0
      Description: HTTPS for ECS/ECR
  SearchallSecurityGroupEgressDynamo:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref SearchallSecurityGroup
      IpProtocol: tcp
      FromPort: 0
      ToPort: 65535
      DestinationSecurityGroupId: !Ref SearchallSecurityGroup
      Description: Allow lambdas to get to dynamo through the endpoint    
  SearchallSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref SearchallSecurityGroup
      SourceSecurityGroupId: !Ref SearchallSecurityGroup
      IpProtocol: tcp
      FromPort: 1491
      ToPort: 1491
  SearchallSecurityGroupEFS:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref SearchallSecurityGroup
      SourceSecurityGroupId: !Ref SearchallSecurityGroup
      IpProtocol: tcp
      FromPort: 2049
      ToPort: 2049

@guillaumesmo
Author

For those having issues with permissions: doing a chmod from the Dockerfile will not help, since those commands are run when building the image, not when running in ECS.

The best thing you can do is mount the EFS in a temporary EC2 instance, create the folder, and chmod it accordingly from there; your task should run fine afterwards.

@mapoulos

@guillaumesmo Can't believe I missed that. You're right of course.

Another option is to run the chmod and chown in the entrypoint of the image, but that would add startup time (and be superfluous after the first time).

@jaska120

jaska120 commented Jul 1, 2020

I forgot to reply and thank you guys!

@mapoulos I had almost the same setup, but it was still good to double-check that everything was OK on the CloudFormation side.

@guillaumesmo Thank you for your solution - worked like a charm.

So, for those of you having permission problems: keep in mind that doing a chmod while building your container won't work, since the mount folder is only available while the container is running. That's why you should use a temporary bastion host and mount the EFS there when doing your first deployment.

Executing sudo chmod -R 777 /mnt/efs from the bastion worked, where /mnt/efs is the folder where the EFS was mounted in the first place.

@namedgraph

I'm trying to follow this... I'm getting this error when using an EFS volume for my container:

Error response from daemon: create ecs-LinkedDataHubStackLDHTaskDefinitionF106B511-162-FusekiAdminDataVolume-e69dae89abd09e9de901:
VolumeDriver.Create: mounting volume failed: b'mount.nfs4: mounting fs-468514f2.efs.us-east-1.amazonaws.com:/var/fuseki/data/admin failed, reason given by server:
No such file or directory'

What could be the issue here?

@jedis00

jedis00 commented Jun 20, 2021

I'm trying to follow this... I'm getting this error when using an EFS volume for my container:

Error response from daemon: create ecs-LinkedDataHubStackLDHTaskDefinitionF106B511-162-FusekiAdminDataVolume-e69dae89abd09e9de901:
VolumeDriver.Create: mounting volume failed: b'mount.nfs4: mounting fs-468514f2.efs.us-east-1.amazonaws.com:/var/fuseki/data/admin failed, reason given by server:
No such file or directory'

What could be the issue here?

Make sure your /var/fuseki/data/admin exists. Also, I don't think this is needed anymore, as the support was added natively a while back, iirc.
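
For reference, a rough sketch of the native route (property names as I understand them from the AWS::ECS::TaskDefinition docs; the file system ID is a placeholder, container values mirror the gist's example):

  NativeTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: sleep360
      Cpu: '256'
      Memory: '512'
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ContainerDefinitions:
        - Name: sleep
          Image: busybox
          Command:
            - sleep
            - '360'
          MountPoints:
            - SourceVolume: efs
              ContainerPath: /efs
      Volumes:
        - Name: efs
          EFSVolumeConfiguration:
            FilesystemId: fs-12345678  # placeholder, use your EFS file system ID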

@namedgraph

@jedis00 exists where -- in EFS or in the container? If EFS, how do I create it there?
P.S. Yes I'm using native support.

@jedis00

jedis00 commented Jun 20, 2021

You are telling it what directory to mount the EFS to inside of the container. Your container pipeline should be running a 'mkdir -p /var/fuseki/data/admin' to create it if it doesn't already exist.

@namedgraph

OK. This is not required with host mounts though -- so the EFS volumes are different in this respect?

@jedis00

jedis00 commented Jun 20, 2021

OK. This is not required with host mounts though -- so the EFS volumes are different in this respect?

Yes it is required for mounting an EFS volume to a host. You’re telling it what directory to mount the EFS to on the host. Since the idea of this is to not mount to the host, you’re mounting it directly inside of the container.

@namedgraph

Doesn't the fs-468514f2.efs.us-east-1.amazonaws.com:/var/fuseki/data/admin syntax refer to EFS host:path? Meaning the missing directory is within EFS?
