I ran into some problems trying to deploy a Vert.x based HTTP service, using Hazelcast as the cluster manager, on AWS ECS Fargate (a container orchestration service where you deploy docker containers without worrying about the underlying virtual machines - kind of like Kubernetes, but simpler perhaps?).
Hazelcast has good documentation on their website on using Hazelcast on AWS ECS, but all of it assumes that you are configuring the cluster with their YAML configuration language - which unfortunately doesn't appear to be supported by Vert.x: setting the system property vertx.hazelcast.config will only take an XML file, which Vert.x will try to parse to configure Hazelcast, and it will choke on a YAML configuration file. The docs also assume you'll host the configuration file on an EFS volume - which is another hassle to set up.
Other public documentation that does use XML configuration doesn't use the Hazelcast AWS plugin, but some other discovery mechanism that is either abandoned or relies on non-trivial external infrastructure.
My requirements are to keep it as simple as possible:
- One docker container with a straight-forward Vert.x application
- I'm using a shaded jar with Vert.x included, but probably any other Vert.x deployment should work.
- Use only supported software for the Hazelcast cluster.
- Vert.x (as of current 4.2.1) uses Hazelcast 4.2.2, and I'm also adding the latest hazelcast-aws library (3.4 as of this writing). Hazelcast 5 will have the AWS support built-in, but I'm not sure when we're going to get that for Vert.x.
- CloudFormation for setting up and managing the cluster
- All configuration has to be in environment variables - no additional files to upload and manage in EFS or S3, as CloudFormation can't do that.
- Side note: Terraform allows you to orchestrate files in storage, but I'm not using it for reasons that are out of scope for this discussion.
And that's it - if I need more setup than a single CloudFormation template, I'll look somewhere else.
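For reference, pulling in the AWS discovery strategy should be a single extra dependency in the POM, next to the usual Vert.x and vertx-hazelcast dependencies - a sketch, using the versions mentioned above:

```xml
<!-- Hazelcast AWS discovery plugin; 3.4 is the version discussed above for Hazelcast 4.2.x -->
<dependency>
  <groupId>com.hazelcast</groupId>
  <artifactId>hazelcast-aws</artifactId>
  <version>3.4</version>
</dependency>
```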
As I've mentioned above, my application is built into a single JAR that can be deployed with any supporting JVM without needing to install the "Vert.x distribution". If you do use a distribution, the configuration will be somewhat different (you might want to base it on the official Vert.x image, though that is currently based on Java 8 - so maybe don't do that either).
The application is built and packaged using Maven, and I'm using maven-shade-plugin
to pack all dependencies (including Vert.x,
Hazelcast and the Hazelcast plugins) into a single JAR and also to create a MANIFEST.MF
file that automatically runs my verticle
when the JAR is "run", using this maven-shade-plugin
transformer configuration:
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
  <manifestEntries>
    <Main-Class>io.vertx.core.Launcher</Main-Class>
    <Main-Verticle>my.main.verticle</Main-Verticle>
    <Version>${project.version}</Version>
  </manifestEntries>
</transformer>
An issue with shading multiple Hazelcast discovery plugins is that the plugins expose themselves using a service loader configuration
file in the JAR META-INF
directory. When shading multiple plugins, each will offer itself in the identically named
META-INF/services/com.hazelcast.spi.discovery.DiscoveryStrategyFactory
configuration file, and shading all of them together will
cause only the first such file to be created in the JAR - and this is often the "Multicast" discovery strategy, as it is built-in
to the main Hazelcast JAR.
When this issue is triggered, you'd get errors in the application log of the form:
There is no discovery strategy factory to create 'DiscoveryStrategyConfig{properties={}, className='com.hazelcast.aws.AwsDiscoveryStrategy', discoveryStrategyFactory=null}'
Hopefully with Hazelcast 5 - where all the plugins are in the same JAR - this won't be an issue, but until then you need to take care of the service files when shading, using the AppendingTransformer:
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
  <resource>META-INF/services/com.hazelcast.spi.discovery.DiscoveryStrategyFactory</resource>
</transformer>
This will bunch up all the discovery factory configurations into a single file so that Hazelcast can find all of them.
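If you're curious what the AppendingTransformer actually does, it simply concatenates the identically named service files from each JAR. Roughly, as a sketch (the directory layout and factory class names here are purely illustrative):

```shell
# Simulate two plugin JARs, each shipping a service file with the same name
# (the com.example factory names are made up for illustration)
mkdir -p plugin-a/META-INF/services plugin-b/META-INF/services
svc=com.hazelcast.spi.discovery.DiscoveryStrategyFactory
echo 'com.example.MulticastFactory' > plugin-a/META-INF/services/$svc
echo 'com.example.AwsFactory' > plugin-b/META-INF/services/$svc

# What AppendingTransformer effectively emits into the shaded JAR:
# both factories listed in one file, one per line
cat plugin-a/META-INF/services/$svc plugin-b/META-INF/services/$svc > merged
cat merged
```

You can inspect the real merged file in your shaded JAR with `unzip -p target/your-app-shaded.jar META-INF/services/com.hazelcast.spi.discovery.DiscoveryStrategyFactory` - it should list both the multicast and the AWS factories.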
Because we are already deep in AWS, I'm just using the current LTS Corretto JVM as the base image to package my JAR file, with this Dockerfile:
FROM amazoncorretto:17
RUN yum install -y unzip && rm -rf /var/cache/yum # or other linux deps you want
WORKDIR /srv/app
ADD src/main/resources/docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT [ "/docker-entrypoint.sh" ]
ARG JAR_NAME
ADD target/$JAR_NAME.jar /srv/app/app.jar
I'm using Maven to build the docker image as well, in which case the plugin configuration looks like this (though you may have other fine ideas):
<plugin>
  <groupId>com.spotify</groupId>
  <artifactId>dockerfile-maven-plugin</artifactId>
  <version>1.4.2</version>
  <executions>
    <execution>
      <id>default</id>
      <goals>
        <goal>build</goal>
        <goal>push</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <repository>your.docker.registry.possibly.aws.ecr/your-tag</repository>
    <tag>${project.version}</tag>
    <buildArgs>
      <!-- we use the output from maven-shade-plugin, which should be configured appropriately -->
      <JAR_NAME>${project.artifactId}-${project.version}-shaded</JAR_NAME>
    </buildArgs>
  </configuration>
</plugin>
The docker-entrypoint.sh
script is where the interesting stuff happens:
#!/bin/bash -x

if [ -n "$HZ_CLUSTER_XML" ]; then
  base64 -d <<<"$HZ_CLUSTER_XML" > /cluster.xml
  APP_ARGS=-cluster
  JVM_OPTIONS="$JVM_OPTIONS \
    -Dvertx.hazelcast.config=/cluster.xml \
    -Dhazelcast.http.healthcheck.enabled=true \
    "
  # Hazelcast really wants these things for Java >= 9
  JVM_OPTIONS="$JVM_OPTIONS \
    --add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED \
    --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED \
    --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED \
    "
fi

[ -z "$JAVA_HOME" ] && JAVA_HOME=/usr
[ -x "$JAVA_HOME/bin/java" ] || { echo "No Java runtime found!"; exit 5; }

# exec so the JVM becomes PID 1 and receives container stop signals directly
exec $JAVA_HOME/bin/java $JVM_OPTIONS -jar /srv/app/app.jar $APP_ARGS "$@"
The main thing that happens here is that we detect Hazelcast XML configuration content in the HZ_CLUSTER_XML environment variable and if so - set up Hazelcast clustering for Vert.x (if you omit that environment variable, you get non-clustered local SharedData instances that work fine but don't actually share data - it's good for testing).
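To sanity-check the encoding contract between the deployment and the entrypoint, you can reproduce the Base64 round trip locally (the XML here is a trimmed-down stand-in for the real configuration):

```shell
# Encode a minimal stand-in configuration, as the task definition will do with Fn::Base64
HZ_CLUSTER_XML=$(base64 <<'EOF'
<hazelcast xmlns="http://www.hazelcast.com/schema/config">
  <cluster-name>my-cluster</cluster-name>
</hazelcast>
EOF
)

# Decode it the same way docker-entrypoint.sh does (writing to ./cluster.xml here)
base64 -d <<<"$HZ_CLUSTER_XML" > cluster.xml
grep '<cluster-name>' cluster.xml
```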
Now you can create your Fargate cluster using CloudFormation - configuration is code!
I do use YAML for my configuration file as it is saner to read and write than JSON (and XML, but sometimes we can't help it). The configuration example below assumes that more basic entities (such as the VPC and load balancer) have already been configured - I use this snippet as a nested stack in a larger CloudFormation setup that has a lot of other things and a few more standard HTTP services that hook into the load balancer. But it should be easy to extrapolate or even use as is.
AWSTemplateFormatVersion: '2010-09-09'
Description: My Fargate cluster
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: Select a VPC that allows instances access to the Internet.
  VPCipv4Prefix:
    Description: The VPC network IPv4 prefix
    Type: String
  VPCipv6Prefix:
    Description: The VPC network IPv6 prefix
    Type: String
  RouteTable:
    Description: The VPC main routing table where subnets can attach for network access
    Type: String
  DesiredCapacity:
    Type: Number
    Default: 3
    Description: Number of instances to launch in your ECS cluster.
  MaxCapacity:
    Type: Number
    Default: 10
    Description: Maximum number of instances that can be launched in your ECS cluster.
  LoadBalancerListener:
    Description: The public load balancer listener
    Type: String
  LoadBalancerSecurityGroup:
    Description: The load balancer security group reference
    Type: String
  EcrRepository:
    Description: The ECR repository for the service
    Type: String
  ContainerTag:
    Description: 'The image to deploy for the service (default: latest)'
    Type: String
    Default: latest
Resources:
  MyCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: my-cluster
  MyFargateExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: my-ecs-role
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy'
  MyLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: fargate/my-service # could be anything, I just like prefixes that look like directories
  MyTaskPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
              - logs:CreateLogGroup
              - logs:CreateLogStream
              - logs:PutLogEvents
              - logs:DescribeLogStreams
            Resource:
              - arn:aws:logs:*:*:*
          # for hazelcast
          - Effect: Allow
            Action:
              - ec2:DescribeNetworkInterfaces
              - ecs:ListTasks
              - ecs:DescribeTasks
            Resource:
              - "*"
  MyTaskRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: my-task-ecs-role
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - Ref: MyTaskPolicy
  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: My Task Security Group
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - Description: Allow access from load balancer
          IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          SourceSecurityGroupId: !Ref LoadBalancerSecurityGroup
  MySecurityGroupSelfIngress: # allow access to myself, so cluster nodes can communicate
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref MySecurityGroup
      IpProtocol: -1
      SourceSecurityGroupId: !Ref MySecurityGroup
  MySubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Select [ 0, !GetAZs { Ref: "AWS::Region" } ]
      CidrBlock: !Sub ${VPCipv4Prefix}.10.0/24
      Ipv6CidrBlock: !Sub "${VPCipv6Prefix}10::/64"
      VpcId: !Ref VpcId
  MySubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Select [ 1, !GetAZs { Ref: "AWS::Region" } ]
      CidrBlock: !Sub ${VPCipv4Prefix}.11.0/24
      Ipv6CidrBlock: !Sub "${VPCipv6Prefix}11::/64"
      VpcId: !Ref VpcId
  MySubnetC:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Select [ 2, !GetAZs { Ref: "AWS::Region" } ]
      CidrBlock: !Sub ${VPCipv4Prefix}.12.0/24
      Ipv6CidrBlock: !Sub "${VPCipv6Prefix}12::/64"
      VpcId: !Ref VpcId
  MySubnetRoutingA:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref RouteTable
      SubnetId: !Ref MySubnetA
  MySubnetRoutingB:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref RouteTable
      SubnetId: !Ref MySubnetB
  MySubnetRoutingC:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref RouteTable
      SubnetId: !Ref MySubnetC
  MyTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 10
      HealthCheckPath: /
      HealthCheckTimeoutSeconds: 5
      UnhealthyThresholdCount: 2
      HealthyThresholdCount: 2
      Name: my-vertx-target-group
      Port: 8080
      Protocol: HTTP
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 60 # default is 300
      TargetType: ip
      VpcId: !Ref VpcId
  MyALBListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn: !Ref MyTargetGroup
      Conditions:
        - Field: path-pattern
          Values:
            - /my-api/v1
            - /my-api/v1/*
      ListenerArn: !Ref LoadBalancerListener
      Priority: 1
  MyTaskDefinition:
    Type: AWS::ECS::TaskDefinition
    DependsOn:
      - MyLogGroup
    Properties:
      Family: my-task
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      Cpu: 512
      Memory: 1GB
      ExecutionRoleArn: !Ref MyFargateExecutionRole
      TaskRoleArn: !Ref MyTaskRole
      ContainerDefinitions:
        - Name: my-task-container
          Image: !Sub ${EcrRepository}:${ContainerTag}
          Environment:
            # I find this one useful
            - Name: JVM_OPTIONS
              Value: >
                -XX:+CrashOnOutOfMemoryError
            # configure our cluster!
            - Name: HZ_CLUSTER_XML
              Value:
                Fn::Base64: !Sub |
                  <hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.7.xsd" xmlns="http://www.hazelcast.com/schema/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                    <properties>
                      <property name="hazelcast.discovery.enabled">true</property>
                    </properties>
                    <cluster-name>my-cluster</cluster-name>
                    <network>
                      <join>
                        <multicast enabled="false"/>
                        <aws enabled="true"/>
                      </join>
                      <interfaces enabled="true">
                        <interface>${VPCipv4Prefix}.*.*</interface>
                      </interfaces>
                    </network>
                    <multimap name="__vertx.subs">
                      <backup-count>1</backup-count>
                      <value-collection-type>SET</value-collection-type>
                    </multimap>
                    <map name="__vertx.haInfo">
                      <backup-count>1</backup-count>
                    </map>
                    <map name="__vertx.nodeInfo">
                      <backup-count>1</backup-count>
                    </map>
                    <cp-subsystem>
                      <cp-member-count>0</cp-member-count>
                      <semaphores>
                        <semaphore>
                          <name>__vertx.*</name>
                          <jdk-compatible>false</jdk-compatible>
                          <initial-permits>1</initial-permits>
                        </semaphore>
                      </semaphores>
                    </cp-subsystem>
                  </hazelcast>
          PortMappings:
            - ContainerPort: 8080
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-region: !Ref AWS::Region
              awslogs-group: !Ref MyLogGroup
              awslogs-stream-prefix: ecs
  MyService:
    Type: AWS::ECS::Service
    Properties:
      ServiceName: my-service
      Cluster: !Ref MyCluster
      TaskDefinition: !Ref MyTaskDefinition
      DeploymentConfiguration:
        MinimumHealthyPercent: 100
        MaximumPercent: 200
      DesiredCount: !Ref DesiredCapacity
      HealthCheckGracePeriodSeconds: 30
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets:
            - !Ref MySubnetA
            - !Ref MySubnetB
            - !Ref MySubnetC
          SecurityGroups:
            - !Ref MySecurityGroup
      LoadBalancers:
        - ContainerPort: 8080
          ContainerName: my-task-container
          TargetGroupArn: !Ref MyTargetGroup
  MyAutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MinCapacity: !Ref DesiredCapacity
      MaxCapacity: !Ref MaxCapacity
      ResourceId: !Sub service/${MyCluster}/${MyService.Name}
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs
      # the ECS service-linked role in your own account
      RoleARN: !Sub arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS
  MyAutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: my-autoscaling-policy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref MyAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleInCooldown: 20
        ScaleOutCooldown: 20
        TargetValue: 75
The "magic" here is in embedding the Hazelcast XML configuration into the task definition using a Base64 encoded environment variable.
All the rest is just bog-standard CloudFormation for a Fargate cluster - which also took me a while to set up properly, and you are welcome to that as well.