AWS Batch is a fully managed batch job scheduler service on AWS. It can easily manage large-scale job queueing and execution. This tutorial shows how to use Inferentia within a job on AWS Batch.
- create a launch template for the base EC2 instance of the AWS Batch compute environment
- create an AWS Batch compute environment and job queue
- build a container image with TensorFlow-Neuron
- push the Docker image to Elastic Container Registry
- submit an inference job to AWS Batch
- create a `userdata.txt` including the content below

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
metadata_expire=0
EOF
rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
yum install -y aws-neuron-runtime-base aws-neuron-runtime aws-neuron-tools python3 gcc-c++ unzip

--==MYBOUNDARY==--
```
- create a launch template with the UserData and the AMI ID of the ECS-optimized AMI

You need to replace `AMI_ID=ami-0aee8ced190c05726` with the AMI ID of the ECS-optimized AMI in your region. You can find the ID at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html .

```
$ AMI_ID=ami-0aee8ced190c05726
$ aws ec2 create-launch-template --launch-template-name neuron-sdk-template --launch-template-data '{"ImageId": "'${AMI_ID}'", "UserData": "'$(cat userdata.txt | base64 --wrap=0)'"}'
```
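Launch-template UserData must be a single-line base64 string (hence `base64 --wrap=0`); a wrapped encoding would inject newlines into the JSON payload. A minimal local round-trip check of the encoding — it uses a throwaway sample file, so substitute your real `userdata.txt`:

```shell
# Round-trip check: launch-template UserData must be base64 with no newlines.
# /tmp/sample-userdata.txt is a stand-in for your real userdata.txt.
printf 'MIME-Version: 1.0\n' > /tmp/sample-userdata.txt
encoded=$(base64 --wrap=0 < /tmp/sample-userdata.txt)

# A wrapped encoding would contain newlines and break the JSON payload:
case "$encoded" in
  *$'\n'*) echo "encoding contains newlines" ;;
  *)       echo "single-line encoding OK" ;;
esac

# Decoding must reproduce the original bytes exactly:
printf '%s' "$encoded" | base64 --decode | cmp -s - /tmp/sample-userdata.txt && echo "round-trip OK"
```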
- create a compute environment and job queue on AWS Batch

During the creation of the compute environment, all parameters can be left at their defaults except the launch template and the instance type: set the launch template to the one created in the previous step, and select inf1.xlarge as the instance type.
You also need to create a job queue associated with the compute environment.
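If you prefer the CLI over the console, the same setup can be sketched as below. This is a non-authoritative sketch: the subnet, security group, and role names are placeholders you must replace with your own, the environment and queue names (`inf1-ce`, `inf1-queue`) are arbitrary, and `neuron-sdk-template` is the launch template created above.

```
$ aws batch create-compute-environment \
    --compute-environment-name inf1-ce \
    --type MANAGED \
    --compute-resources '{
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 16,
        "instanceTypes": ["inf1.xlarge"],
        "launchTemplate": {"launchTemplateName": "neuron-sdk-template"},
        "subnets": ["subnet-xxxxxxxx"],
        "securityGroupIds": ["sg-xxxxxxxx"],
        "instanceRole": "ecsInstanceRole"
    }' \
    --service-role AWSBatchServiceRole
$ aws batch create-job-queue \
    --job-queue-name inf1-queue \
    --state ENABLED --priority 1 \
    --compute-environment-order order=1,computeEnvironment=inf1-ce
```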
- create a `Dockerfile` including the content below

```dockerfile
# Example neuron-container Dockerfile for AWS Batch
# To build:
#   docker build -t neuron-container .
# Prepare application:
#   Before building the Docker image, prepare the following files based on the tutorial
#   https://github.com/aws/aws-neuron-sdk/blob/master/docs/tensorflow-neuron/tutorial-compile-infer.md
#     resnet50_neuron.zip
#     infer_resnet50.py
# Note: the container must start with CAP_SYS_ADMIN + CAP_IPC_LOCK capabilities in order
#       to map the memory needed from the Inferentia devices. These capabilities will
#       be dropped following initialization.
# i.e. To start the container with the required capabilities:
#   docker run --env AWS_NEURON_VISIBLE_DEVICES="0" -v /run:/run -it neuron-container

FROM amazonlinux:2

COPY resnet50_neuron.zip /tmp/
COPY infer_resnet50.py /tmp/

RUN echo $'[neuron] \n\
name=Neuron YUM Repository \n\
baseurl=https://yum.repos.neuron.amazonaws.com \n\
enabled=1' > /etc/yum.repos.d/neuron.repo

RUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

RUN yum install -y \
    aws-neuron-runtime-base \
    aws-neuron-runtime \
    aws-neuron-tools \
    python3 \
    gcc-c++ \
    unzip tar gzip

RUN python3 -m venv neuron_venv && \
    source neuron_venv/bin/activate && \
    pip install -U pip && \
    echo $'[global] \n\
extra-index-url = https://pip.repos.neuron.amazonaws.com' > $VIRTUAL_ENV/pip.conf && \
    pip install pillow && \
    pip install neuron-cc && \
    pip install tensorflow-neuron

RUN echo $'\
#!/bin/bash -xe\n\
source neuron_venv/bin/activate \n\
cd /tmp \n\
curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg \n\
unzip resnet50_neuron.zip \n\
python infer_resnet50.py' \
    > /tmp/job.sh && \
    chmod +x /tmp/job.sh

ENV PATH="/opt/aws/neuron/bin:${PATH}"

CMD /tmp/job.sh
```
- complete the TensorFlow-Neuron ResNet-50 tutorial and create `resnet50_neuron.zip` and `infer_resnet50.py`

To use Inferentia, this tutorial depends on the tutorial Getting Started with TensorFlow-Neuron (ResNet-50 Tutorial). Before building the Docker image, complete the TensorFlow-Neuron tutorial and put `resnet50_neuron.zip` and `infer_resnet50.py` in the same directory as the `Dockerfile`.
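Before running `docker build`, a quick sanity check that all build inputs are in place can save a failed build. `check_artifacts` is a hypothetical helper, not part of the original tutorial:

```shell
# check_artifacts DIR: verify the tutorial outputs sit next to the Dockerfile
check_artifacts() {
    dir="$1"
    for f in Dockerfile resnet50_neuron.zip infer_resnet50.py; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    echo "all build inputs present"
}
```

Usage: `check_artifacts . && docker build -t neuron-container .`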
- build the Docker image

```
$ docker build -t neuron-container .
```
- create a repository on Elastic Container Registry

```
$ aws ecr create-repository --repository-name neuron-container
```
- push the Docker image to the repository

You should replace `repository_uri=xxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/neuron-container` with your repository URI. The URI can be found in the output of the `create-repository` command above.

```
$ repository_uri=xxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/neuron-container
$ docker tag neuron-container ${repository_uri}
$ aws ecr get-login-password | docker login --username AWS --password-stdin ${repository_uri}
$ docker push ${repository_uri}
```
- create a job definition

This command uses the `${repository_uri}` set above.

```
$ aws batch register-job-definition --job-definition-name neuron-job-def --type container --container-properties '{"image": "'${repository_uri}'", "vcpus": 4, "memory": 4096, "volumes": [{"host": {"sourcePath": "/run"}, "name": "run"}], "mountPoints": [{"containerPath": "/run","sourceVolume": "run"}]}'
```
- submit a job

Replace `JOB_QUEUE_NAME` with the name of the job queue for your inf1 compute environment.

```
$ aws batch submit-job --job-name neuron-job --job-queue JOB_QUEUE_NAME --job-definition neuron-job-def
```
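`submit-job` returns a job ID, which you can poll until the job reaches SUCCEEDED or FAILED. A sketch under the same assumptions as above (replace `JOB_QUEUE_NAME` with your own queue name):

```
$ job_id=$(aws batch submit-job --job-name neuron-job --job-queue JOB_QUEUE_NAME \
    --job-definition neuron-job-def --query jobId --output text)
$ aws batch describe-jobs --jobs ${job_id} --query 'jobs[0].status' --output text
```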
- check the job result

After the execution, you can find the inference result in the CloudWatch Logs of the job.

```
[('n02123045', 'tabby', 0.69945353), ('n02127052', 'lynx', 0.1215847), ('n02123159', 'tiger_cat', 0.08367486), ('n02124075', 'Egyptian_cat', 0.064890705), ('n02128757', 'snow_leopard', 0.009392076)]
```