Forcing an Azure Batch node to pull from ACR without restarting the node

Forcing an Azure Batch node to pull from ACR

When debugging tasks, it's often useful to keep a node in an Azure Batch pool running so you don't have to wait for nodes to spin down and back up between attempts. A common workflow is to push a fixed image to ACR under the same tag and rerun the job. However, Azure Batch nodes don't pull an updated image by default: if a task is run again, the node reuses the image it already has, even if the tag in ACR now points to a newer image.

The best solution I've found to speed things up is:

In Batch Explorer, go to the pool view, right-click the node, and choose "Connect".
This opens an SSH session on the node. In that shell, run:

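DOCKER_IMAGE_STR="pc"  # pattern matching the relevant image names
sudo docker image ls -q | xargs -I{} sudo docker tag {} old-{} && \
  sudo docker rmi $(sudo docker images | awk -v pat="$DOCKER_IMAGE_STR" 'NR > 1 && $1 ~ pat { print $1 ":" $2 }')

Where DOCKER_IMAGE_STR is a string (awk treats it as a regular expression) that appears in the relevant Docker image names, e.g. "pc" for the Planetary Computer images. It is passed to awk with -v because the shell does not expand variables inside the single-quoted awk program.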

This tags every image on the node with an old- prefix (based on its image ID) and then removes the original tags of the matching images. Because the original tag no longer exists on the node, Batch pulls the image again the next time a task runs. The pull is fast because the layers are still on disk under the old- tags, so Docker only has to download the layers that changed.
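Before removing anything, it can help to preview which tags the filter would select. The following is a minimal sketch, not part of the original recipe: `matching_tags` and `sample_listing` are hypothetical helper names, and the sample listing is made-up data so the example runs without Docker. Unlike the awk filter in the command, this helper does a plain substring match rather than a regular expression match, and it skips untagged (`<none>`) rows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `docker images`-style output on stdin and prints
# repo:tag for every row whose repository name contains the given substring.
matching_tags() {
  awk -v pat="$1" 'NR > 1 && index($1, pat) > 0 && $2 != "<none>" { print $1 ":" $2 }'
}

# Made-up sample listing (assumed registry and image names) for a dry run.
sample_listing() {
  cat <<'EOF'
REPOSITORY                      TAG       IMAGE ID       CREATED       SIZE
example.azurecr.io/pc-tasks     latest    0123456789ab   2 hours ago   1.2GB
example.azurecr.io/other        latest    ba9876543210   3 days ago    800MB
old-0123456789ab                latest    0123456789ab   2 hours ago   1.2GB
EOF
}

# Preview the tags that a "pc" filter would select from the sample listing.
sample_listing | matching_tags "pc"
```

On a node, the same preview would be `sudo docker images | matching_tags "pc"`; if the printed list looks right, the tags can then be passed to `sudo docker rmi`.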

lossyrob/azure-batch-force-pull.md

