Sometimes you want to keep a node in an Azure Batch pool up and running while you're debugging tasks, so that you don't have to wait for a node to spin down and back up. You may want to push a new image to ACR with the same tag so that the next run picks up fixes from the last run. However, Azure Batch nodes don't pull an updated image by default: if a task is run again, the node uses the image it already has, even if ACR has been updated.
The best solution I've found to speed things up is:
- In Batch Explorer, go to the pool view, right-click on the node, and "connect"
- This will SSH into the node. Inside the shell, run:
DOCKER_IMAGE_STR="pc"
sudo docker image ls -q | sort -u | xargs -I{} sudo docker tag {} old-{} && \
sudo docker rmi $(sudo docker images | awk -v pat="$DOCKER_IMAGE_STR" 'NR > 1 && $1 ~ pat { print $1 ":" $2 }')
Where DOCKER_IMAGE_STR is set to a string that's in the relevant docker image names, e.g. "pc" for the planetary computer images. It is passed to awk with -v because the shell does not expand variables inside the single-quoted awk program.
This will re-tag images with an old- prefix and remove the original image tags. Because the original tag no longer exists on the node, the batch node will re-pull the image.
Because the image layers still exist on the machine under the old- tags, the pull will be fast: only the layers that changed need to be downloaded.
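Before removing anything, it can help to preview which tags the pattern would match. The following is a minimal sketch and not part of the recipe above: matching_image_refs is a hypothetical helper name, and it assumes the default column layout of `docker images` (repository in the first column, tag in the second). It reads the listing on stdin, so it only prints and never removes anything.

```shell
# Hypothetical helper: print repo:tag for every image whose repository
# name matches the given pattern. Expects `docker images` output on stdin.
matching_image_refs() {
  # NR > 1 skips the header row; untagged (<none>) images are skipped
  # because they cannot be passed to `docker rmi` as repo:tag.
  awk -v pat="$1" 'NR > 1 && $1 ~ pat && $2 != "<none>" { print $1 ":" $2 }'
}

# Dry run on the node (prints only):
#   sudo docker images | matching_image_refs "pc"
```

If the printed list looks right, the same pattern should be safe to use as DOCKER_IMAGE_STR. Note that the pattern is treated as a regular expression by awk, so a short string like "pc" can match more repositories than expected.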
Thanks, Rob! I missed your comment in the chat until the very end but I definitely am going to give this a shot as I do often just sit there twiddling my thumbs while the node shuts down and restarts.