10/20/2022
Consider this script:
# file name: docker_publish_os.sh
#!/bin/bash
OS=$1
if [ -z $DOCKER_USER ] || [ -z $DOCKER_PASSWORD ]; then
exit 0;
fi
docker login --username $DOCKER_USER --password $DOCKER_PASSWORD
if [[ $OS == "alpine" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
elif [[ $OS == "ubuntu" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
elif [[ $OS == "all" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
exit 0
else
echo "Error"
exit 1
fi
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
This seems readable enough especially if you have ample experience with Bash (and Docker). However these kinds of scripts overlook some issues:
- The script name almost describes what the script does, but not really.
- You have to read the code or external documentation to understand what the script does.
- The fact that programmers typically specialize in different tech stacks.
Let's tackle the issues from top to bottom and see what we can do to address them.
I believe that programmers should have an idea of what a script should do, and the name of the script should match what it does. No other questions need to be asked as we see the title other than maybe granular details.
In our example, the file name is docker_publish_os.sh
. There are inherently 3 questions
that come to mind as we read this title.
- What is docker?
- What does it mean to publish something?
- What does OS (operating system) have to do with publishing?
The first one is easily addressable: just Google what docker is. But then if we Google further we might see that Docker is just a flavor of a containerization tech stack. It might be a bit overkill to consider that we may swap Docker with other Open Container tools. No apparent improvement jumps to my mind after answering this question.
After learning about Docker through Googling and reading the script, I see that the script pushes the image to a repository determined by the operating system type of the container and under some specified account.
Alas, we found a potential improvement to the naming. We might morph the name as follows:
docker_publish_os.sh
- originaldocker_image_push_os.sh
- let's define what publishing meansimage_push_os.sh
- let's consider that we might swap Docker with something elseimage_push.sh
- hmm, can we make the name sound like a verb?push_image.sh
- I like it!
New name is push_image.sh
.
Let's say we hire a programmer. The new hire:
- specialize in Python and Java
- needs to use the script as part of a new CI/CD pipeline
When the script is ran it will first not work. The new hire does not know:
DOCKER_USER
andDOCKER_PASSWORD
have to be defined as environment variables.- The script takes a command line argument that specifies the OS to tag the image to publish with.
New hire opens push_image.sh
, and the first thing that pops up are Bash specific syntax. A slew of questions come:
- What is $1?
- What is -z?
- Where are DOCKER_USER and DOCKER_PASSWORD defined?
Google, google, google. Takes about 5 mins? How about a junior engineer? 30 minutes? 1 hour?
For this we have an easy fix. A help script flag that shows us how to use the script.
# file name: docker_publish_os.sh
#!/bin/bash
# DOCKER_USER and DOCKER_PASSWORD as environment variable
OS=$1
if [[ $1 == "-h" ]]; then
print_usage
exit 0
fi
if [ -z $DOCKER_USER ] || [ -z $DOCKER_PASSWORD ]; then
exit 0;
fi
docker login --username $DOCKER_USER --password $DOCKER_PASSWORD
if [[ $OS == "alpine" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
elif [[ $OS == "ubuntu" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
elif [[ $OS == "all" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
exit 0
else
echo "Error"
exit 1
fi
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
print_usage() {
echo -e """Usage:
\tpush_image.sh <operating-system>
Supported operating-system values:
alpine - tags the local alpine-build-machine:latest
ubuntu - tags the local ubuntu-build-machine:latest with \$DOCKER_USER environment variable
all - tags all listed supported images with \$DOCKER_USER environment variable
The Docker commands in this script requires these environment variables:
DOCKER_USER=<your Docker account's username>
DOCKER_PASSWORD=<your Docker account's password>
"""
}
Noted, let's add usage strings triggered by a help flag. Now we don't have to worry about another programmer getting bogged down by trying to understand what the script does.
Let's say this same new hire would have really wanted to understand the code. Perhaps take ownership of it as part of maintaining the code base for the team.
Since the new hire in all intent and purposes will have a hard time coding in Bash, I as a programmer writing this code should ensure that readers of my code should not experience this issue. How do we do this? Well let's say the code should read like a general language (English in my locale).
So let's say that the script really does two things:
- Authenticate to remote Docker repository
- Push the image to the repository
The first lines I should see in the code should look something like:
if [[ $1 == "-h" ]]; then
print_usage
exit 0
fi
OS=$1
docker_authenticate
docker_push $OS
By extracting the main idea of the script. We can see that reading the code bottoms out faster. We can quickly infer that the remaining lines in this script are just the details of what is in the main idea. Furthermore we can reduce the size of the functions we write with this same concept of extracting main ideas from code. Doing this iteratively, we come up with small functions that are easier to name, more expressive, and allows our readers to infer what the functions do instead of reading more code.
This is an idea directly from the Clean Code book by Robert Martin which says that functions should be small and do exactly what it sounds like it does. So let's break down the last part of the script
if [[ $OS == "alpine" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
elif [[ $OS == "ubuntu" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
elif [[ $OS == "all" ]]; then
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
exit 0
else
echo "Error"
exit 1
fi
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
Hmmm..., $IMAGE repeats too many times, all this really does is push images tagged by the OS. How about I define push_image() that takes an argument for the OS. The result is:
push_image() {
OS=$1
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
}
# logic block:
if [[ $OS == "alpine" ]]; then
push_image $OS
elif [[ $OS == "ubuntu" ]]; then
push_image $OS
elif [[ $OS == "all" ]]; then
push_image "ubuntu"
push_image "alpine
else
echo "Error"
exit 1
fi
Sweet! We reduced the logic block from 22 to 11 lines!
Now this is where I am going to express my own taste as a programmer. There's another idea in the Clean Code book that functions without parameters should be preferred to functions with parameters. Some of you might disagree, but I would sacrifice more lines of code to maintain push_image() and get rid of yet another cognitive load that the function parameter introduces.
docker_push() {
OS=$1
if [[ $OS == "alpine" ]]; then
push_image_alpine
elif [[ $OS == "ubuntu" ]]; then
push_image_ubuntu
elif [[ $OS == "all" ]]; then
push_all_supported_images
else
os_not_supported_error
fi
}
push_image_alpine() {
push_image "alpine"
}
push_image_ubuntu() {
push_image "ubuntu"
}
push_all_supported_images() {
push_image_alpine
push_image_ubuntu
}
push_image() {
OS=$1
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
}
os_not_supported_error() {
print_not_supported
exit 1;
}
print_not_supported() {
echo """
The operating system is not supported, please run with -h flag to view all supported OS.
"""
}
Notice that more granular definitions appear further down the script. It's a nice analogy with
"digging" down the code. We can reorder the calls of Bash functions so we can take advantage of
this feature to make sure that whoever reads the script sees the main idea of the script and just
dig down the code. Finally bottoming out to the usage string. Let's see my final version of
push_image.sh
# file name: push_image.sh
#!/bin/bash
main() {
if [[ $1 == "-h" ]]; then
print_usage
exit 0
fi
OS=$1
docker_authenticate
docker_push $OS
}
docker_authenticate() {
check_has_auth_env_vars
login
}
check_has_auth_env_vars() {
if [ -z $DOCKER_USER ] || [ -z $DOCKER_PASSWORD ]; then
print_usage
exit 0;
fi
}
login() {
docker login --username $DOCKER_USER --password $DOCKER_PASSWORD
}
docker_push() {
OS=$1
if [[ $OS == "alpine" ]]; then
push_image_alpine
elif [[ $OS == "ubuntu" ]]; then
push_image_ubuntu
elif [[ $OS == "all" ]]; then
push_all_supported_images
else
os_not_supported_error
fi
}
push_image_alpine() {
push_image "alpine"
}
push_image_ubuntu() {
push_image "ubuntu"
}
push_all_supported_images() {
push_image_alpine
push_image_ubuntu
}
push_image() {
OS=$1
IMAGE="${OS}-build-machine:latest"
IMAGE_TAG="${DOCKER_USER}/${IMAGE}"
docker tag $IMAGE $IMAGE_TAG
docker push $IMAGE_TAG
}
os_not_supported_error() {
print_not_supported
exit 1;
}
print_not_supported() {
echo """
The operating system is not supported, please run with -h flag to view all supported OS.
"""
}
print_usage() {
echo -e """Usage:
\tpush_image.sh <operating-system>
Supported operating-system values:
alpine - tags the local alpine-build-machine:latest
ubuntu - tags the local ubuntu-build-machine:latest with \$DOCKER_USER environment variable
all - tags all listed supported images with \$DOCKER_USER environment variable
The Docker commands in this script requires these environment variables:
DOCKER_USER=<your Docker account's username>
DOCKER_PASSWORD=<your Docker account's password>
"""
}
main $@
After all the improvements, we can see that:
- Anyone reading this code needs very little knowledge of Bash since most of the logic blocks are contained in functions whose names are straight English.
- There is very little cognitive load for programmers to understand what each function does since the concepts are so small per code block.
- The script further documents itself without the need to mention how it works in a README file.
- The usage of
main()
should not introduce further complications as it is a very common function in most programming languages as the first lines of code to be executed in the program.
Let's not forget, scripts are also code. Good code needs to be readable and self-documenting. We should also make it easy for other programmers to read an extend our code. This increases productivity in the future and also makes others lives easier.
I hope you enjoyed this article and found it insightful! Thanks! - AN
Love this @arthurnap24!
I really appreciate the way you broke down your thinking -- and all of these steps were exactly what I was hoping for / looking for as a BASHer 💯
Couple of nitpicks: https://gist.github.com/bobbravo2/867562b7a4be3b1767565f89dac5b4fe
And I'm with you re: whether or not the breaking out of all those named functions improves or obscures readability. The proof point is always, IME, onboarding a new team member -- and seeing their beginners mind for if they can extend/enhance this great work!