@chris-kobrzak
Last active August 7, 2018 09:56
Gentle URL checker
#!/usr/bin/env bash
# - reads a list of URL paths from a file called `paths.txt` (one path per
#   line),
# - appends each path to a base URL (DEFAULT_URL, or FALLBACK_URL when the
#   first request fails; both can be set with environment variables),
# - requests these URLs sequentially, pausing <interval> seconds between them,
# - pauses a little longer after every <batch> and <bigBatch> URLs to ease
#   the pressure on your server
defaultUrl=${DEFAULT_URL:-https://<your domain here>}
fallbackUrl=${FALLBACK_URL:-https://<your other domain here>}
interval=0.1
batchInterval=1
bigBatchInterval=3
batch=10
bigBatch=100
urlIndex=1
function requestUrl () {
  # HEAD request; -f makes curl exit non-zero on HTTP 4xx/5xx responses.
  curl -I -f -s --write-out '%{http_code}\t%{url_effective}\n' -o /dev/null "$1"
}
while IFS= read -r path; do
  # Try the default host first and fall back to the second host on failure.
  if ! requestUrl "$defaultUrl$path" && ! requestUrl "$fallbackUrl$path"; then
    echo "Broken URL $path on line $urlIndex"
  fi

  # Check the big batch first: with the defaults, every multiple of $bigBatch
  # is also a multiple of $batch, so the reverse order never takes the long pause.
  if (( urlIndex % bigBatch == 0 )); then
    sleep "$bigBatchInterval"
  elif (( urlIndex % batch == 0 )); then
    sleep "$batchInterval"
  fi

  ((urlIndex++))
  sleep "$interval"
done < paths.txt
exit 0
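
The header comments describe a longer wait after every <batch> and <bigBatch> URLs. A minimal sketch of that pause schedule, with no network requests, might look like the block below. The `pauseFor` helper is hypothetical and not part of the script; it only prints the extra pause, in seconds, that would apply after a given 1-based URL index, using the same default values as the script. The regular <interval> sleep is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only: prints the extra pause (seconds)
# that applies after the URL with the given 1-based index.
batch=10
bigBatch=100
batchInterval=1
bigBatchInterval=3

pauseFor () {
  local index=$1
  # Test the larger batch first: every multiple of 100 is also a multiple
  # of 10, so the opposite order would never select the longer pause.
  if (( index % bigBatch == 0 )); then
    echo "$bigBatchInterval"
  elif (( index % batch == 0 )); then
    echo "$batchInterval"
  else
    echo 0
  fi
}

# Print the index and its extra pause for a few sample positions.
for index in 9 10 99 100 200; do
  printf '%s\t%s\n' "$index" "$(pauseFor "$index")"
done
```

With these defaults, indexes 10, 20, 30 and so on get the 1-second pause, and every 100th index gets the 3-second pause instead of the shorter one.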