
@hn-support
Created January 11, 2017 12:47
A cache warmer in bash using curl
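#!/bin/bash
# Warm a site's cache by requesting every URL found in a local sitemap.xml file.
if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "Usage: $0 <sitemap.xml>" >&2
    exit 1
fi
if [ ! -f "$1" ]; then
    echo "Sitemap file $1 not found! Exit!" >&2
    exit 1
fi
# Extract every URL that starts with "http" and sits between a '>' and the next '<'.
perl -ne 'while (/>(http.+?)</g) { print "$1\n"; }' "$1" | while IFS= read -r line; do
    echo "  Crawling $line"
    # Discard the body; print connect time, time to first byte and total time (seconds).
    curl -so /dev/null -w "%{time_connect} - %{time_starttransfer} - %{time_total}\n" "$line"
done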
@peterjaap

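Updated version that skips image URLs when they are present in the sitemap (image entries are allowed in sitemaps). It only extracts URLs from <loc> elements, so <image:loc> entries are ignored: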

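#!/bin/bash

# Warm a site's cache by requesting every page URL (<loc> element) in a local sitemap.xml file.
if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "Usage: $0 <sitemap.xml>" >&2
    exit 1
fi

if [ ! -f "$1" ]; then
    echo "Sitemap file $1 not found! Exit!" >&2
    exit 1
fi

# Only match <loc>...</loc>; <image:loc> does not contain the literal "<loc>", so image entries are skipped.
perl -ne 'while (/<loc>(http.+?)<\/loc>/g) { print "$1\n"; }' "$1" | while IFS= read -r line; do
    echo "  Crawling $line"
    # Discard the body; print connect time, time to first byte and total time (seconds).
    curl -so /dev/null -w "%{time_connect} - %{time_starttransfer} - %{time_total}\n" "$line"
done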
