
@nottux
Created May 13, 2020 18:16
Go wild...
a=$(ls -q *jpg *gif *png | wc -l)   # count of images on disk (the original ls -lq also counted the "total" line)
while [ "$a" -lt 392 ]; do
    for i in {0..392}; do
        (
            # Scrape the strip page for its embedded image filename;
            # cut -c 13- drops the 12-character src="images/ prefix.
            b=$(curl -s "https://www.vgcats.com/comics/?strip_id=$i" \
                | grep -oE 'src="images/[^"]*\.(jpg|gif|png)' | cut -c 13-)
            # Skip strips whose image URL couldn't be found or is already downloaded.
            if [ -z "$b" ] || [ -e "$b" ]; then
                echo existing
            else
                # --retry-on-http-error requires a code list; 429/503 are what
                # Cloudflare's throttling typically returns.
                wget --retry-on-http-error=429,503 --no-clobber "https://www.vgcats.com/comics/images/$b"
            fi
        ) &
    done
    wait   # let all downloads finish before cleaning up
    # Delete everything that isn't a comic image (HTML pages, logs, partials).
    find . -type f -not -name '*jpg' -not -name '*gif' -not -name '*png' -print0 | xargs -0 rm --
    a=$(ls -q *jpg *gif *png | wc -l)
done
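
To sanity-check the scraping assumption before letting the loop run, the extraction step can be tried on a single strip. The strip_id value here is just an example; the pattern and the cut offset are the ones the script above relies on:

# Fetch one strip page and print the image filename it embeds.
# Assumes the page still marks up its comic as src="images/<name>.<ext>".
strip_id=100
curl -s "https://www.vgcats.com/comics/?strip_id=$strip_id" \
    | grep -oE 'src="images/[^"]*\.(jpg|gif|png)' \
    | cut -c 13-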

nottux commented May 13, 2020

The hardest part to get past was Cloudflare's DDoS protection.
There are still things that can be improved:

  1. The script doesn't remember which strips it has already checked, so each cycle spawns 393 instances of curl (see the sketch after this list).
  2. The script still can't tell what the latest strip number is; today it was 392, so that value is hard-coded (also covered in the sketch below).
  3. The script still generates a lot of garbage files. Log files should be easy to get rid of with a wget option; the .1/.2 duplicate files are more complicated and come from the parallel downloads racing each other. I'm not sure about the index files (they may have been a side product of wget runs that couldn't find an image URL, and seem to be gone now).
  4. Toning down the number of parallel curl instances might actually speed things up by avoiding unnecessary blocks (it should be paired with 1) though).
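
Here is a minimal sketch of how 1), 2) and 4) could look. It assumes the front page links to strips via strip_id= URLs (a guess at the site's markup, not verified), remembers finished ids in a done.txt file of its own invention, and uses xargs -P to cap parallelism at 8:

# 2) Guess the latest strip id: take the highest strip_id= value
#    linked anywhere on the front page (assumption about the markup).
latest=$(curl -s https://www.vgcats.com/comics/ \
    | grep -oE 'strip_id=[0-9]+' | cut -d= -f2 | sort -n | tail -1)

# 1) Remember finished ids in done.txt and only fetch the rest.
# 4) -P 8 caps the number of parallel fetches instead of forking everything.
touch done.txt
seq 0 "$latest" | grep -vxFf done.txt | xargs -P 8 -I{} sh -c '
    b=$(curl -s "https://www.vgcats.com/comics/?strip_id={}" \
        | grep -oE "src=\"images/[^\"]*\.(jpg|gif|png)" | cut -c 13-)
    [ -n "$b" ] || exit 0
    wget -q --no-clobber "https://www.vgcats.com/comics/images/$b" \
        && echo {} >> done.txt
'

Since each id is fetched by exactly one process and finished ids never get retried, this should also mostly avoid the duplicate-name races behind the .1/.2 files from 3).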
