Guide by riley (efnet, @bad.pet)
To use other projects, see "Running ArchiveTeam Warrior on GCP". Based on this.
Create a new project first, if necessary: https://console.cloud.google.com/projectcreate
Then select the project you want to use
Create an instance template: https://console.cloud.google.com/compute/instanceTemplates/add
name: archiveteam-tumblr-grab
machine type: micro
boot disk: 10 GB, Debian 9
Expand the "Management, security, disks, networking, sole tenancy" section
Security -> SSH Keys -> paste in your public key
Networking -> Network service tier (optional, change it if you like)
Management -> Startup script (edit the variables!):
#!/bin/bash
DOWNLOADER="awoobis-unconfigured" # your handle for the scoreboard
CONCURRENCY="1" # simultaneous items to run PER INSTANCE
PIPELINE_ARGS="" # extra pipeline arguments if needed, e.g. "--context-value bind_address=123.4.5.6"
SWAPSIZE="512M"
# A bit of swap for comfort
fallocate -l "$SWAPSIZE" /swap
chmod 600 /swap
mkswap /swap
echo '/swap swap swap defaults 0 0' >>/etc/fstab
swapon /swap
# Set up tumblr-grab
apt-get update
apt-get upgrade -y
apt-get install -y tmux python-pip git liblua5.1-0
pip install --upgrade seesaw
adduser --system --group --shell /bin/bash archiveteam
sudo -u archiveteam bash -c "cd /home/archiveteam;git clone https://github.com/ArchiveTeam/tumblr-grab.git"
# To build your own wget-lua
#apt-get install -y git-core autoconf libgnutls28-dev liblua5.1-0-dev flex
#sudo -u archiveteam bash -c "cd /home/archiveteam/tumblr-grab;./get-wget-lua.sh"
# Install GCP monitoring agent
curl -sS https://dl.google.com/cloudagents/install-monitoring-agent.sh | bash &
# tumblr-monitor
wget https://gist.github.com/JustAnotherArchivist/f4617c902626377532692a341794f273/raw/4a81f66b5dcbc18deb0d530979a443be12b1844a/tumblr-monitor -O /home/archiveteam/tumblr-monitor
chmod +x /home/archiveteam/tumblr-monitor
sudo -i -u archiveteam tmux new-session -d -s tumblr-grab \
"cd /home/archiveteam/tumblr-grab/;run-pipeline pipeline.py --concurrent $CONCURRENCY $PIPELINE_ARGS $DOWNLOADER"
echo "@reboot tmux new-session -d -s tumblr-grab \
'cd /home/archiveteam/tumblr-grab/;run-pipeline pipeline.py --concurrent $CONCURRENCY $PIPELINE_ARGS $DOWNLOADER'" \
| crontab -u archiveteam -
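The console steps above (template name, micro machine type, 10 GB Debian 9 disk, startup script) can probably also be done with the gcloud CLI. This is only a sketch under assumptions that are not in the guide: the startup script is saved locally as startup.sh, "micro" means f1-micro, and the image is the debian-9 family from the debian-cloud project (which may no longer be offered). The run helper and DRY_RUN variable are hypothetical conveniences for previewing the command.

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the instance template described above.
# $1: template name (defaults to the name used in this guide)
# $2: local path of the startup script (assumed file name: startup.sh)
create_template() {
  run gcloud compute instance-templates create "${1:-archiveteam-tumblr-grab}" \
    --machine-type f1-micro \
    --image-family debian-9 --image-project debian-cloud \
    --boot-disk-size 10GB \
    --metadata-from-file "startup-script=${2:-startup.sh}"
}

# Example (prints the command without contacting GCP):
#   DRY_RUN=1 create_template
```

The SSH key and network tier settings are left out here; set them in the console or add the matching flags yourself.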
Then create instances from the template: https://console.cloud.google.com/compute/instancesAdd?templateName=archiveteam-tumblr-grab
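If you want several VMs at once, something like the following may save clicking. It is a rough sketch: the instance names (tumblr-grab-1, tumblr-grab-2, ...), the default zone, and the run/DRY_RUN helper are all made up for illustration; only the template name comes from this guide.

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create COUNT instances from the archiveteam-tumblr-grab template.
# $1: number of instances (default 1)
# $2: zone (the default here is only an example)
create_instances() {
  count="${1:-1}"
  zone="${2:-us-central1-a}"
  names=""
  i=1
  while [ "$i" -le "$count" ]; do
    names="$names tumblr-grab-$i"   # assumed naming scheme
    i=$((i + 1))
  done
  # $names is intentionally unquoted so each name becomes its own argument.
  run gcloud compute instances create $names \
    --zone "$zone" \
    --source-instance-template archiveteam-tumblr-grab
}

# Example (prints the command without contacting GCP):
#   DRY_RUN=1 create_instances 3 us-central1-a
```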
Setup takes several minutes to finish; after that you should be able to run /home/archiveteam/tumblr-monitor to check the status.
For a graceful shutdown (may take hours or days), either attach to the tumblr-grab tmux session and press Ctrl+C once, or run this as root on each VM and wait for the program to exit:
install -o archiveteam /dev/null /home/archiveteam/tumblr-grab/STOP
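To drop the STOP file on many VMs without logging in to each one, a loop over gcloud compute ssh might work. This is a sketch, not a tested procedure: the VM names and zone are placeholders, the run/DRY_RUN helper is hypothetical, and it assumes your SSH user can use sudo on the VMs (the install command needs root to set the owner).

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Request a graceful stop on each named VM.
# $1: zone, remaining arguments: VM names
stop_all() {
  zone="$1"
  shift
  for vm in "$@"; do
    run gcloud compute ssh "$vm" --zone "$zone" \
      --command "sudo install -o archiveteam /dev/null /home/archiveteam/tumblr-grab/STOP"
  done
}

# Example (prints one command per VM without contacting GCP):
#   DRY_RUN=1 stop_all us-central1-a tumblr-grab-1 tumblr-grab-2
```

The pipeline still has to finish its current items after this, so check with tumblr-monitor before deleting anything.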
To kill it immediately, either attach and press Ctrl+C twice, or just kill the VMs from https://console.cloud.google.com/compute/instances