@terribleplan
Last active December 16, 2018 21:59
Archive team tumblr docker

Context

Tumblr is taking down tons of content, especially anything they consider NSFW. Archive Team is working to preserve what is to be deleted. You can help by running this.

Get Servers

If you only have one IP address, you should really only run one instance of the archiver. If you want to run more, you will need to rent servers from a hosting provider:

  • DigitalOcean (my referral link) has $5/mo servers (tested and working on Ubuntu 18.04)
  • OVH has cheap unmetered VPSes (tested and working on Ubuntu 18.04); get them in BHS to avoid any sort of GDPR weirdness

Run it

  1. Install Docker (https://docs.docker.com/install/linux/docker-ce/ubuntu/)
  2. Run the container: docker run -itd -e 'NICK=yournickname' terribleplan/tumblr-archive:latest

If you want more or less concurrency, add an environment variable to override the default of 2: -e 'CONCURRENT=4'

If you want to persist your data outside the Docker container (planning on restarts?), add a volume: -v /opt/tumblr-grab-data:/app/tumblr-grab/data
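
The options above can be combined into a single command. The following is a minimal sketch, not part of the original gist: build_run_cmd is a hypothetical helper that only prints the docker run line so you can check it before pasting it into a shell. It assumes the nickname and data path contain no spaces.

```shell
# Hypothetical helper (not from the gist): prints, but does not run, a
# docker run command that combines NICK, CONCURRENT and the data volume.
# Usage: build_run_cmd <nick> [concurrent] [host-data-dir]
build_run_cmd() {
  nick="$1"
  concurrent="${2:-}"
  data_dir="${3:-}"
  cmd="docker run -itd -e NICK=${nick}"
  # Only override the default concurrency when a value was given.
  if [ -n "${concurrent}" ]; then
    cmd="${cmd} -e CONCURRENT=${concurrent}"
  fi
  # Only mount a host directory when a path was given.
  if [ -n "${data_dir}" ]; then
    cmd="${cmd} -v ${data_dir}:/app/tumblr-grab/data"
  fi
  echo "${cmd} terribleplan/tumblr-archive:latest"
}

build_run_cmd yournickname 4 /opt/tumblr-grab-data
```

Leaving out the second and third arguments prints the plain command from step 2.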

If there are updates

  1. Determine the running container id: docker ps
  2. Attach to the container: docker attach <id>
  3. Gracefully stop the container: press control-c once, then wait for it to stop itself
  4. Pull the new version: docker pull terribleplan/tumblr-archive:latest
  5. Run it again, as above
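
The update steps can be chained in a small function. This is a sketch under assumptions, not something the gist ships: update_archiver and the DOCKER override variable are hypothetical names, and the control-c in step 3 still has to be pressed by hand while attached. Setting DOCKER="echo docker" previews the commands without touching Docker.

```shell
# Hypothetical helper (not from the gist): attach, wait for the graceful
# stop, pull the new image, and start a fresh container.
# Usage:   update_archiver <container-id> <nick>
# Preview: DOCKER="echo docker" update_archiver <container-id> <nick>
update_archiver() {
  id="$1"
  nick="$2"
  # DOCKER is an assumed override used for dry runs; it defaults to docker.
  docker_cmd="${DOCKER:-docker}"
  image="terribleplan/tumblr-archive:latest"
  # Steps 2-3: while attached, press control-c once and wait for the
  # container to stop. The exit status is ignored on purpose, because
  # attach reports the container's own exit code.
  ${docker_cmd} attach "${id}"
  # Step 4: pull the new version; stop here if the pull fails.
  ${docker_cmd} pull "${image}" || return 1
  # Step 5: run it again.
  ${docker_cmd} run -itd -e "NICK=${nick}" "${image}"
}
```

If you originally started the container with CONCURRENT or a data volume, add the same -e and -v options to the final run line.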

Build it

  1. Create Dockerfile and run.sh
  2. Make sure run.sh is executable: chmod +x run.sh
  3. Build the container: docker build -t tumblr-archive:latest .
  4. Run your built container: docker run -itd -e 'NICK=yournickname' tumblr-archive:latest

Watch it

  1. Get the id of the running container: docker ps
  2. Tail the logs: docker logs -f <id>

Dockerfile

FROM ubuntu:16.04
WORKDIR /app
# One layer: install build and runtime dependencies, install seesaw, clone
# tumblr-grab and build wget-lua, then remove the build-only packages.
RUN \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y curl python python-dev python-pip git-core libgnutls30 libgnutls-dev lua5.1 liblua5.1-0 liblua5.1-0-dev bzip2 zlib1g zlib1g-dev flex autoconf && \
    pip install --upgrade seesaw && \
    git clone https://github.com/ArchiveTeam/tumblr-grab.git && \
    cd tumblr-grab && \
    ./get-wget-lua.sh && \
    cd .. && \
    apt-get remove -y curl python-pip python-dev git-core libgnutls-dev liblua5.1-0-dev zlib1g-dev flex autoconf && \
    apt-get autoremove -y && \
    apt-get clean
# Copy the startup script into /app and run it when the container starts.
COPY run.sh /app/
CMD ./run.sh
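
run.sh

#!/bin/bash
# Use a random nickname of the form dr_XXXXXX when NICK is unset or empty.
if [[ -z "${NICK}" ]]; then
    NICK="dr_$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 6 | head -n 1)"
fi
# Default to 2 concurrent items when CONCURRENT is unset or empty.
if [[ -z "${CONCURRENT}" ]]; then
    CONCURRENT="2"
fi
cd /app/tumblr-grab/ || exit 1
# Start the seesaw pipeline with the chosen concurrency and nickname.
/usr/local/bin/run-pipeline /app/tumblr-grab/pipeline.py --concurrent "${CONCURRENT}" --address '127.0.0.1' "${NICK}"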