Last active
October 8, 2015 00:56
#!/bin/sh
# This is a four-part process, and it's awful, and I'm sorry. I'd have
# automated more of it, but the interstitials we've put on the etherpads
# have foiled my efforts there and I wanted to get this out fast.
#
# This will give you:
# - A folder full of all your team pads in their current state
#   in text form, and
# - A single zipped file containing all of them.
#
# This _will not give you_:
# - Password-locked etherpads.
#
# You need a few things for this to work:
# - Firefox,
# - A Mozilla VPN connection,
# - wget and zip (available from Unix package managers everywhere)
# - a degree of comfort with a terminal.
#
# If you don't know with certainty that you have all those things
# email me at [email protected] and I will do my best to help you.
# If you need the recorded history of an etherpad, team or not, I have
# a way to export that per-pad that this margin is too narrow to contain.
#
# The process, for which I again apologize, is:
# - mkdir yourself a new dir somewhere, save this shell script into it,
#   and make it executable.
# - Connect to Mozilla's VPN.
# - Log into your team etherpad site, click through the interstitials
#   and click on the "all pads" tab.
# - This is the gross manual part. Go to File -> Save Page As and
#   select Format: Web Page, HTML Only. Name the file "all-pads",
#   all lowercase, no file extension.
# - Finally, in your terminal window, type in:
#
#       ./etherscrape.sh [team name]
#
#   where [team name] is whatever your etherpad URLs start with. For
#   example, if that URL starts with "https://firefox-ux.old-etherpad..."
#   then you'd type in "./etherscrape.sh firefox-ux"
# - Hit enter, and let it run. This process can take a few minutes.
WORKDIR=./saved_etherpads

if [ $# -eq 0 ]; then
    echo
    echo "Usage: ./etherscrape.sh [team name]"
    echo
    exit 1
fi

echo "Team Name is: $1"

# Create the output directory if it does not already exist.
mkdir -p "$WORKDIR"

echo "Saving files to $WORKDIR"

# Pull each pad name out of the saved "all-pads" page, then fetch its text export.
# The URL is quoted so the shell does not treat the "?" as a glob character.
for i in $(grep padmeta all-pads | sed 's/.*href="\///' | sed 's/".*//'); do
    wget "https://$1.old-etherpad.webapp.phx1.mozilla.com/ep/pad/export/$i/latest?format=txt" \
        -O "$WORKDIR/$i.txt" > /dev/null 2>&1 && printf '.'
done

echo "Done."
echo "Compressing files..."
zip -r "$1.zip" "$WORKDIR" > /dev/null 2>&1
echo "Done. Compressed file is $1.zip"
Tested on OSX 10.10.5, for what it's worth. There's a report of some pathological shell behavior on 10.8.something.
As far as I understand, this will fetch every href on the page, even ones that don't point to an etherpad. Is that what we want?
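For what it's worth, the loop only looks at lines that contain the string "padmeta", so links on other lines are skipped. Here is a small sketch of that extraction step run against a made-up page. The sample HTML and the function name `extract_pad_names` are assumptions for illustration only; the real "all-pads" markup may differ, and any non-pad line that happens to contain "padmeta" would still slip through.

```shell
#!/bin/sh
# Same grep/sed pipeline as the script, wrapped in a function for illustration.
extract_pad_names() {
    grep padmeta "$1" | sed 's/.*href="\///' | sed 's/".*//'
}

# Hypothetical sample of a saved "all-pads" page; the real markup may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="/ep/account/sign-out">Sign out</a>
<td class="padmeta"><a href="/weekly-meeting">weekly-meeting</a></td>
<td class="padmeta"><a href="/release-notes">release-notes</a></td>
EOF

# Prints one pad name per line; the sign-out link is dropped by the grep.
extract_pad_names "$sample"
rm -f "$sample"
```

If stray links do turn out to be a problem, adding a second `grep 'href="/'` after the first one would be one way to tighten the filter.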
My team etherpads were all private and I couldn't spot a way to make them all public without laboriously clicking on each pad (then the alert), then the public button. Instead I dumped my cookies from Firefox and passed them through to that script using --load-cookies.
If you've got the Cookie Manager+ addon, you can dump the single cookie you need into a file and use that.
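The cookie approach from the two comments above might look roughly like this. It's a sketch, not the commenters' actual change: `pad_export_url`, `fetch_pad` and the `COOKIES` variable are names invented here, and it assumes the cookies were exported in the Netscape cookies.txt format that wget's `--load-cookies` option reads.

```shell
#!/bin/sh
# Build the text-export URL for one pad (same URL pattern as the script).
pad_export_url() {
    printf 'https://%s.old-etherpad.webapp.phx1.mozilla.com/ep/pad/export/%s/latest?format=txt\n' "$1" "$2"
}

# COOKIES is a hypothetical variable; point it at wherever you exported the file.
COOKIES=${COOKIES:-./cookies.txt}

# Fetch one pad, sending the cookies stored in the exported cookie file.
# Assumes ./saved_etherpads already exists, as it does once the script has run.
fetch_pad() {
    wget --load-cookies "$COOKIES" "$(pad_export_url "$1" "$2")" \
        -O "./saved_etherpads/$2.txt" > /dev/null 2>&1
}

# Usage (needs the VPN connection and a valid cookie file):
#   fetch_pad firefox-ux some-pad-name
```

In the script itself, the equivalent change would be adding `--load-cookies` with the cookie file path to the existing wget line.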
I'm accepting improvements to this, obviously.