Using Python's built-in defaultdict we can easily define a tree data structure:

    def tree(): return defaultdict(tree)

That's it!
| #!/usr/bin/env python | |
| # -*- coding: utf-8 -*- | |
| import os | |
| import json | |
| import requests | |
# Authentication cookies for archive.org requests, read from the environment.
# Looking the variables up with os.environ[...] (not .get) deliberately raises
# KeyError at import time if either credential is missing, so the script fails
# fast before attempting any request.
COOKIES = {
    'logged-in-sig': os.environ['LOGGED_IN_SIG'],
    'logged-in-user': os.environ['LOGGED_IN_USER'],
}
| #!/usr/bin/env python | |
| import os, sys | |
| import requests | |
| import json | |
# Authentication cookies for archive.org requests, read from the environment
# (raises KeyError at import time if either credential is unset — fail fast).
# 'verbose': '1' presumably requests more detailed server responses —
# TODO(review): confirm against the archive.org endpoint this is sent to.
COOKIES = {
    'logged-in-sig': os.environ['LOGGED_IN_SIG'],
    'logged-in-user': os.environ['LOGGED_IN_USER'],
    'verbose': '1',
}
| #!/bin/bash | |
| # Download all Symlink Instruction files from a given list of identifiers. | |
| # Usage: ./get-symlinks.sh itemlist.txt | |
| while read identifier | |
| do | |
| mkdir -p symlink_instructions | |
| URL="http://archive.org/download/${identifier}/${identifier}_symlinks.txt" | |
| COOK="Cookie: logged-in-sig=$LOGGED_IN_SIG; logged-in-user=$LOGGED_IN_USER" | |
| wget -q "$URL" -O symlink_instructions/${identifier}_symlinks.txt \ |
| #!/usr/bin/env python | |
| import os | |
| #import lxml.html, lxml.etree | |
| import lxml.etree | |
| import subprocess | |
| import json | |
| import urllib | |
# Working directory at startup; NOTE(review): presumably used elsewhere in the
# script as the base for relative paths — callers are not visible in this chunk.
ROOT_DIR = os.getcwd()

# Shared HTML parser that forces UTF-8 decoding of input documents, regardless
# of any charset the document itself declares.
utf8_parser = lxml.etree.HTMLParser(encoding='utf-8')
| #!/usr/bin/env python | |
| """ Check if an item on archive.org has an acoustid. | |
| Usage: | |
| ./check_for_acoustid.py {item} | |
| Usage with GNU Parallel: | |
| cat itemlist.txt | parallel --max-procs=8 --group './check_for_acoustid.py {}' |
Using Python's built-in defaultdict we can easily define a tree data structure:

    def tree(): return defaultdict(tree)

That's it!
| # This demonstrates doing multiple metadata fetches in parallel. | |
| # It seems to be fast enough that the json decoding cost becomes | |
| # a significant proportion of the execution time. | |
| # It requires gevent; see http://www.gevent.org/intro.html#installation | |
| # To make this do something useful, modify do_work(). | |
| import gevent |
# Given an archive.org task id ($1), find the storage node and item path the
# task operated on, then query that node for the item's size in GB.
taskid="$1"

# Fetch the full task log for this task id.
log=$(curl -s "http://www-tracey.us.archive.org/log_show.php?task_id=$taskid&full=1")

# Item directory path from the log's "[dir] => ..." entry.
# NOTE(review): the lookbehind matches one hard-coded item path — this looks
# like a debugging leftover; confirm whether a generic pattern was intended.
filepath=$(echo $log | grep -Po '(?<=\[dir\]\ \=\>\ )/14/items/archiveteam-mobileme-hero-9' | head -1)

# Storage node name ("ia6NNNNN") parsed out of the log's "[server ...]" entry.
node=$(echo $log | grep -Po '(?<=\[server).*?(?=.us.archive.org)' | head -1 | grep -Po 'ia6[0-9]{5}')

# Endpoint on that node which reports the size of an item at a given path.
itemsize_url="http://$node.us.archive.org/item-size.php?path=$filepath"

# Item size in KB, extracted from the <size> element of the response.
_itemsize=$(curl -s "$itemsize_url" | grep -Po '(?<=\<size\>).*(?=\</size\>)')

# Convert KB to GB (integer division via bc).
itemsize=$(echo "$_itemsize/1000000" | bc)
| #!/usr/bin/env python | |
| # | |
| # Provided a list of identifiers for items on archive.org, return all items | |
| # that have an "acoustid" for every original audio file, but NOT a | |
| # "mb_recording_id". | |
| # | |
| import sys | |
| import logging | |
| from datetime import datetime |
| #!/usr/bin/env python | |
| # | |
| # Find out the most used metadata fields on archive.org | |
| # | |
| import sys | |
| import logging | |
| from datetime import datetime | |
| import ujson as json | |
| import cPickle as pickle |