#!/usr/bin/env python
import sys
import ujson
# Script/Library created by Mike McCabe to do multiple
# metadata fetches in parallel
# Available here: https://gist.github.com/3784845
from parallel_md_get import metadata_record_iterator
#!/usr/bin/env python
import os
from lxml import etree
import archive
XML_ROOT = '/2/data/jstor/bundle/articles/'
PDF_ROOT = '/2/data/jstor/ejc/jstor-early-journal-content/'
#!/usr/bin/env python
import sys
import simplejson
import requests
import tablib
#_______________________________________________________________________________
def get_ia_meta(id):
#!/usr/bin/env python
import sys
import requests
import tablib
import jellyfish
#_______________________________________________________________________________
def get_json(url, params={}):

GENERAL TODO:

  • The examples are all over the place. They need to be more consistent.
  • Check that x-archive-queue-derive header. I just skimmed it and it doesn't seem right.
  • Investigate getting an "[email protected]" address for support requests
  • Some of the standard metadata fields are repeatable, some are not. State this in the descriptions.
  • Excellent Hank idea: Quick Start (TL;DR) section to avoid all the gory details
  • Dang, but this damn thing is hard to read. Will that get better when it gets converted to the PHP wrapper? I have my doubts. May need some quick George love to give tips for better readability.
  • All the other 'foo' (read: green) bits below

#!/usr/bin/env python
"""Modify an Internet Archive item's metadata using the Metadata Write API.
Requires the Python libraries:
Requests: https://github.com/kennethreitz/requests.git
python-json-patch: https://github.com/stefankoegl/python-json-patch.git
Note:
The IA Metadata API does not yet comply with the latest JSON Patch
standard. It currently complies with version 02: