@astrofrog
Created November 22, 2010 13:50
Recursively find all PDF files in a folder that contain DOIs and display them
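import os
import glob
import sys


def extract_doi(filename):
    # Read the raw PDF bytes and look for the first literal "URI(...)" entry
    with open(filename, 'rb') as f:
        text = f.read()
    start = text.find(b'URI(')
    if start >= 0:  # find() returns -1 when absent; 0 is a valid match
        end = text.find(b')', start)
        if end >= 0:
            # Keep everything between "URI(" and the closing parenthesis
            uri = text[start + 4:end].decode('latin-1')
            print("%-20s %s" % (os.path.basename(filename), uri))


def search_doi(directory):
    # Recurse into subdirectories and check every .pdf file
    for item in glob.glob(os.path.join(directory, '*')):
        if os.path.isdir(item):
            search_doi(item)
        elif item.endswith('.pdf'):
            extract_doi(item)


if __name__ == '__main__':
    # Each command-line argument is a directory to search
    for arg in sys.argv[1:]:
        search_doi(arg)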
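
The script above prints the first `URI(...)` entry it finds, whether or not that link is a DOI. As a minimal sketch of a stricter check, the raw bytes could instead be matched against a loose DOI pattern. The helper name `find_dois` and the regular expression are assumptions for illustration and are not part of the original gist. The pattern is deliberately simple: it will miss DOIs that contain parentheses, and it cannot see DOIs stored inside compressed PDF streams.

```python
import re

# Hypothetical helper, not part of the original gist.
# Loose DOI pattern (an assumption): "10.", a 4-9 digit registrant code,
# "/", then a suffix that stops at whitespace, brackets, or quotes.
DOI_PATTERN = re.compile(rb'10\.\d{4,9}/[^\s()<>"]+')


def find_dois(data):
    """Return the unique DOI-like strings found in raw PDF bytes, in order."""
    seen = []
    for match in DOI_PATTERN.finditer(data):
        # Strip trailing punctuation that usually belongs to the sentence
        doi = match.group(0).decode('latin-1').rstrip('.,;')
        if doi not in seen:
            seen.append(doi)
    return seen
```

Inside `extract_doi`, the bytes read from the file could be passed to `find_dois(text)` and each result printed next to the file name.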