@bmschmidt
bmschmidt / ExampleCatalog.json
Created November 13, 2013 16:53
First 12 Open Library records
{"publishers": ["Simon & Schuster Books for Young Readers"], "searchstring": "[No author], <em>Ernestine & Amanda, summer camp ready or not!</em> (undated) <a href=\"http://openlibrary.org/books/OL1002147M\">more info</a> <a href=\"http://archive.org/stream/ernestineamandas00belt\">read</a>", "lc2": "7", "lc0": "P", "title": "Ernestine & Amanda, summer camp ready or not!", "lccn": ["96041443"], "lc1": "PZ", "editionid": "/books/OL1002147M", "publish": 1997, "filename": "er/ne/ernestineamandas00belt", "languages": ["/languages/eng"], "lc_classifications": ["PZ7.B4197 Er 1997"], "publish_date": "1997", "publish_country": "nyu", "key": "/books/OL1002147M", "authors": ["/authors/OL24054A"], "ocaid": "ernestineamandas00belt", "oclc_numbers": ["35360730"], "works": ["/works/OL16070305W"], "publish_places": ["New York"]}
{"publishers": ["Copernicus"], "searchstring": "[No author], <em>The call of distant mammoths</em> (undated) <a href=\"http://openlibrary.org/books/OL1008703M\">more info</a> <a href=\"http://archiv
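In the first record, the fields lc0, lc1, and lc2 ("P", "PZ", "7") look like pieces of the call number in lc_classifications ("PZ7.B4197 Er 1997"). The following Python sketch shows one way such a split could be done. The function name `split_lc` and the regular expression are assumptions made for illustration; the code that actually produced these fields is not shown in the gist.

```python
import json
import re

# One catalog record, abbreviated from the first line of ExampleCatalog.json.
line = (
    '{"title": "Ernestine & Amanda, summer camp ready or not!", '
    '"lc_classifications": ["PZ7.B4197 Er 1997"], '
    '"lc0": "P", "lc1": "PZ", "lc2": "7", "publish": 1997}'
)


def split_lc(classification):
    """Split an LC call number into (first letter, class letters, class number).

    Hypothetical helper: it mirrors how lc0/lc1/lc2 relate to
    lc_classifications in the sample record, nothing more.
    """
    match = re.match(r"([A-Z]+)(\d+)", classification)
    if match is None:
        return None  # no leading letters followed by digits
    letters, number = match.groups()
    return letters[0], letters, number


# Each line of the catalog file is a separate JSON object.
record = json.loads(line)
print(split_lc(record["lc_classifications"][0]))  # ('P', 'PZ', '7')
```

Because the catalog holds one JSON object per line, a full file would be read line by line with `json.loads` rather than with a single `json.load` call.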
bmschmidt / OLfield_descriptions
Created November 13, 2013 16:49
Example field_descriptions.json
[
{"field":"title","datatype":"etc","type":"text","unique":true},
{"field":"lc0","datatype":"categorical","type":"character","unique":true},
{"field":"lc1","datatype":"categorical","type":"character","unique":true},
{"field":"lc2","datatype":"etc","type":"integer","unique":true},
{"field":"publishers","datatype":"etc","type":"character","unique":false},
{"field":"subjects","datatype":"categorical","type":"character","unique":false},
{"field":"publish_country","datatype":"categorical","type":"character","unique":true},
{"field":"publish_places","datatype":"categorical","type":"character","unique":false},
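Comparing these descriptions with the first catalog record suggests that "unique": true marks fields holding a single value per record (title, lc0, publish_country), while "unique": false marks fields stored as lists (publishers, publish_places). That reading is an assumption drawn from the sample data, not a documented rule. A minimal Python sketch that checks a record against it, using a hypothetical helper named `shape_problems`:

```python
import json

# A subset of the field descriptions from OLfield_descriptions.
field_descriptions = json.loads("""
[
  {"field": "title", "datatype": "etc", "type": "text", "unique": true},
  {"field": "lc0", "datatype": "categorical", "type": "character", "unique": true},
  {"field": "publishers", "datatype": "etc", "type": "character", "unique": false},
  {"field": "publish_country", "datatype": "categorical", "type": "character", "unique": true},
  {"field": "publish_places", "datatype": "categorical", "type": "character", "unique": false}
]
""")

# Matching fields from the first record in ExampleCatalog.json.
record = {
    "title": "Ernestine & Amanda, summer camp ready or not!",
    "lc0": "P",
    "publishers": ["Simon & Schuster Books for Young Readers"],
    "publish_country": "nyu",
    "publish_places": ["New York"],
}


def shape_problems(record, descriptions):
    """Return the names of fields whose shape disagrees with "unique".

    Assumption: unique=true means a scalar value, unique=false means a list.
    Fields missing from the record are skipped.
    """
    problems = []
    for description in descriptions:
        name = description["field"]
        if name not in record:
            continue
        is_list = isinstance(record[name], list)
        if description["unique"] == is_list:
            problems.append(name)
    return problems


print(shape_problems(record, field_descriptions))  # []
```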

Make sure it works

Download one by hand. See if you can get the Stanford NER tagger (for example, via NLTK) running to extract places and dates, and check that the results look right.

Downloading files

bmschmidt / BibLaTeX-Chicago.js
Last active December 25, 2015 00:59
A BibLaTeX Zotero translator designed to work with the BibLaTeX-Chicago library, which has a few special rules about how to translate different sorts of documents into appropriate citations. Most of this is simply adapted from the existing BibLaTeX Chicago plugin, but it has a few other nice features the original translator lacks, such…
{
"translatorID":"ba905f1a-436b-4b6d-a816-ba0b4ac4c9ad",
"translatorType":2,
"label":"BibLaTeX-Chicago",
"creator":"Simon Kornblith, Richard Karnesky, Anders Johansson and Ben Schmidt",
"target":"bib",
"minVersion":"2.1.9",
"maxVersion":"null",
"priority":100,
"inRepository":false,
bmschmidt / TesseractPDF.sh
Last active December 24, 2015 08:09
Make TIFFs from PDFs using gs (Ghostscript), then run OCR on them using tesseract.
#!/bin/sh
# Adapted by Ben Schmidt from Barry Hubbard's code at
# http://www.barryhubbard.com/articles/37-general/74-converting-a-pdf-to-text-in-linux
# to convert each PDF into a folder of text files, one per page.
# This takes PDFs from the pdfs folder, writes TIFF files to the images folder, and writes text to the texts folder.
# Each PDF gets a _folder_ in each of the other two.
mkdir -p texts
mkdir -p images
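The rest of the script is not shown here. As a rough illustration of the two stages its header describes (PDF to TIFF with gs, then TIFF to text with tesseract), the Python sketch below only builds the command lines and does not run them. The helper names, the `tiffg4` device, the 300 dpi resolution, and the `page-%04d.tif` naming are all assumptions chosen for the example, not taken from the original script.

```python
from pathlib import Path


def gs_command(pdf, images_dir="images"):
    """Build a Ghostscript command that renders each PDF page to a TIFF.

    Assumed layout: images/<pdf name>/page-0001.tif, page-0002.tif, ...
    """
    pdf = Path(pdf)
    out_dir = Path(images_dir) / pdf.stem
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=tiffg4",  # assumed device: black-and-white fax-style TIFF
        "-r300",            # assumed resolution in dpi
        f"-sOutputFile={out_dir.as_posix()}/page-%04d.tif",
        pdf.as_posix(),
    ]


def tesseract_command(tif, texts_dir="texts"):
    """Build a tesseract command for one page image.

    tesseract appends ".txt" to the output base name itself.
    """
    tif = Path(tif)
    out_base = Path(texts_dir) / tif.parent.name / tif.stem
    return ["tesseract", tif.as_posix(), out_base.as_posix()]


print(gs_command("pdfs/report.pdf"))
print(tesseract_command("images/report/page-0001.tif"))
```

To execute the commands, each list could be passed to `subprocess.run(..., check=True)` once the per-PDF output folders exist.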
bmschmidt / gist:923dce0330d72486ee8d
Last active December 18, 2015 19:29
Starting to make
grams=3
corpus="eng-all"
searchstring="attention"
simultaneousDownloads="16"
# THIS PROCESS IS EXTREMELY RESOURCE INTENSIVE--DO NOT RUN IT ON A LARK OR IF YOU DON'T UNDERSTAND WHAT IT DOES,
# BECAUSE WASTING ENERGY AND BANDWIDTH IS BAD FOR THE ENVIRONMENT.
# To ensure you don't run it trivially, I've included an obvious command in the code that quickly stops the download.
# If you can't find or fix this, you probably shouldn't be running the script!