Download one by hand. See if you can get the Stanford NLTK running to extract places and dates, and see if it works!
grams=3
corpus="eng-all"
searchstring="attention"
simultaneousDownloads="16"
# THIS PROCESS IS EXTREMELY RESOURCE INTENSIVE--DO NOT RUN IT ON A LARK OR IF YOU DON'T UNDERSTAND WHAT IT DOES,
# BECAUSE WASTING ENERGY AND BANDWIDTH IS BAD FOR THE ENVIRONMENT.
# To ensure you don't run it trivially, I've included an obvious command in the code that quickly stops the download.
# If you can't find or fix this, you probably shouldn't be running the script!
#!/usr/bin/sh
#Adapted by Ben Schmidt from Barry Hubbard's code at
#http://www.barryhubbard.com/articles/37-general/74-converting-a-pdf-to-text-in-linux
#to convert into a folder of text files, each one representing a page.
#This takes pdfs from the pdfs folder, writes tif files to the images folder, and writes text to the texts folder.
#each pdf gets a _folder_ in each of the other two.
mkdir -p texts
mkdir -p images
{
"translatorID":"ba905f1a-436b-4b6d-a816-ba0b4ac4c9ad",
"translatorType":2,
"label":"BibLaTeX-Chicago",
"creator":"Simon Kornblith, Richard Karnesky, Anders Johansson and Ben Schmidt",
"target":"bib",
"minVersion":"2.1.9",
"maxVersion":"null",
"priority":100,
"inRepository":false,
[
{"field":"title","datatype":"etc","type":"text","unique":true},
{"field":"lc0","datatype":"categorical","type":"character","unique":true},
{"field":"lc1","datatype":"categorical","type":"character","unique":true},
{"field":"lc2","datatype":"etc","type":"integer","unique":true},
{"field":"publishers","datatype":"etc","type":"character","unique":false},
{"field":"subjects","datatype":"categorical","type":"character","unique":false},
{"field":"publish_country","datatype":"categorical","type":"character","unique":true},
{"field":"publish_places","datatype":"categorical","type":"character","unique":false},
{"publishers": ["Simon & Schuster Books for Young Readers"], "searchstring": "[No author], <em>Ernestine & Amanda, summer camp ready or not!</em> (undated) <a href=\"http://openlibrary.org/books/OL1002147M\">more info</a> <a href=\"http://archive.org/stream/ernestineamandas00belt\">read</a>", "lc2": "7", "lc0": "P", "title": "Ernestine & Amanda, summer camp ready or not!", "lccn": ["96041443"], "lc1": "PZ", "editionid": "/books/OL1002147M", "publish": 1997, "filename": "er/ne/ernestineamandas00belt", "languages": ["/languages/eng"], "lc_classifications": ["PZ7.B4197 Er 1997"], "publish_date": "1997", "publish_country": "nyu", "key": "/books/OL1002147M", "authors": ["/authors/OL24054A"], "ocaid": "ernestineamandas00belt", "oclc_numbers": ["35360730"], "works": ["/works/OL16070305W"], "publish_places": ["New York"]}
{"publishers": ["Copernicus"], "searchstring": "[No author], <em>The call of distant mammoths</em> (undated) <a href=\"http://openlibrary.org/books/OL1008703M\">more info</a> <a href=\"http://archiv
{"publisher": "T.B. Kalbfus", "paperid": "sn82016373", "searchstring": "<img src=\"http://chroniclingamerica.loc.gov/lccn/sn82016373/1891-10-04/ed-1/seq-7/thumbnail.jpg\"> <i>The Sunday herald and weekly national intelligencer</i> (Washington [D.C.]), Friday, October 04, 1891, p. 7. <a href=\"http://chroniclingamerica.loc.gov/lccn/sn82016373/1891-10-04/ed-1/seq-7\">Read page</a>", "lat": 38.8951118, "city": "Washington", "period": "1887/1896", "filename": "sn82016373/1891-10-04_7", "edition": "1", "state": "DC", "paper": "The Sunday herald and weekly national intelligencer", "location": "Washington [D.C.]", "lng": -77.0363658, "date": "1891-10-04", "subjects": [], "successors": [], "precedors": ["sn85042682"], "page": "7"}
{"publisher": "T.B. Kalbfus", "paperid": "sn82016373", "searchstring": "<img src=\"http://chroniclingamerica.loc.gov/lccn/sn82016373/1891-10-04/ed-1/seq-5/thumbnail.jpg\"> <i>The Sunday herald and weekly national intelligencer</i> (Washington [D.C.]), Friday, October 04, 1891, p. 5. <a href
rm(list=ls())
source("SQLFunctions.R")
plotSet = function(filename,dck=701,alpha=.01,lwd=8) {
library(grid)
paths = tbl(tblsrc,"paths") %.%
filter(DCK==dck) %.%
arrange(voyagenum,yearday) %.%
select(LON,LAT,voyagenum) %.%
collect()
"""
This uses a fake sprintf style construction to handle easily resetting searchstrings without rebuilding the whole database.
Anything in a string like this:
%(blah blah)s
will be broken out in mysql as a literal.
By default, this assumes you only need the "catalog" field: things will get strange if you try to use any non-unique fields.
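
One way to read the docstring above is that each `%(field)s` placeholder becomes a column reference while the surrounding text becomes quoted string literals inside a MySQL `CONCAT(...)` expression. The sketch below illustrates that reading only. The function names `template_to_concat` and `quote_literal` are hypothetical, and the original script (truncated here) may do this quite differently.

```python
import re

# Matches sprintf-style named placeholders such as %(title)s.
PLACEHOLDER = re.compile(r"%\((.*?)\)s")


def quote_literal(text):
    """Quote text as a MySQL string literal (escapes backslashes and single quotes)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def template_to_concat(template):
    """Turn a template like '<em>%(title)s</em>' into a CONCAT(...) expression.

    Assumption: placeholders name columns, everything else is literal text.
    """
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        literal = template[pos:match.start()]
        if literal:
            parts.append(quote_literal(literal))
        # Backtick-quote the field name so it is treated as a column.
        parts.append("`" + match.group(1) + "`")
        pos = match.end()
    tail = template[pos:]
    if tail:
        parts.append(quote_literal(tail))
    if not parts:
        return "''"
    return "CONCAT(" + ", ".join(parts) + ")"


if __name__ == "__main__":
    print(template_to_concat("<em>%(title)s</em> (%(publish)s)"))
```

Because the expression is built from the template alone, a searchstring could in principle be regenerated with a single `UPDATE` instead of a full rebuild, which matches the motivation stated in the docstring.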
MySQL has a canonical implementation:
SELECT FROM_DAYS(730669);
+-------------------+
| FROM_DAYS(730669) |
+-------------------+
| 2000-07-03        |
+-------------------+
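
For checking values outside the database, the same conversion can be sketched in Python. This assumes MySQL day numbers sit a constant 365 days ahead of Python's proleptic Gregorian ordinals, an offset inferred from the example above (730669 maps to 2000-07-03) rather than taken from MySQL itself. The helper names `from_days` and `to_days` are illustrative, and the result should not be trusted for dates before the Gregorian calendar came into use.

```python
from datetime import date

# Assumed offset between MySQL day numbers and Python ordinals,
# inferred from FROM_DAYS(730669) = 2000-07-03.
MYSQL_DAY_OFFSET = 365


def from_days(daynr):
    """Approximate MySQL FROM_DAYS() for modern dates."""
    return date.fromordinal(daynr - MYSQL_DAY_OFFSET)


def to_days(d):
    """Approximate MySQL TO_DAYS() for modern dates."""
    return d.toordinal() + MYSQL_DAY_OFFSET


if __name__ == "__main__":
    print(from_days(730669).isoformat())
```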