Regex cheat sheet: http://regexlib.com/CheatSheet.aspx?AspxAutoDetectCookieSupport=1
An example regex file (the CONSORT one): https://github.com/ContentMine/ami-plugin/blob/master/regex/consort0.xml
An etherpad for a regex file: http://pads.cottagelabs.com/p/ediregex
Links
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0027019
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0070645
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0064480
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003535
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0025669
- http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0088590
We then ran this through CANARY: http://canary.contentmine.org/camarades
Log on to the VMs and run these commands:
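cd
mkdir edi
ls
cd edi
# Put the article URLs in urls.txt, one per line
vi urls.txt
# Scrape the listed URLs with the journal scrapers
quickscrape --scraperdir ~/workshop/journal-scrapers/scrapers --urllist urls.txt
# Convert fulltext.xml to scholarly.html using the nlm2html stylesheet
norma -q ~/edi/ -i fulltext.xml -o scholarly.html --xsl nlm2html
# Write the regex file
vi regex.xml
# Run the regexes in regex.xml over scholarly.html
ami2-regex -q ~/edi/ -i scholarly.html --r.regex ~/edi/regex.xml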
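
If typing the URLs into vi is awkward, the urls.txt step could also be done non-interactively. The following is only a sketch: it assumes quickscrape reads one URL per line from the file, and EDI_DIR is a hypothetical variable introduced here that defaults to the ~/edi directory created above.

```shell
# EDI_DIR is a hypothetical variable (not part of the workshop setup);
# it defaults to the ~/edi directory created by the commands above.
EDI_DIR="${EDI_DIR:-$HOME/edi}"
mkdir -p "$EDI_DIR"

# Write the six PLOS ONE article URLs from the Links list, one per line.
cat > "$EDI_DIR/urls.txt" <<'EOF'
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0027019
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0070645
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0064480
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0003535
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0025669
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0088590
EOF

# Count the lines as a quick check that all the URLs were written.
wc -l < "$EDI_DIR/urls.txt"
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the URLs, such as the ? and = characters, so the file contains them exactly as listed.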