I know I'm doing all kinds of things wrong here:
Source HTML file: http://mdpi.com/1420-3049/19/4/5150/htm
I want the text of the dc.source field:
Molecules 2014, Vol. 19, Pages 5150-5162
I'm using Beautiful Soup, so it is probably best to do it with that, but it should also be doable with a regex. I can do this in bash, no problem!
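A minimal sketch of both routes, using only the Python standard library so it runs without extra installs. It assumes the page carries the value in a tag of the form `<meta name="dc.source" content="...">`; that markup, the `SAMPLE` string, and the helper names `dc_source` and `dc_source_regex` are assumptions for illustration, not taken from the actual MDPI page. With Beautiful Soup, the equivalent lookup would be along the lines of `soup.find("meta", attrs={"name": "dc.source"})["content"]`.

```python
import re
from html.parser import HTMLParser


class MetaSourceParser(HTMLParser):
    """Collect the content attribute of every <meta name="dc.source"> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "dc.source":
            self.sources.append(attr.get("content"))


def dc_source(html):
    """Parser-based lookup: returns the first dc.source value or None."""
    parser = MetaSourceParser()
    parser.feed(html)
    return parser.sources[0] if parser.sources else None


def dc_source_regex(html):
    """Regex-based lookup; assumes name comes before content in the tag."""
    match = re.search(
        r'<meta\s+name="dc\.source"\s+content="([^"]*)"', html, re.IGNORECASE
    )
    return match.group(1) if match else None


# Hypothetical stand-in for the downloaded page.
SAMPLE = (
    '<html><head><meta name="dc.source" '
    'content="Molecules 2014, Vol. 19, Pages 5150-5162" /></head></html>'
)

print(dc_source(SAMPLE))
print(dc_source_regex(SAMPLE))
```

The parser version does not care about attribute order or quoting style, which is the main reason to prefer it over the regex when the markup is not under your control.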
curl -g --location --header 'Accept: application/x-bibtex' "http://dx.doi.org/10.1651/0278-0372(2005)025[0159:GR]2.0.CO;2" > test.txt
RETURNS
<h1>Internal Server Error</h1>
(I've encountered about 91 DOIs that appear to give this error)
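One thing that could be tried for DOIs like this one is percent-encoding the brackets, parentheses, colon and semicolon before building the URL. The sketch below only constructs the request and prints the encoded URL; it does not contact the resolver. Whether encoding changes the server's response is untested here, and the helper name `bibtex_request` is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

RESOLVER = "http://dx.doi.org/"


def bibtex_request(doi):
    """Build a content-negotiation request for BibTeX (hypothetical helper).

    The DOI is percent-encoded except for "/", so characters such as
    ( ) [ ] : ; cannot be misread as URL syntax.
    """
    url = RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": "application/x-bibtex"})


DOI = "10.1651/0278-0372(2005)025[0159:GR]2.0.CO;2"
req = bibtex_request(DOI)
print(req.full_url)

# To actually fetch (not run here):
#   from urllib.request import urlopen
#   bibtex = urlopen(req).read().decode("utf-8")
```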
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
<head>
<title>Ross's academic journal RSS feed subscriptions</title>
</head>
<body>
<outline text="General Biology Journals" title="General Biology Journals">
<outline type="rss" text="BioEssays" title="BioEssays" xmlUrl="http://onlinelibrary.wiley.com/rss/journal/10.1002/(ISSN)1521-1878" htmlUrl="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1002%2F%28ISSN%291521-1878"/>
<outline type="rss" text="Biol J Linn Soc" title="Biol J Linn Soc" xmlUrl="http://onlinelibrary.wiley.com/rss/journal/10.1111/(ISSN)1095-8312" htmlUrl="http://onlinelibrary.wiley.com/resolve/doi?DOI=10.1111%2F%28ISSN%291095-8312"/> |
library(phangorn)
# 264 REFERENCE trees in phylip format, PAUP numbering hence 2
ref2 <- read.tree("jackr2.tre")
# 264 trees in phylip format to compare pairwise against the reference trees, TNT numbering hence 1
tr2 <- read.tree("jack1.tre")
x <- NULL
# all reference trees to one comp tree
for (i in 1:length(tr2)) { |
egrep "(^Citations$|Cited Literature$|Literature [cC]ited$|Literatures cited$|Literature Cited\:$|References$|^references$|Refrences$|References [cC]ited$|REFERENCES$|Bibliography$|BIBLIOGRAPHY$|LITERATURE CITED$|LITERATURE cited$|REFERENCES CITED$|References \[not in Zootaxa format\]$|^Reference$|^Literature$|^References \(asterisks|^References \(except original descriptions|Litterature cited$|Literture Cited$)" |
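A small sketch of how this pattern behaves line by line, written with Python's `re` module. It uses only a subset of the alternatives above to keep the example short, and the sample lines and the helper name `find_reference_headings` are made up for illustration. Note that most alternatives are anchored only at the end of the line, so a line such as "Cross References" matches as well.

```python
import re

# Subset of the alternatives from the egrep pattern, copied unchanged.
HEADING = re.compile(
    r"(^Citations$|Literature [cC]ited$|References$|^references$|"
    r"References [cC]ited$|REFERENCES$|Bibliography$|LITERATURE CITED$)"
)


def find_reference_headings(lines):
    """Return (line_number, line) pairs for lines that look like a heading."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if HEADING.search(line)
    ]


# Hypothetical article text, one string per line.
sample = [
    "Introduction",
    "Results and discussion",
    "Literature cited",
    "Smith, J. (2001) ...",
]

for number, line in find_reference_headings(sample):
    print(number, line)
```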
Journal Name                                          ISSN
Abstract and Applied Analysis                         10853375
Acta Crystallographica Section E                      16005368
Acta Electrotechnica et Informatica                   13358243
Acta Linguistica Asiatica                             22323317
Acta Medica Martiniana                                13358421
Acta Societatis Botanicorum Poloniae                  00016977
Acta Universitaria                                    01886266
Acta Universitatis Palackianae Olomucensis : Gymnica  12121185
Acta Veterinaria Scandinavica                         17510147
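An ISSN is always eight characters long, so a value that has passed through a numeric column can lose its leading zeros (0001-6977 turning into 16977, for instance). The sketch below pads such values back to eight characters and checks them against the standard ISSN check digit (weights 8 down to 2, modulo 11, with "X" standing for 10). The function names `normalize_issn` and `issn_is_valid` are hypothetical.

```python
def normalize_issn(raw):
    """Strip hyphens and whitespace, then left-pad with zeros to 8 characters."""
    return str(raw).strip().replace("-", "").upper().zfill(8)


def issn_is_valid(raw):
    """Check the final character against the mod-11 ISSN check digit."""
    issn = normalize_issn(raw)
    if len(issn) != 8 or not issn[:7].isdigit():
        return False
    # Weights 8, 7, 6, 5, 4, 3, 2 apply to the first seven digits.
    total = sum(int(digit) * weight for digit, weight in zip(issn[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return issn[7] == expected


for value in ["10853375", "16977", "1886266"]:
    print(normalize_issn(value), issn_is_valid(value))
```

A padded value that still fails the check digit is more likely a genuine typo than a dropped zero, which makes the two steps useful together.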
# N.B. On *ubuntu, RCurl may not install out of the box. If so, read http://www.omegahat.org/RCurl/FAQ.html and run: sudo apt-get install libcurl4-openssl-dev
install.packages(c("RCurl","twitteR","wordcloud","tm","stringr"))
library(twitteR); library(wordcloud); library(tm); library(stringr)
# Search for #mooc tweets
mooctweets <- searchTwitter("#mooc", n=2000)
length(mooctweets) # ends up with 713 as of 03-Jan-13 at 15:42 London time
# Convert to a data.frame
mooctweets_df <- twListToDF(mooctweets) |