Rik Smith-Unna blahah

  • upward spiral ∞⟨X∴↯⟩∞
  • Bristol / Berlin / Nairobi
  • X @blahah404
@blahah
blahah / mdpi_example.json
Created May 28, 2014 09:48
MDPI example scraper definition
{
"url": "mdpi\\.com",
"elements": {
"dc.source": {
"selector": "//meta[@name='dc.source']",
"attribute": "content"
},
"figure": {
"selector": "//div[contains(@id, 'fig')]/div/img",
"attribute": "src"
@blahah
blahah / mdpi_figures.json
Last active August 29, 2015 14:01
MDPI figure scraper definition for ContentMine quickscrape
{
"url": "mdpi",
"elements": {
"dc.source": {
"selector": "//meta[@name='dc.source']",
"attribute": "content"
},
"figure_img": {
"selector": "//div[contains(@id, 'fig')]/div/img",
"attribute": "src",
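The two MDPI definitions differ in their "url" field: "mdpi\\.com" in the first and plain "mdpi" in the second. A rough sketch of what that difference means, assuming the field is treated as a regular expression searched for anywhere in the article URL (an assumption, not something these previews confirm). `scraper_matches` is a hypothetical helper, the example.org URL is made up, and `grep -E` stands in for whatever regex engine the scraper tool really uses.

```shell
# Hypothetical helper: succeed if the pattern occurs anywhere in the URL.
#   $1 = "url" pattern after JSON unescaping (so "mdpi\\.com" becomes mdpi\.com)
#   $2 = article URL
scraper_matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# The stricter pattern requires the literal domain "mdpi.com".
scraper_matches 'mdpi\.com' 'http://www.mdpi.com/1420-3049/19/6/7480/htm' \
  && echo "strict pattern matches the MDPI article"

# The looser pattern matches any URL that merely contains "mdpi".
scraper_matches 'mdpi' 'http://example.org/about-mdpi' \
  && echo "loose pattern also matches an unrelated URL"

scraper_matches 'mdpi\.com' 'http://example.org/about-mdpi' \
  || echo "strict pattern rejects the unrelated URL"
```

Under that assumption the looser pattern is more likely to select this scraper for pages it was not written for.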
@blahah
blahah / all_mdpi_molecules_full_JournalToc.sh
Last active August 29, 2015 14:02
Get the latest MDPI-Molecules journal URLs from a JournalTOC feed
curl -sSL "http://www.journaltocs.ac.uk/api/journals/1420-3049?output=articles&[email protected]" | grep '\<link\>' | grep -oEi 'http[^\<]*' | sort | uniq | sed 's/$/\/htm/g' > mdpi_jtoclist.txt
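The text-processing half of this pipeline can be tried offline by feeding it a small hand-written feed instead of the live API response. This is a sketch only: `extract_article_urls` is a hypothetical wrapper around the same filters, the sample feed is made up, and it assumes each `<link>` element sits on a single line.

```shell
# Hypothetical wrapper: read feed XML on stdin, print unique article URLs
# with "/htm" appended, one per line.
extract_article_urls() {
  grep '<link>' |               # keep only lines containing a <link> element
    grep -oEi 'http[^<]*' |     # cut out the URL, stopping at the closing tag
    sort -u |                   # sort and drop duplicates
    sed 's/$/\/htm/'            # append /htm to each URL
}

# Made-up sample feed with one duplicated link.
printf '%s\n' \
  '<item><link>http://example.org/article/2</link></item>' \
  '<item><link>http://example.org/article/1</link></item>' \
  '<item><link>http://example.org/article/2</link></item>' |
  extract_article_urls
```

Note that every `<link>` element is kept, so a feed-level link (for example to the journal home page) would also pick up the `/htm` suffix.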
@blahah
blahah / install_qs.sh
Last active August 29, 2015 14:02
quickscrape install debian/ubuntu
if [[ $(uname -v) =~ Debian ]]; then
echo "Debian detected: installing backports"
echo "deb http://ftp.us.debian.org/debian wheezy-backports main" >> /etc/apt/sources.list
apt-get update
apt-get install -y nodejs
apt-get install -y nodejs-legacy
curl --insecure https://www.npmjs.org/install.sh | bash
fi
if [[ $(uname -v) =~ Ubuntu ]]; then
echo "Ubuntu detected: using Chris Lea's PPA repo"
@blahah
blahah / install_phantomJS.sh
Last active November 17, 2016 18:55
linux install phantomJS
echo "installing phantomJS"
if [ "$(getconf LONG_BIT)" = "64" ]
then
echo "64-bit system detected"
PHANTOMVER=phantomjs-1.9.7-linux-x86_64
else
echo "32-bit system detected"
PHANTOMVER=phantomjs-1.9.7-linux-i686
fi
wget https://bitbucket.org/ariya/phantomjs/downloads/$PHANTOMVER.tar.bz2 && \
@blahah
blahah / gist:5d7502d101b7b1282c41
Last active August 29, 2015 14:02
shorten multiple URLs with the bit.ly API in a unix one-liner
# put your URLs, one per line, in a file called URLS
parallel -j1 curl -v 'http://api.bit.ly/shorten?version=2.0.1\&longUrl={}\&login=bitlyapidemo\&apiKey=ABC123' < URLS
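The one-liner substitutes each line of URLS into `longUrl={}` unchanged, so a long URL that itself contains `&` or `#` would be split or cut short when the request is parsed. A minimal sketch of percent-encoding the value first follows. `urlencode` and `shorten_request_url` are hypothetical helper names, the encoder assumes ASCII input, and the endpoint and demo credentials are copied from the command above rather than checked against the current bit.ly API.

```shell
# Hypothetical helper: percent-encode every character outside the
# RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~").
# Assumes ASCII input.
urlencode() {
  s=$1
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%%%02X' "'$c" ;;   # a leading quote makes printf emit the character code
    esac
    s=$rest
  done
}

# Hypothetical helper: build the request URL used by the one-liner,
# with the long URL safely encoded.
shorten_request_url() {
  printf 'http://api.bit.ly/shorten?version=2.0.1&longUrl=%s&login=bitlyapidemo&apiKey=ABC123\n' \
    "$(urlencode "$1")"
}

# Made-up example URL containing characters that need encoding.
shorten_request_url 'http://example.org/search?q=a&page=2'
```

Each line of URLS could then be requested with something like `while read -r u; do curl -sS "$(shorten_request_url "$u")"; done < URLS`.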
@blahah
blahah / setup_js_tests.sh
Last active August 29, 2015 14:03
Set up journal_scrapers test generation environment on a DigitalOcean VM
apt-get install -y software-properties-common build-essential python-software-properties libfontconfig1 git
add-apt-repository -y ppa:chris-lea/node.js
apt-get update
apt-get install -y nodejs
npm install --global quickscrape
\curl -sSL https://get.rvm.io | bash -s stable --ruby
gem install trollop
git clone https://github.com/ContentMine/journal-scrapers.git
@blahah
blahah / zoosys_evo_pdfs.txt
Created July 23, 2014 14:11
Zoosystematics and Evolution all PDFs evar!
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300006/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300007/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300008/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300009/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300010/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300011/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300012/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300013/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300014/pdf
http://onlinelibrary.wiley.com/doi/10.1002/zoos.201300015/pdf
@blahah
blahah / express.rb
Created July 29, 2014 07:45
eXpress script
#!/usr/bin/env ruby
require 'csv'
require 'rubygems'
require 'trollop'
class Express
attr_accessor :bundle_id,:target_id,:length,:eff_length,:tot_counts,:uniq_counts
attr_accessor :est_counts,:eff_counts,:ambig_distr_alpha,:ambig_distr_beta,:fpkm
attr_accessor :fpkm_conf_low,:fpkm_conf_high,:solvable,:tpm
@blahah
blahah / trim-batch.rb
Created July 29, 2014 22:30
Trimmomatic wrapper script
#!/usr/bin/env ruby
#
# trim-batch
#
# Take in a set of fastq files and run trimmomatic on each
#
# You MUST specify the location of the trimmomatic JAR
#
# Richard Smith-Unna ([email protected])