Rik Smith-Unna blahah

  • upward spiral ∞⟨X∴↯⟩∞
  • Bristol / Berlin / Nairobi
  • X @blahah404
@blahah
blahah / crossref_retractions.sh
Last active April 19, 2017 06:50
split crossref openretractions into separate files for API
# write each JSON entry (one per line) to <doi>/data.json
while IFS= read -r entry
do
  doi=$(printf '%s\n' "$entry" | jq -r '.doi')
  echo "creating $doi"
  mkdir -p "$doi"
  printf '%s\n' "$entry" > "$doi/data.json"
done < ../crossref_processed.json
blahah / pubmed_retractions.sh
Last active March 19, 2025 22:55
easily find retracted papers in PubMed, using only bionode-ncbi and jq
# you'll need:
# - bionode-ncbi (https://github.com/bionode/bionode-ncbi)
# - jq (https://github.com/stedolan/jq)
# count the number of retracted papers
bionode-ncbi search pubmed "\"Retracted Publication\"" \
  | jq -c 'select(.pubtype[] | inside("Retracted Publication"))' \
  | wc -l
# get DOIs for all the retracted papers
blahah / unnest_dirs.sh
Created April 13, 2017 14:32
unnest nested dirs
# merge five single-digit path components (1/2/3/4/5) into one directory (12345)
find ../articles_split/articles/ -type d -wholename '*\/*\/*\/*\/*' | grep '.\{36\}' | while IFS= read -r path; do
  mergeddir=$(printf '%s\n' "$path" | sed 's/\([0-9]\)\/\([0-9]\)\/\([0-9]\)\/\([0-9]\)\/\([0-9]\)/\1\2\3\4\5/')
  mkdir -p "$mergeddir"
  cp -R "$path"/* "$mergeddir/"
done
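The sed substitution in this script collapses five single-digit path components into one five-digit directory name. A minimal sketch of that step in isolation, assuming a hypothetical helper name `merge_digit_dirs` and a made-up example path (neither is part of the gist):

```shell
#!/bin/sh
# Hypothetical helper: collapse the first run of five single-digit
# path components (e.g. 1/2/3/4/5) into one component (12345).
merge_digit_dirs() {
  printf '%s\n' "$1" |
    sed 's/\([0-9]\)\/\([0-9]\)\/\([0-9]\)\/\([0-9]\)\/\([0-9]\)/\1\2\3\4\5/'
}

# Example path is an assumption for illustration only.
merge_digit_dirs "../articles_split/articles/1/2/3/4/5/some-dir"
# prints ../articles_split/articles/12345/some-dir
```

Trying the substitution on a few paths this way, before running `cp -R`, is a cheap check that the target directories come out as intended.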
blahah / problem.js
Last active April 11, 2017 14:33
stream never reads more than 20 results...
const resultbatcher = ds => {
let count = 0
const write = (list, cb) => {
count += list.length
bus.emit('results:receive', {
hits: list.map(r => {
r.source = ds.key
return r
blahah / search.js
Created April 10, 2017 03:15
choo5 search field wip
const html = require('choo/html')
const css = require('csjs-inject')
const C = require('../lib/constants')
const style = css`
.search {
height: 30px;
width: 80%;
bottom: 0;
blahah / nest-elife-json.sh
Last active April 9, 2017 15:18
move elife json into nested directories based on ID
for json in *.json; do
  splitdir=$(printf '%s\n' "$json" | sed -e 's/elife-\([0-9]\{5\}\)-v[0-9]\.json/\1/' -e 's/\(.\)/\1\//g')
  echo "moving $json to $splitdir"
  mkdir -p "$splitdir"
  mv "$json" "$splitdir/"
done
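The two sed expressions in this script first reduce the filename to its five-digit ID, then append a slash after every character. A minimal sketch of that mapping on its own, assuming a hypothetical helper name `nested_dir_for` and an invented example filename:

```shell
#!/bin/sh
# Hypothetical helper: map an eLife JSON filename to its nested directory.
# Step 1 keeps only the five-digit ID; step 2 puts a "/" after each character.
nested_dir_for() {
  printf '%s\n' "$1" |
    sed -e 's/elife-\([0-9]\{5\}\)-v[0-9]\.json/\1/' -e 's/\(.\)/\1\//g'
}

# The filename below is made up for illustration.
nested_dir_for "elife-01234-v1.json"
# prints 0/1/2/3/4/
```

A filename that does not match the first pattern (for example a two-digit version number) passes through step 1 unchanged, so step 2 splits the whole filename character by character. Checking the mapping before moving files may be worthwhile.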
blahah / crossref-sample-jats.sh
Last active April 8, 2017 16:23
get CC-BY 3 or 4 papers from CrossRef that have XML fulltext available (example URLs / bash pipelines)
# get count of fulltext XML papers by license
http://api.crossref.org/v1/works?filter=has-full-text:true,full-text.type:text/xml&facet=t
# for a given license, get count of publishers, e.g.
http://api.crossref.org/v1/works?filter=has-full-text:true,full-text.type:text/xml,license.url:http://creativecommons.org/licenses/by/3.0/&facet=t
# for a given license and publisher, get the first 10 papers URLs and download them, e.g.
URL="http://api.crossref.org/v1/works?filter=has-full-text:true,full-text.type:text/xml,license.url:http://creativecommons.org/licenses/by/3.0/,publisher-name:Elsevier%20BV&rows=10"
curl -s "$URL" | jq -r ".message.items[].link[].URL" | grep 'text\/xml' | wget -i -
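The query URL is assembled from a license URL, a publisher name, and a row count, and a space in the publisher name should be percent-encoded before the URL is handed to curl. A minimal sketch of that assembly, assuming a hypothetical helper name `crossref_works_url` (not part of the gist) and encoding spaces only:

```shell
#!/bin/sh
# Hypothetical helper: build a works query URL.
#   $1 = license URL, $2 = publisher name, $3 = number of rows
crossref_works_url() {
  # Encode spaces only; other reserved characters are left untouched in this sketch.
  publisher=$(printf '%s' "$2" | sed 's/ /%20/g')
  printf 'http://api.crossref.org/v1/works?filter=has-full-text:true,full-text.type:text/xml,license.url:%s,publisher-name:%s&rows=%s\n' \
    "$1" "$publisher" "$3"
}

crossref_works_url "http://creativecommons.org/licenses/by/3.0/" "Elsevier BV" 10
```

The resulting string can be assigned to `URL` and piped through the same curl, jq, grep, and wget steps.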
blahah / readdir_sync.js
Created April 6, 2017 20:22
hyperdrive v8 readdir behaviour check
const hyperdrive = require('hyperdrive')
const discover = require('hyperdiscovery')
const key = '33fcb3ea86942f913240d3f39c7c68f81fc2bdefc65cb56646d52a62a90bdec9'
const drive = hyperdrive('.', '')
drive.once('ready', () => {
const driveswarm = discover(drive)
drive.once('content', () => {
# split each directory name into one nested directory per character
for dir in *; do
  splitdir=$(printf '%s\n' "$dir" | sed -e 's/\(.\)/\1\//g' -e 's/\/$//')
  echo "$dir to $splitdir"
  mkdir -p "$splitdir"
  mv "$dir"/* "$splitdir"
  rmdir "$dir"
done
blahah / run.js
Last active December 4, 2016 20:28
hyperdrive + pages usage
const level = require('level')
const hyperdrive = require('hyperdrive')
const discover = require('hyperdiscovery')
const pages = require('random-access-page-files')
const key = '154624e28aabcdf52625769f7b42361b4f7dafe53a14d27035d9ea9878262e16'
const drive = hyperdrive(level('./test_pages.hd'))
const archive = drive.createArchive(