@njahn82
njahn82 / license.md
Last active August 29, 2015 14:17
Licenses in OpenAPC dataset

Licenses found

download_apc <- function(path = NULL, dir = "tmp", file = "apc_de.csv") {
  if (is.null(path)) {
    path <- "https://raw.githubusercontent.com/OpenAPC/openapc-de/master/data/apc_de.csv"
  }
  # create the target directory if needed and fetch the CSV
  dir.create(dir, showWarnings = FALSE)
  download.file(url = path, destfile = file.path(dir, file), method = "curl")
}
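
A quick way to see which licenses turn up in the dataset once it is downloaded (a sketch; the `license_ref` column name is taken from current OpenAPC releases and may differ in older snapshots):

```r
# download the APC table and tabulate the license information;
# `license_ref` as the column name is an assumption about the CSV layout
download_apc()
apc <- read.csv("tmp/apc_de.csv", header = TRUE, stringsAsFactors = FALSE)
sort(table(apc$license_ref), decreasing = TRUE)
```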
---
title: "Mean APC per Institution"
author: "Najko Jahn"
date: "28. April 2015"
output: html_document
---
```{r, echo = FALSE}
knitr::opts_knit$set(base.url = "/")
```
@njahn82
njahn82 / fetch_openaire.r
Last active August 29, 2015 14:24
fetch_openaire.r
require(ropenaire)
require(dplyr)
# fetch projects of the University of Göttingen (UGOE)
ugoe <- roa_projects(org = "UGOE")
# fetch publications for each grant
tt <- plyr::ldply(as.character(ugoe$grantID), roa_pubs)
# tidy up: split collapsed EC projects
tmp <-
tidyr::separate(
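
One way the tidy-up step could be completed (a sketch; the `projects` column name and the `;` separator are assumptions about the `roa_pubs()` output, not documented behaviour):

```r
# split collapsed EC project identifiers into separate columns;
# `projects` as the column name and ";" as the separator are assumptions
tmp <- tidyr::separate(
  tt, projects, into = c("project_1", "project_2"),
  sep = ";", fill = "right", extra = "merge"
)
```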
```{r, echo = FALSE}
knitr::opts_knit$set(base.url = "/")
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  echo = TRUE,
  fig.width = 9,
  fig.height = 6
)
```
@njahn82
njahn82 / isi_doaj.R
Created October 16, 2015 11:00
Match records from ISI Web of Science with the DOAJ
# load ISI spreadsheet
vu_amst <- read.csv("pubs_2012-14 V5 CSV.csv", header = TRUE, sep = ";",
                    na.strings = "")
vu_amst$SN <- as.character(vu_amst$SN)
# load DOAJ spreadsheet
doaj <- httr::content(httr::GET("http://doaj.org/csv"))
# combine print ISSN and EISSN into one vector
doaj.issn <- c(as.character(doaj$Journal.ISSN..print.version.),
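
From here, the match itself can be a simple membership test (a sketch; the EISSN column name is an assumption about the DOAJ CSV dump and may need adjusting):

```r
# complete the ISSN vector and flag Web of Science records found in the DOAJ;
# `Journal.EISSN..online.version.` is an assumed column name
doaj.issn <- c(as.character(doaj$Journal.ISSN..print.version.),
               as.character(doaj$Journal.EISSN..online.version.))
vu_amst$in_doaj <- vu_amst$SN %in% doaj.issn
table(vu_amst$in_doaj)
```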
@njahn82
njahn82 / report_vu_doaj.Rmd
Last active October 16, 2015 13:56
ISI - DOAJ match for VU Amsterdam
## Load Data
```{r}
require(dplyr)
# load ISI spreadsheet and select only the columns needed
vu_amst <- read.csv("pubs_2012-14 V5 CSV.csv", header = TRUE, sep = ";",
                    na.strings = "") %>%
  select(JI, PY, SN, PU, DI, UT)
tbl_df(vu_amst)
```

Using Catmandu within R

system("catmandu convert YAML < example.yaml", intern = TRUE) %>%
  jsonlite::fromJSON()
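
The same pattern generalises to other Catmandu importers; a small wrapper makes it reusable (a sketch, assuming the `catmandu` command-line tool is on the PATH and writes JSON to stdout):

```r
library(jsonlite)

# run `catmandu convert <format>` on a file and parse its JSON output in R;
# a sketch only -- the format argument and file quoting are illustrative
catmandu_convert <- function(file, format = "YAML") {
  out <- system(paste("catmandu convert", format, "<", shQuote(file)), intern = TRUE)
  fromJSON(paste(out, collapse = "\n"))
}

catmandu_convert("example.yaml")
```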
@njahn82
njahn82 / feed_template.xml
Last active April 14, 2016 18:50
Libreas Feed
---
layout: null
---
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
@njahn82
njahn82 / sci-hub.md
Last active August 29, 2016 13:26
3:AM Hack Day proposal

Proposed datasets

Sci-Hub usage

In spring 2016, [Science featured a dataset on global usage of Sci-Hub](http://www.sciencemag.org/news/2016/04/whos-downloading-pirated-papers-everyone), a prominent shadow library for scholarly literature. The dataset, openly available via Dryad, tracks more than 28 million Sci-Hub usage events over a period of six months at the article level. Tab-separated files contain timestamps, geo-locations (latitude, longitude), and the Digital Object Identifier (DOI) of each requested full text.
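
Reading one of the dump files into R could look like this (a sketch; the file name and column names are illustrative and not taken from the Dryad package itself):

```r
library(readr)
library(dplyr)

# read one month of the Sci-Hub log; column names are illustrative assumptions
scihub <- read_tsv("scihub-sep2015.tab",
                   col_names = c("timestamp", "doi", "user_id",
                                 "country", "city", "latitude", "longitude"))

# requests per day
scihub %>%
  mutate(day = as.Date(timestamp)) %>%
  count(day, sort = TRUE)
```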

@njahn82
njahn82 / wos_tidy_text.md
Last active April 16, 2017 19:55
Tidy Text Mining of Web of Science Abstracts
    ## ngrams, following http://tidytextmining.com/ngrams.html
    library(dplyr)
    library(tidytext)
    library(tidyr)

    # load manually downloaded web of science data dump
    tt <- jsonlite::stream_in(file("data/wos_total.json"), verbose = FALSE) %>% 
      filter(!is.na(AB))
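
A possible continuation, following the ngrams chapter linked in the snippet (it keeps the Web of Science abstract field `AB` and uses the stop word list that ships with tidytext):

    # tokenize abstracts into bigrams, drop stop words, and count;
    # a sketch following http://tidytextmining.com/ngrams.html
    wos_bigrams <- tt %>%
      unnest_tokens(bigram, AB, token = "ngrams", n = 2) %>%
      separate(bigram, c("word1", "word2"), sep = " ") %>%
      filter(!word1 %in% stop_words$word,
             !word2 %in% stop_words$word) %>%
      count(word1, word2, sort = TRUE)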