François Briatte (briatte)
briatte / porngram.r
Last active August 29, 2015 13:55
ggplot2 wrapper for http://porngram.sexualitics.org/ (uses elements from ngramr)
porngram <- function(x = c("hardcore", "softcore"), ..., adjust = "xxx") {
  library(ggplot2)
  library(XML)
  library(reshape)
  library(rPython)
  x = c(x, ...)
  if (length(x) > 10) {
    x <- x[1:10]
    warning("Porngram API limit: only using first 10 phrases.")
briatte / README.md
Last active August 29, 2015 13:56
aggregation functions, test #1: base, plyr, dplyr

Collapsing a 4-column data frame of real data from 500,000 rows to 91,000 rows by pasting and counting row values. Execution on a 1.8GHz Intel Core i5 shows that dplyr is about 1.5 times faster than base R.

See this Gist for a simpler test on over twice as many rows and roughly as many groups. In both tests, dplyr is as concise as plyr, as fast as data.table, and clearly more readable than base R.
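The comparison above can be sketched on toy data (the column names and sizes here are hypothetical stand-ins; the Gist itself uses real data):

```r
library(dplyr)

# toy stand-in for the real 500,000-row, 4-column data frame:
# two grouping columns with repeated values
set.seed(1)
d <- data.frame(a = sample(letters[1:3], 100, replace = TRUE),
                b = sample(letters[1:3], 100, replace = TRUE))

# base R: paste the grouping columns together, then count with table()
base_counts <- as.data.frame(table(paste(d$a, d$b)))

# dplyr: group and count in one readable call
dplyr_counts <- count(d, a, b)
```

Both approaches yield one row per observed group; the dplyr version stays readable as the number of grouping columns grows.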

briatte / README.md
Last active August 29, 2015 13:57
aggregation functions, test #2: base, dplyr, data.table

Here's a simple timing test of aggregation functions in R, using 1.3 million rows and 80,000 groups of real data on a 1.8GHz Intel Core i5. Thanks to Arun Srinivasan for helpful comments.

The fastest function to run through the data.frame benchmark is data.table, which runs twice as fast as dplyr, which in turn runs ten times faster than base R.

For a benchmark that includes plyr, see this earlier Gist for a computationally more intensive test on half a million rows, where dplyr still runs 1.5 times faster than aggregate in base R.

Both tests confirm what W. Andrew Barr blogged about dplyr:

the 2 most important improvements in dplyr are …
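The data.table side of the benchmark can be sketched on toy data (group counts and column names here are hypothetical, not the Gist's real 1.3-million-row data):

```r
library(data.table)

# toy stand-in: 1,000 rows across 5 groups
set.seed(2)
dt <- data.table(g = sample(1:5, 1000, replace = TRUE),
                 x = runif(1000))

# data.table: aggregate inside [ ], with by = for grouping
res_dt <- dt[, .(n = .N, mean_x = mean(x)), by = g]

# base R equivalent via aggregate(), typically much slower at scale
res_base <- aggregate(x ~ g, data = as.data.frame(dt), FUN = mean)
```

The `dt[i, j, by]` form is what gives data.table its speed: grouping and aggregation happen in one pass over the data, without the intermediate copies that `aggregate()` makes.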

briatte / .bash_profile
Last active August 17, 2017 00:07
.bash_profile
# ==============================================================================
# TWEAKS
# ==============================================================================
# use nano instead of vi as default editor
# (e.g. for crontab -e)
#
export EDITOR="nano"
# tell ls to be colourful
briatte / full.r
Last active October 7, 2016 16:11
100-line scraper for plenary statements by Members of the European Parliament — see briatte/euspeech for the full project
library(XML)
library(jsonlite)
library(plyr)
dir.create("records")
data = "meps.csv"
if (!file.exists(data)) {
  html = "http://www.europarl.europa.eu/meps/en/directory.html?filter=all&leg="
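The preview cuts off before the directory page is parsed. The likely next step is to extract one link per MEP with XPath; the sketch below runs on a minimal, hypothetical stand-in for the page markup rather than the live europarl.europa.eu directory:

```r
library(XML)

# hypothetical stand-in for the MEP directory page: one <a> per MEP
page <- '<html><body>
<a href="/meps/en/1234">MEP One</a>
<a href="/meps/en/5678">MEP Two</a>
</body></html>'

h <- htmlParse(page, asText = TRUE)

# collect the per-MEP links and rebuild absolute URLs
urls <- xpathSApply(h, "//a/@href")
urls <- paste0("http://www.europarl.europa.eu", urls)
```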

Improving access to panel series data for social scientists: the psData package

GitHub repository: https://github.com/rOpenGov/psData

Social scientists have access to many electronically available panel series datasets. However, downloading, cleaning, and merging them together is time-consuming and error-prone: for example, using Reinhart and Rogoff's data on the fiscal costs of the financial crisis involves downloading, cleaning, and merging 4 Excel files with over 70 individual sheets, one for each country’s data. Furthermore, because such datasets are not bundled in a format that is easy to manipulate, many of them are not updated on a regular basis.

In this talk, we introduce the psData package for the R statistical software. This package is being developed under the rOpenGov framework to solve two problems:

  1. Time wasted by social scientists downloading, cleaning, and transforming common …
briatte / pubmed_ask.r
Last active October 19, 2020 13:14
pubmed scraper
#' Get a PubMed search index
#' @param query a PubMed search string
#' @return the XML declaration of the search
#' @examples
#' # Which articles discuss the WHO FCTC?
#' pubmed_ask("FCTC OR 'Framework Convention on Tobacco Control'")
pubmed_ask <- function(query) {
  # change spaces to + and single quotes to URL-friendly %22 in the query
  query = gsub("'", "%22", gsub(" ", "+", query))
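The preview ends before the function queries PubMed. A plausible next step, given the escaping above, is to build an NCBI E-utilities `esearch` URL; the function name and `retmax` default below are assumptions for illustration, not the Gist's actual code:

```r
# hypothetical sketch: build an esearch URL from the cleaned query string,
# following the documented NCBI E-utilities pattern
pubmed_search_url <- function(query, retmax = 100) {
  # same escaping as in pubmed_ask(): spaces to +, single quotes to %22
  query <- gsub("'", "%22", gsub(" ", "+", query))
  paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
         "?db=pubmed&term=", query, "&retmax=", retmax)
}

u <- pubmed_search_url("FCTC OR 'Framework Convention on Tobacco Control'")
```

The resulting URL can then be fetched and parsed as XML to retrieve the article identifiers matching the search.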
briatte / declarations.r
Created July 24, 2014 12:29
download all asset declarations from French MPs, July 2014
# parse XPath syntax from well-formed HTML
library(XML)
# complete archive will take ~ 1.4 GB on disk
dir.create("declarations", showWarnings = FALSE)
# finds 941 MPs on 2014-07-24 at website launch
h = htmlParse("http://www.hatvp.fr/consulter-les-declarations-rechercher.html")
h = paste0("http://www.hatvp.fr/", xpathSApply(h, "//div[@id='annuaire']/*/*/*/a/@href"))
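With the list of declaration URLs in hand, the likely next step is a download loop that skips files already on disk. The URLs below are hypothetical placeholders for the scraped vector `h`, and the actual network call is left commented out:

```r
# hypothetical declaration URLs standing in for the scraped vector `h`
h <- c("http://www.hatvp.fr/declaration-1.html",
       "http://www.hatvp.fr/declaration-2.html")

# local target paths inside the "declarations" folder created earlier
files <- file.path("declarations", basename(h))

# download only what is not already on disk
to_get <- h[!file.exists(files)]
# for (i in seq_along(to_get))
#   download.file(to_get[i], file.path("declarations", basename(to_get[i])))
```

Skipping existing files makes the scrape resumable, which matters for an archive of roughly 1.4 GB.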
briatte / icelandic-legal-code-network.md
Last active August 29, 2015 18:37
network projection of cross-references in the Icelandic legal code
briatte / fix.r
Created October 2, 2014 04:49
fix R locale on OS X
system("defaults write org.R-project.R force.LANG en_US.UTF-8")
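The `defaults write` call above sets a macOS preference read by R.app at startup; after restarting R, the effect can be checked from within R itself (this check is an addition for illustration, not part of the Gist):

```r
# inspect the character-handling locale currently in effect
loc <- Sys.getlocale("LC_CTYPE")

# a UTF-8 locale avoids the character-encoding warnings the fix targets
utf8_ok <- grepl("UTF-8", loc)
```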