Skip to content

Instantly share code, notes, and snippets.

View dpastoor's full-sized avatar

Devin Pastoor dpastoor

  • A2-Ai
  • Rockville, MD
View GitHub Profile
This post examines the features of [R Markdown](http://www.rstudio.org/docs/authoring/using_markdown)
using [knitr](http://yihui.name/knitr/) in Rstudio 0.96.
This combination of tools provides an exciting improvement in usability for
[reproducible analysis](http://stats.stackexchange.com/a/15006/183).
Specifically, this post
(1) discusses getting started with R Markdown and `knitr` in Rstudio 0.96;
(2) provides a basic example of producing console output and plots using R Markdown;
(3) highlights several code chunk options such as caching and controlling how input and output is displayed;
(4) demonstrates use of standard Markdown notation as well as the extended features of formulas and tables; and
(5) discusses the implications of R Markdown.

Notes:

  • I've tried to break up in to separate pieces, but it's not always possible: e.g. knowledge of data structures and subsetting are tidy intertwined.

  • Level of Bloom's taxonomy listed in square brackets, e.g. http://bit.ly/15gqPEx. Few categories currently assess components higher in the taxonomy.

Programming R curriculum

Data structures

@dpastoor
dpastoor / 0_reuse_code.js
Created November 7, 2013 02:24
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
@dpastoor
dpastoor / server.R
Created January 11, 2014 01:16 — forked from jknowles/server.R
library(shiny)
shinyServer(function(input,output){
output$distPlot<-reactivePlot(function(){
dist<-rnorm(input$obs)
p<-qplot(dist,binwidth=0.1)+geom_vline(xintercept=mean(dist))+theme_dpi()
p<-p+coord_cartesian(xlim=c(-4,4))+geom_vline(xintercept=median(dist),color=I("red"))
print(p)
})
@dpastoor
dpastoor / package.R
Created February 21, 2014 03:04 — forked from jbryer/package.R
#' Simplified loading and installing of packages
#'
#' This is a wrapper to \code{\link{require}} and \code{\link{install.packages}}.
#' Specifically, this will first try to load the package(s) and if not found
#' it will install then load the packages. Additionally, if the
#' \code{update=TRUE} parameter is specified it will check the currently
#' installed package version with what is available on CRAN (or mirror) and
#' install the newer version.
#'
#' @param pkgs a character vector with the names of the packages to load.
@dpastoor
dpastoor / ggkm.R
Created April 2, 2014 23:42 — forked from araastat/ggkm.R
#’ Create a Kaplan-Meier plot using ggplot2
#
#’ @param sfit a \code{\link[survival]{survfit}} object
#’ @param returns logical: if \code{TRUE}, return an ggplot object
#’ @param xlabs x-axis label
#’ @param ylabs y-axis label
#’ @param ystratalabs The strata labels. \code{Default = levels(summary(sfit)$strata)}
#’ @param ystrataname The legend name. Default = “Strata”
#’ @param timeby numeric: control the granularity along the time-axis
#’ @param main plot title
@dpastoor
dpastoor / server.r
Created April 23, 2014 14:28 — forked from sckott/server.r
library("shiny")
library("plotly")
library("ggplot2")
shinyServer(function(input, output) {
output$text <- renderText({
ggiris <- qplot(Petal.Width, Sepal.Length, data=iris, color=Species)
py <- plotly("RgraphingAPI", "ektgzomjbx")
res <- py$ggplotly(ggiris)
iframe <- paste("<iframe height=\"600\" id=\"igraph\" scrolling=\"no\" seamless=\"seamless\" src=\"",
import os
import random
import string
import tempfile
import subprocess
def random_id(length=8):
return ''.join(random.sample(string.ascii_letters + string.digits, length))
TEMPLATE_SERIAL = """
@dpastoor
dpastoor / group_effect.md
Last active August 29, 2015 14:10 — forked from arunsrinivasan/group_effect.md
an update on keyed performance for dt and dplyr benchmarks

Update: The timings are now updated with runs from R v3.1.0.

A small note on this tweet from @KevinUshey and this tweet from @ChengHLee:

The number of rows, while is important, is only one of the factors that influence the time taken to perform the join. From my benchmarking experience, the two features that I found to influence join speed, especially on hash table based approaches (ex: dplyr), much more are:

  • The number of unique groups.
  • The number of columns to perform the join based on - note that this is also related to the previous point as in most cases, more the columns, more the number of unique groups.

That is, these features influence join speed in spite of having the same number of rows.

@dpastoor
dpastoor / README.md
Last active August 29, 2015 14:13 — forked from dariocravero/README.md

Create a Meteor app and put the client_/server_ files in a client/server directories. Also, create a public dir to save the uploaded files.