Andy Teucher ateucher

In the shell:

	git clone https://github.com/hadley/adv-r.git
	gem install jekyll mime-types

In R:

Some ggplot tutorials available online:

H. Wickham: Official ggplot2 documentation
H. Wickham: ggplot2 book
W. Chang: R Graphics Cookbook and Cookbook for R
Z. Ross: Beautiful plotting in R: A ggplot2 cheatsheet
D. Koffman: Introduction to ggplot2
R. Saccilotto: Tutorial: ggplot2
R. Hartman: How to format plots for publication using ggplot2
G. Williams: Visualising data with ggplot2

I think the two most important messages that people can get from a short course are:

a) the material is important and worthwhile to learn (even if it's challenging), and b) it's possible to learn it!

For those reasons, I usually start by diving as quickly as possible into visualisation. I think it's a bad idea to start by explicitly teaching programming concepts (like data structures), because the payoff isn't obvious. If you start with visualisation, the payoff is really obvious and people are more motivated to push past any initial teething problems. In stat405, I used to start with some very basic templates that got people up and running with scatterplots and histograms; they wouldn't necessarily understand the code, but they'd know which bits could be varied for different effects.
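A minimal sketch of what such a template might look like, assuming ggplot2 and its bundled `mpg` dataset. These are not the original stat405 templates; the data frame, the column names and the binwidth are simply the parts meant to be varied.

```r
library(ggplot2)

# Scatterplot template: replace `mpg`, `displ` and `hwy` with your own
# data frame and column names; try adding `colour = class` inside aes()
scatter_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Histogram template: change the column and try different binwidths
histogram_plot <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)
```

Printing either object (or typing its name at the console) draws the plot.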

Apart from visualisation, I think the two most important topics to cover are tidy data (i.e. http://www.jstatsoft.org/v59/i10/ + tidyr) and data manipulation (dplyr). These are both important for when people go off and apply
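As a small illustration of how the two fit together, here is a sketch using a recent tidyr and dplyr on a made-up table with one column per year (the `country`, `year` and `cases` names and values are invented for the example). `pivot_longer()` tidies it into one row per observation, after which `group_by()` and `summarise()` can work on it directly.

```r
library(tidyr)
library(dplyr)

# Invented "messy" data: years are spread across columns
messy <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, 25),
  check.names = FALSE
)

# Tidy: one row per country-year observation
tidy <- messy %>%
  pivot_longer(cols = -country, names_to = "year", values_to = "cases")

# Manipulate: total cases per country
totals <- tidy %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases))
```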

Git and GitHub (Hadley Wickham): http://r-pkgs.had.co.nz/git.html
R development using GitHub (Gabor Csardi): https://github.com/MangoTheCat/github-workshop
Working with RStudio, Git, GitHub (STAT 545): http://stat545-ubc.github.io/git00_index.html
Version control with git (R. Fitzjohn): http://nicercode.github.io/2014-02-13-UNSW/lessons/70-version-control/
Version control with Git (Software Carpentry): http://software-carpentry.org/v5/novice/git/index.html

Finessing Excel's stupid line endings

I am sheepish to admit a certain type of routine Microsoft Excel use.

Current example: I am marking for STAT 545. I use R to create a comma-delimited marking sheet by joining the official class list and peer reviews. The sheet contains variables, initially set to NA, where the TAs and I enter official marks and optional comments.
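A rough base-R sketch of building such a sheet, assuming both inputs share a `student_id` key. That column name, and every other name and value below, is hypothetical.

```r
# Hypothetical inputs standing in for the official class list and peer reviews
class_list <- data.frame(
  student_id = c("s1", "s2", "s3"),
  name = c("Ann", "Bo", "Cy"),
  stringsAsFactors = FALSE
)
peer_reviews <- data.frame(
  student_id = c("s1", "s3"),
  peer_score = c(4, 5),
  stringsAsFactors = FALSE
)

# Left join: keep every student on the official list, even without a review
marking <- merge(class_list, peer_reviews, by = "student_id", all.x = TRUE)

# Columns for the TAs to fill in later, initially NA
marking$official_mark <- NA
marking$comments <- NA

out_file <- file.path(tempdir(), "marking_sheet.csv")
write.csv(marking, out_file, row.names = FALSE)
```

`all.x = TRUE` is the design choice that matters here: a student with no peer review still gets a row, with `NA` in `peer_score`.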

This is where Excel comes in. I much prefer its visual organization of this comma-delimited file to, say, a plain text editor. I use its ability to hide columns, resize columns, wrap text, and (gasp!) even fill rows with grey to indicate I am done.

I keep saving the file as comma-delimited and I put up with Excel's incessant freak-out about "losing features". This is not a one-time thing: I need to save and commit this file many times before it is considered done.
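If the line-ending trouble in the title comes from Excel for Mac writing bare carriage returns (`\r`) instead of newlines (an assumption here, since the text stops before saying so), a small R helper can normalise the file before each commit so that git sees one line per row. `normalize_line_endings()` is a hypothetical name.

```r
# Assumption: the CSV was saved with bare "\r" (or Windows "\r\n") line
# endings; this rewrites the file in place with plain "\n" endings.
normalize_line_endings <- function(path) {
  size <- file.info(path)$size
  raw_text <- readChar(path, size, useBytes = TRUE)
  # "\r\n?" matches both "\r\n" and a lone "\r"
  fixed <- gsub("\r\n?", "\n", raw_text, useBytes = TRUE)
  writeChar(fixed, path, eos = NULL, useBytes = TRUE)
  invisible(path)
}
```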

I'm going to start off by describing a pretty common data analysis scenario, and then talk about how using R can help:

You have a lot of individual spreadsheet files containing your data, and you need it all together, so you copy and paste each one into a master file.
Next you do a bunch of data cleaning in the master spreadsheet - fixing date formats, unit conversions, transformations, etc.
You then import the data into your favourite statistics program, run your analysis, and copy the outputs back into a spreadsheet or other graphing program to plot your results.
You give the results to a colleague to review and she comes back with some concerns that something doesn't look quite right with the results. She also suggests that a different modelling technique would be more appropriate.
You comb through the original data and realize that in some of the files one column was misaligned, so copying and pasting them into the master dataset propagated the error across many rows.
In ad
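One way R helps with the first half of this scenario is to read every file programmatically, rather than copying and pasting, and to check that the columns line up before combining them. A minimal sketch, assuming the spreadsheets have been saved as CSV files in a single folder; `combine_csvs()` is a hypothetical helper.

```r
# Combine every CSV in `dir` into one data frame, stopping if any file's
# column names differ from those of the first file.
combine_csvs <- function(dir) {
  paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(paths) == 0) stop("No CSV files found in ", dir)
  tables <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  expected <- names(tables[[1]])
  for (i in seq_along(tables)) {
    if (!identical(names(tables[[i]]), expected)) {
      stop("Column mismatch in ", basename(paths[i]))
    }
  }
  combined <- do.call(rbind, tables)
  # Record which file each row came from, so problems can be traced back
  combined$source_file <- rep(basename(paths), vapply(tables, nrow, integer(1)))
  combined
}
```

Because the cleaning steps then live in a script rather than in a hand-edited master spreadsheet, fixing a bad input file and re-running is cheap.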

Getting Yosemite and R to play nice

Here are some tips on getting R development working with Yosemite. Contribute what you know below and I'll add it in.

`homebrew`

I went ahead and reinstalled all of my Homebrew packages. You can find out what you have installed with

	library(ggplot2)
	library(shiny)

	# Call ggbrush with a ggplot2 object, and the dimensions which
	# should be brushed (try "xy" for scatter, "x" for histogram).
	# The plot will show in RStudio Viewer or your web browser, and
	# any observations selected by the user will be returned.
	ggbrush <- function(plotExpr, direction = c("xy", "x", "y")) {
		# See below for definition of dialogPage function
		# (the rest of the function body is not included in this excerpt)
	}
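The ggbrush body isn't included here. As a rough, hypothetical sketch of the same idea built on Shiny's own brushing support (`brushOpts()` and `brushedPoints()`), and not a reconstruction of the original, the function below takes a ggplot object rather than an expression.

```r
library(shiny)

# Hypothetical sketch: show `plot` in a small Shiny app, let the user brush
# it, and return the selected rows when "Done" is clicked.
brush_points <- function(plot, direction = c("xy", "x", "y")) {
  direction <- match.arg(direction)
  ui <- fluidPage(
    plotOutput("plot", brush = brushOpts(id = "brush", direction = direction)),
    actionButton("done", "Done")
  )
  server <- function(input, output, session) {
    output$plot <- renderPlot(plot)
    observeEvent(input$done, {
      # brushedPoints() infers the x/y columns from the ggplot mapping;
      # with no brush drawn it returns a zero-row data frame
      stopApp(brushedPoints(plot$data, input$brush))
    })
  }
  # runApp() returns whatever value is passed to stopApp()
  runApp(shinyApp(ui, server))
}
```

For example, `selected <- brush_points(ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) + ggplot2::geom_point())`. Because the rows come from `plot$data`, this sketch assumes the data was passed to `ggplot()` itself rather than to an individual layer.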

	#!/bin/bash
	# rename TMS tiles to the XYZ schema
	# no quoting, since all files have simple numeric names
	# do not run this anywhere other than INSIDE your tiles directory

	# run it like this: find . -name "*.png" -exec ./tms2xyz.sh {} \;

	filename=$1

	tmp=${filename#*/} # strip everything up to and including the first / (the leading ./ from find)
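The script is cut off at this point. The core of a TMS-to-XYZ rename is flipping the row index: TMS counts tile rows from the bottom of the map and XYZ from the top, so at zoom level z, y_xyz = 2^z - 1 - y_tms. A sketch of that arithmetic in R, using a hypothetical helper that assumes paths of the form `./z/x/y.png` like those the `find` command above passes in:

```r
# Hypothetical helper: map a TMS tile path "./z/x/y.png" to its XYZ path
tms_to_xyz_path <- function(path) {
  parts <- strsplit(sub("^\\./", "", path), "/")[[1]]
  z <- as.integer(parts[1])
  x <- parts[2]
  y_tms <- as.integer(sub("\\.png$", "", parts[3]))
  # Flip the row: TMS origin is bottom-left, XYZ origin is top-left
  y_xyz <- 2^z - 1 - y_tms
  sprintf("%d/%s/%d.png", z, x, as.integer(y_xyz))
}
```

Renaming in place with this mapping can overwrite tiles that haven't been processed yet (at zoom 3, rows 1 and 6 swap), so writing the results into a separate output directory is safer.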

	# %02x zero-pads each channel to two hex digits, so 0 becomes "00"
	rgb2hex <- function(r, g, b) sprintf('#%02x%02x%02x', r, g, b)

	rgb2hex(255,0,0)
	# returns '#ff0000'
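Base R already ships this conversion: `grDevices::rgb()` with `maxColorValue = 255` gives the same result in upper case. A short comparison, using a `sprintf()` version of `rgb2hex` that is also vectorised over its arguments:

```r
# sprintf() version of rgb2hex: vectorised, each channel zero-padded to 2 digits
rgb2hex <- function(r, g, b) sprintf("#%02x%02x%02x", r, g, b)

# Built-in equivalent; returns upper-case hex digits
builtin <- grDevices::rgb(255, 0, 0, maxColorValue = 255)

# Several colours at once: red, and (0, 128, 255)
custom <- rgb2hex(c(255, 0), c(0, 128), c(0, 255))
```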