Alyssa Frazee alyssafrazee

Using Ballgown and Polyester

ballgown creates a ballgown object from tablemaker output
ballgownrsem creates a ballgown object from RSEM output (not yet well-tested)
gffRead and gffReadGR read GTF (annotation) files into R
- gffRead gives you a data frame
- gffReadGR gives you a GRanges object

FOR LEARNING!

Make a repository on GitHub. Check the box that says "initialize this repo with a README."
Clone that repository onto your computer: go to the directory where you want the repository_name folder to live, then run git clone followed by the SSH URL, which you can find on the right-hand side of the repository page.
Run a git status as a sanity check. (Everything should be clean).
Write some code!
Run a git status again. Your code now lives in the repository folder, but it hasn't yet been added to version control (the file names should be red in your terminal).
"Add" (git add) the code to version control.
Run another git status. The file names should now be green, meaning they've been added to the version control staging area, but not committed to your repository's history.
Commit your changes with git commit -m 'message_here'.
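
The steps above can be sketched as a small Python helper that lists the corresponding shell commands in order. This is only an illustration, not part of the original walkthrough: the function name, the SSH URL, the file name, and the commit message are hypothetical placeholders, and `touch` stands in for "write some code."

```python
import shlex


def git_workflow(ssh_url, repo_name, filename, message):
    """Return the shell commands for the clone/status/add/commit steps.

    All four arguments are hypothetical placeholders supplied by the caller.
    Each command is an argv-style list of strings.
    """
    return [
        ["git", "clone", ssh_url],         # creates ./repo_name
        ["cd", repo_name],                 # move into the cloned repository
        ["git", "status"],                 # sanity check: should be clean
        ["touch", filename],               # stand-in for "write some code"
        ["git", "status"],                 # new file is listed as untracked
        ["git", "add", filename],          # stage the file
        ["git", "status"],                 # file is now staged, not committed
        ["git", "commit", "-m", message],  # record it in the history
    ]


if __name__ == "__main__":
    commands = git_workflow(
        "git@github.com:username/repository_name.git",  # hypothetical URL
        "repository_name",
        "analysis.R",
        "add analysis script",
    )
    for argv in commands:
        # shlex.join quotes arguments so each line can be pasted into a shell
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; each printed line can be pasted into a terminal one at a time.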

The instructions are on the website. I didn't get my invite for an SVN account, which is a mystery.

add bioc-sync as a collaborator on your git repo
then, webhooks & services --> add webhook. (there's a URL for this on the bioc instruction page.)
git-svn bridge doesn't do merging very well: it's "winner-take-all." You have to pick whether git or svn wins on merge conflicts.
only deals with master branch of git repo

Hilary Mason's collection of research-quality datasets
100+ Interesting Data Sets: seems really great for ML/data science practice or fun side projects.
Most of the datasets that are available with R, here ALSO available in CSV format! 700+ datasets.
Kaggle Higgs Boson data (I think this is a super cool problem)
Statistical Sleuth data problems -- pretty good for intro stats concepts
Hadley's data packages -- baby names by sex (1880-2013), fuel economy of cars, atmospheric measurements from Central America, and info on all NYC flights in 2013. Set up for R, but I bet you could process these with other software if you wanted.
others? (leave a comment!)

	import pandas as pd
	from numpy.random import randint
	from numpy import median, percentile

	my_data = pd.read_csv('dataset.csv')
	n = len(my_data)

	num_bootstrap_samples = 1000
	bootstrap_results = []
	for b in range(num_bootstrap_samples):
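
The excerpt above stops at the loop header. Here is a minimal, self-contained sketch of one way such a bootstrap could continue, assuming the goal is a percentile confidence interval for the median (the `median` and `percentile` imports suggest this, but the excerpt does not confirm it). It uses only the Python standard library and made-up data in place of pandas, NumPy, and `dataset.csv`; `bootstrap_median_ci` and its parameters are hypothetical names.

```python
import random
import statistics


def bootstrap_median_ci(data, num_bootstrap_samples=1000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for the median of `data`."""
    rng = random.Random(seed)
    n = len(data)
    medians = []
    for _ in range(num_bootstrap_samples):
        # resample n observations from the data, with replacement
        resample = [data[rng.randrange(n)] for _ in range(n)]
        medians.append(statistics.median(resample))
    medians.sort()
    # simple order-statistic percentiles (no interpolation)
    tail = (1 - level) / 2
    lower = medians[int(tail * (len(medians) - 1))]
    upper = medians[int((1 - tail) * (len(medians) - 1))]
    return lower, upper


if __name__ == "__main__":
    # made-up data: 200 draws from a normal distribution
    rng = random.Random(0)
    fake_data = [rng.gauss(10, 2) for _ in range(200)]
    print(bootstrap_median_ci(fake_data, seed=1))
```

Passing a `seed` makes the resampling reproducible, which helps when comparing intervals across runs.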

	## simple analysis code for GEUVADIS data
	## AF Oct 2014

	library(ballgown) #biocLite
	library(RSkittleBrewer) #install_github
	library(RColorBrewer) #CRAN
	library(usefulstuff) #install_github
	library(RCurl) #CRAN

	load('fpkm.rda') # download at http://files.figshare.com/1625419/fpkm.rda

	## create ballgown objects with GEUVADIS data

	source("http://bioconductor.org/biocLite.R")
	biocLite('ballgown')
	library(ballgown)
	system('mkdir -p Ballgown/small_objects')

	## make phenotype table:
	dataDir = 'Ballgown/' #tablemaker output lives here
	sampnames = list.files(dataDir, pattern = 'H|N')

	# power calculation examples

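	get_power = function(truep, p0, n, alpha=0.05) {
	    num_rejections = 0
	    for(i in 1:10000){
	        # simulate n Bernoulli(truep) trials
	        dat = rbinom(n, size=1, prob=truep)
	        # exact two-sided binomial test of H0: p = p0
	        # (doubling the upper tail gives values above 1 when sum(dat) is small)
	        pv = binom.test(sum(dat), n, p=p0)$p.value
	        if(pv < alpha) num_rejections = num_rejections + 1
	    }
	    # estimated power: fraction of simulated datasets that reject H0
	    return(num_rejections / 10000)
	}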
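
As a cross-check on a simulation-based estimate like the one above, the power of a binomial test can also be computed exactly by summing over every possible outcome. The sketch below does this in Python with only the standard library. It assumes a two-sided exact test whose p-value adds up the probabilities of all outcomes no more likely than the observed one (the convention used by R's `binom.test`); the function names are illustrative and do not come from the original code.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_power(truep, p0, n, alpha=0.05):
    """Exact power of a two-sided exact binomial test of H0: p = p0."""
    null_probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    power = 0.0
    for x in range(n + 1):
        # two-sided p-value: total null probability of outcomes that are
        # no more likely than x (small tolerance for floating-point ties)
        cutoff = null_probs[x] * (1 + 1e-7)
        pvalue = sum(q for q in null_probs if q <= cutoff)
        if pvalue < alpha:
            # x lies in the rejection region; add its probability under truep
            power += binom_pmf(x, n, truep)
    return power


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(n, exact_power(truep=0.7, p0=0.5, n=n))
```

Because the calculation is deterministic, it gives a fixed reference value against which the Monte Carlo estimate can be compared for the same `truep`, `p0`, `n`, and `alpha`.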