Dave Tang davetang

Step 0:

Install Homebrew on your Mac if you don't already have it.

Step 1:

Install highlight with "brew install highlight". (This also pulls in Lua and Boost as dependencies.)

Step 2:

Git notes

Start

Install Git
Create a GitHub account
Open Terminal (/Applications/Utilities/Terminal.app)

Configure

Whether you're trying to give back to the open source community or collaborating on your own projects, knowing how to properly fork and generate pull requests is essential. Unfortunately, it's quite easy to make mistakes or not know what you should do when you're initially learning the process. I know that I certainly had considerable initial trouble with it, and I found a lot of the information on GitHub and around the internet to be rather piecemeal and incomplete - part of the process described here, another there, common hangups in a different place, and so on.

In an attempt to collate this information for myself and others, this short tutorial covers what I've found to be fairly standard procedure for creating a fork, doing your work, issuing a pull request, and merging that pull request back into the original project.

Creating a Fork

Just head over to the GitHub page and click the "Fork" button. It's just that simple. Once you've done that, you can use your favorite git client to clone your repo or j

Project Title

One Paragraph of project description goes here

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

Semantic Commit Messages

See how a minor change to your commit message style can make you a better programmer.

Format: <type>(<scope>): <subject>

<scope> is optional
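As a rough illustration of the format above, here is a minimal sketch of a shell function that checks whether a commit subject line matches `<type>(<scope>): <subject>`. The function name `is_semantic_commit` is hypothetical, and the list of allowed types (feat, fix, docs, style, refactor, test, chore) is an assumption, since the text above does not enumerate them.

```shell
# Hypothetical helper: succeeds if the subject line looks like
# <type>(<scope>): <subject>, where (<scope>) is optional.
# The set of allowed types is an assumption; adjust it to your project.
is_semantic_commit() {
    printf '%s\n' "$1" |
        grep -Eq '^(feat|fix|docs|style|refactor|test|chore)(\([^()]+\))?: .+'
}

# Usage: the first two subjects match the format, the third does not.
is_semantic_commit "feat(parser): add support for arrays" && echo "ok"
is_semantic_commit "fix: handle empty input" && echo "ok"
is_semantic_commit "Added some stuff" || echo "rejected"
```

A check like this could be called from a `commit-msg` hook, but it only validates the shape of the subject line, not whether the type is the right one for the change.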

	# Download the raw CADD TSV and Tabix index (no annotations, just scores)
	wget http://krishna.gs.washington.edu/download/CADD/v1.0/whole_genome_SNVs.tsv.gz
	wget http://krishna.gs.washington.edu/download/CADD/v1.0/whole_genome_SNVs.tsv.gz.tbi

	# The file is big: 79 GB
	ls -ltrh whole_genome_SNVs.tsv.gz
	-rw-r--r-- 1 arq5x users 79G Sep 26 01:44 whole_genome_SNVs.tsv.gz

	# for testing, let's play with the chr22 intervals
	tabix whole_genome_SNVs.tsv.gz 22 | bgzip > whole_genome_SNVs.tsv.22.gz

	# Load the raw training data and replace missing values with NA
	training.data.raw <- read.csv('train.csv', header = TRUE, na.strings = c(""))

	# Output the number of missing values for each column
	sapply(training.data.raw,function(x) sum(is.na(x)))

	# Quick check for how many different values for each feature
	sapply(training.data.raw, function(x) length(unique(x)))

	# A visual way to check for missing data

	# Set a seed
	set.seed(500)

	library(MASS)
	data <- Boston

	# Check that no data is missing
	apply(data,2,function(x) sum(is.na(x)))

	# Train-test random splitting for linear model

	# bcftools filter
	# Filter variants per region (in this example, print only variants mapped to chr1 and chr2)
	bcftools filter -r1,2 ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.hg38.vcf.gz

	# Print records for only 2 samples:
	bcftools view -s NA20818,NA20819 filename.vcf.gz

	# Print only variants whose FILTER column is PASS:
	bcftools view -f PASS filename.vcf.gz