Get Homebrew installed on your mac if you don't already have it
Install highlight. "brew install highlight". (This brings down Lua and Boost as well)
Get Homebrew installed on your mac if you don't already have it
Install highlight. "brew install highlight". (This brings down Lua and Boost as well)
/Applications/Utilities/Terminal.app
# Download the raw CADD TSV and Tabix index (no annotations, just scores) | |
wget http://krishna.gs.washington.edu/download/CADD/v1.0/whole_genome_SNVs.tsv.gz | |
wget http://krishna.gs.washington.edu/download/CADD/v1.0/whole_genome_SNVs.tsv.gz.tbi | |
# it is big. 79Gb | |
ls -ltrh whole_genome_SNVs.tsv.gz | |
-rw-r--r-- 1 arq5x users 79G Sep 26 01:44 whole_genome_SNVs.tsv.gz | |
# for testing, let's play with the chr22 intervals | |
tabix whole_genome_SNVs.tsv.gz 22 | bgzip > whole_genome_SNVs.tsv.22.gz |
Whether you're trying to give back to the open source community or collaborating on your own projects, knowing how to properly fork and generate pull requests is essential. Unfortunately, it's quite easy to make mistakes or not know what you should do when you're initially learning the process. I know that I certainly had considerable initial trouble with it, and I found a lot of the information on GitHub and around the internet to be rather piecemeal and incomplete - part of the process described here, another there, common hangups in a different place, and so on.
In an attempt to coallate this information for myself and others, this short tutorial is what I've found to be fairly standard procedure for creating a fork, doing your work, issuing a pull request, and merging that pull request back into the original project.
Just head over to the GitHub page and click the "Fork" button. It's just that simple. Once you've done that, you can use your favorite git client to clone your repo or j
# Load the raw training data and replace missing values with NA | |
training.data.raw <- read.csv('train.csv',header=T,na.strings=c("")) | |
# Output the number of missing values for each column | |
sapply(training.data.raw,function(x) sum(is.na(x))) | |
# Quick check for how many different values for each feature | |
sapply(training.data.raw, function(x) length(unique(x))) | |
# A visual way to check for missing data |
# Set a seed | |
set.seed(500) | |
library(MASS) | |
data <- Boston | |
# Check that no data is missing | |
apply(data,2,function(x) sum(is.na(x))) | |
# Train-test random splitting for linear model |
*bcftools filter | |
*Filter variants per region (in this example, print out only variants mapped to chr1 and chr2) | |
qbcftools filter -r1,2 ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.hg38.vcf.gz | |
*printing out info for only 2 samples: | |
bcftools view -s NA20818,NA20819 filename.vcf.gz | |
*printing stats only for variants passing the filter: | |
bcftools view -f PASS filename.vcf.gz |