Skip to content

Instantly share code, notes, and snippets.

# Approximate way of creating a GC model from a different gcmodel
# This takes a while to run ...
library("data.table")
# We take Human 1M-Duev3 since it has the most SNPs
gcmodel.other <- data.table(read.table("run_ind_ex/lib/GC_hg19/Human1M-Duov3-0_hg19.gcmodel",h=T,sep="\t"))
setkey(gcmodel.other, "Chr", "Position")
# Read the map file
#!/usr/bin/perl
#Joseph Glessner
=head
Inputs:
PennCNV .log -> LRR_SD, GCWF
PennCNV .rawcnv -> CountCNV
Plink --missing .imiss / Illumina LIMs Project Detail Report .csv / Illumina Genome Studio Samples Table / General format SampleID CallRate -> CallRate
Eigenstrat smartPCA .pca.evec / Plink .mds -> PCA
Plink --genome .genome -> PI_HAT relatedness
make sure to include extremevalues_2.2.tar.gz http://cran.r-project.org/web/packages/extremevalues/index.html
@cth
cth / README
Last active October 26, 2015 10:00
For imputed genotypes used the batch of 2015-04-27 which include DCH an other smaller cohorts.
This data is imputed using impute2 version 2.3.0 and phased using shapeit v2.r727.
Variants with p(HWE) < 0.000001 or callrate less than 95%, MAF less than %1 were removed prior
to impution. The panel used was 1000 Genomes phase 1.
The dosages all 48 SNPs were extracted from the VCF created from the imputed data
and merged with the 13 directly genotyped SNPs. Whenever, directly genotyped variants
are available they are used in place of the imputed dosages.
@cth
cth / ABCA6.sh
Last active November 10, 2015 08:29
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
# Check if CNVs are LD with SNPs.
for file in Samples*txt
do
gene=`echo $file|sed 's/.txt//'|sed 's/Samples//'`
# Extract chr and (one) position from file
@cth
cth / vcf2bed.pl
Last active November 11, 2015 12:21 — forked from mlawson/vcf2bed.pl
Converts a vcf file (typically of indels) into a bed file
#!/usr/bin/perl
use strict;
if (scalar @ARGV != 1) {
print "vcf2bed.pl <vcf_file>\n";
exit 1;
}
my $vcfFile = $ARGV[0];
require 'bio-samtools'
#!/home/fng514/bin/ruby
#$ -S /home/fng514/bin/ruby
#$ -cwd
require 'bio-samtools'
require 'zlib'
rsnumbers = {}
puts ARGV.inspect
throw "invalid numbmer of args" unless ARGV.size == 2
@cth
cth / power.R
Created November 26, 2015 23:42
## Very simple power calculation
## assuming a true odds ratio of 3.1
pick_eater_obese <- function(n) table(runif(n) < 109/(330+109))
pick_eater_control <- function(n) table(runif(n) < 37/(37+38))
#controls <- data.frame(pheno = rep(1000,"case"), runif(
p.values <- replicate(1000, fisher.test( matrix(c(pick_eater_obese(50),pick_eater_control(50)),2) )$p )
significance <- c(5e-1, 5e-2, 5e-3, 5e-4, 5e-5)
@cth
cth / power.R
Created November 26, 2015 23:42
## Very simple power calculation
## assuming a true odds ratio of 3.1
pick_eater_obese <- function(n) table(runif(n) < 109/(330+109))
pick_eater_control <- function(n) table(runif(n) < 37/(37+38))
p.values <- replicate(1000, fisher.test( matrix(c(pick_eater_obese(50),pick_eater_control(50)),2) )$p )
significance <- c(5e-1, 5e-2, 5e-3, 5e-4, 5e-5)
powers <- data.frame(significance, power = sapply(significance,function(x) mean(p.values < x)))
@cth
cth / README
Last active May 12, 2016 10:36
strandflip.R version 0.2
By Christian Theil Have
with inspiration from SAS script by Aldi Kraja
DISCLAIMER: NO WARRANTIES WHATSOEVER ARE PROVIDED WITH THIS
PROGRAM. YOU RUN IT AT YOUR OWN RISK.
YOU CAN DISTRIBUTE this utility pgm in other collaborations
purpose: flip dosage if the alleles in original data are
different when compared to 1000G set of nucmtDNA