Nikolas Pontikos pontikos

pontikos / multiallele_to_single_gvcf.py
Created January 15, 2015 23:02
splits ALT alleles over multiple lines
import sys,gzip
vcf = gzip.open(sys.argv[1],"r")
#these 9 column headers are standard to all VCF files
STD_HEADERS=['CHROM','POS','ID','REF','ALT','QUAL','FILTER','INFO','FORMAT']
#the samples headers depend on the number of samples in the file
#which we find out once we read the #CHROM line
SAMPLE_HEADERS=[]
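A minimal sketch of the splitting step named in the description, assuming tab-separated records and that only the ALT column is split. The function name `split_alt` is hypothetical, and a complete gVCF splitter would also have to rewrite per-allele fields such as GT, AD and PL, which this sketch copies unchanged.

```python
def split_alt(line):
    """Yield one tab-separated VCF record per ALT allele in `line`."""
    fields = line.rstrip("\n").split("\t")
    # ALT is the fifth standard column (index 4); alleles are comma-separated.
    for allele in fields[4].split(","):
        yield "\t".join(fields[:4] + [allele] + fields[5:])


# Illustrative record with two ALT alleles (made-up data).
record = "1\t100\t.\tA\tC,G\t50\tPASS\t.\tGT\t1/2"
for out in split_alt(record):
    print(out)
```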
pontikos / joinonfield.py
Created January 16, 2015 17:19
Join lines of CSV file on column field number specified (starting at 0).
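import sys
# index of the column to join on (starting at 0)
i=int(sys.argv[1])
d=dict()
for l in sys.stdin.readlines():
    l=l.strip().split(',')
    # remove the key field; the remaining fields are appended to that key's row
    k=l.pop(i)
    d[k]=d.get(k,[])+l
for k in d:
    print(','.join([k]+d[k]))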
pontikos / createlinks.sh
Created January 19, 2015 13:00
Script for creating links to files (dirs are ignored).
pontikos / xlsx2csv.sh
Created February 12, 2015 12:59
Convert Excel spreadsheet to CSV using https://github.com/dilshod/xlsx2csv
#!/bin/bash
#$ -S /bin/bash
#$ -o /dev/null
#$ -e /dev/null
#$ -cwd
#$ -V
#$ -R y
#$ -pe smp 1
#$ -l scr=1G
#$ -l tmem=2G,h_vmem=1G
pontikos / vcf-samples.sh
Created February 14, 2015 12:36
Get sample names from vcf.
#!/usr/bin/env bash
function error() { >&2 echo -e "\033[31m$*\033[0m"; }
function stop() { error "$*"; exit 1; }
try() { "$@" || stop "cannot $*"; }
file=$1
#doesn't work for double extension .gvcf.gz
ext="${file##*.}"
search=
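The comment above notes that the extension test fails for double extensions such as .gvcf.gz. One way around that, sketched here in Python rather than bash, is to look only at the final suffix when deciding whether the file is gzipped. The function name `vcf_samples` is hypothetical and this is not the author's script.

```python
import gzip


def vcf_samples(path):
    """Return the sample names from the #CHROM header line of a VCF."""
    # Only the final suffix is checked, so .vcf.gz and .gvcf.gz both work.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed; sample names follow.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without finding a header line
    return []
```

Calling `vcf_samples("samples.gvcf.gz")` on a file with a standard header would return the list of sample column names; the file name here is only an illustration.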
pontikos / pop-pca.R
Last active August 29, 2015 14:15
Population PCA of onekg and AJ samples using snpStats. Samples have been LD-pruned using PLINK.
library(snpStats)
d <- read.plink('all.bed','all.bim','all.fam')
print(dim(X <- d$genotypes))
# SNPs where everyone has the same genotype are uninformative
#snp.qc <- col.summary(X)
#X <- X[,snp.qc$MAF > 0]
# should also remove singleton variants, i.e. those present in only a single person
pontikos / bars_and_stars.py
Created March 10, 2015 13:43
Bars and stars algorithm with three bins. The goal is to extend it to N bins. Does anyone have any ideas?
# bars and stars algorithm
N=5
for n in range(0,N):
    x=[1]*n
    # i and j are the two bar positions that cut x into three bins
    for i in range(0,(len(x)+1)):
        for j in range(i,(len(x)+1)):
            print(100-n, sum(x[0:i]), sum(x[i:j]), sum(x[j:len(x)]))
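One possible way to extend the three-bin loop to N bins, offered as a sketch rather than the author's solution: choose the positions of the `bins - 1` bars among `stars + bins - 1` slots with `itertools.combinations`, then read the bin sizes off the gaps between consecutive bars. The function name `stars_and_bars` and its parameters are hypothetical.

```python
from itertools import combinations


def stars_and_bars(stars, bins):
    """Yield every tuple of `bins` non-negative integers summing to `stars`."""
    # Place bins-1 bars among stars+bins-1 slots; the gaps are the bin sizes.
    slots = stars + bins - 1
    for bars in combinations(range(slots), bins - 1):
        edges = (-1,) + bars + (slots,)
        yield tuple(edges[i + 1] - edges[i] - 1 for i in range(bins))


# All ways to put 2 stars into 3 bins.
for split in stars_and_bars(2, 3):
    print(split)
```

With `bins=3` this enumerates the same three-way splits as the nested `i`, `j` loops for a single value of `n`, and the number of tuples produced is the binomial coefficient C(stars + bins - 1, bins - 1).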
pontikos / michigan_impute_server_download.md
Last active May 30, 2021 16:49
Retrieve download URLs from Michigan impute server

On the results page of the imputation, in Chrome, open your JavaScript console and run this:

copy(document.body.innerHTML);

This copies the JavaScript-rendered page to your clipboard. Now paste it into a document, say download_page.html.

Then run this Python script to extract the URLs:

from __future__ import print_function
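The rest of the script is not shown in this preview. As a rough sketch of the extraction step, assuming the download links appear as ordinary `<a href="...">` tags in the saved page, the standard library's `html.parser` can collect them. The names `LinkCollector` and `extract_urls` are hypothetical, and this version targets Python 3 only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def extract_urls(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.urls


# Made-up page fragment; in practice pass the contents of download_page.html.
page = '<a href="https://example.org/chr1.zip">chr1</a><a name="x">no link</a>'
print(extract_urls(page))
```

The resulting list would likely still need filtering to keep only the result files, since the saved page also contains navigation links.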
pontikos / check_chrom_size.sh
Last active June 8, 2016 18:24
Check that your VCFs are not truncated.
# chrom sizes in hg19
declare -A sizes
sizes["chr1"]=249250621
sizes["chr2"]=243199373
sizes["chr3"]=198022430
sizes["chr4"]=191154276
sizes["chr5"]=180915260
sizes["chr6"]=171115067
sizes["chr7"]=159138663
pontikos / tabix-kaviar.R
Created June 9, 2016 18:02
Add kaviar annotation to annotation csv file.
#!/usr/bin/env Rscript
library(Rsamtools)
# '/cluster/scratch3/vyp-scratch2/reference_datasets/Kaviar/Kaviar-160204-Public/vcfs/Kaviar-160204-Public-hg38.vcf.gz'
#f <- '/cluster/scratch3/vyp-scratch2/reference_datasets/Kaviar/Kaviar-160204-Public/vcfs/Kaviar-160204-Public-hg19.vcf.gz'
d <- read.csv('rare_shared_2006_2006A.csv', stringsAsFactors=FALSE)
x <- do.call('rbind', strsplit(d$VARIANT_ID, '_'))