@indraniel
indraniel / 4-identify-and-cleanup-corrupt-cadd-jobs.py
Last active June 8, 2017 21:37
scripts to undo/expunge and redo postvqsr38 pipeline steps (BIO-2310)
#!/usr/bin/env python
from __future__ import print_function, division
import os, sys, re, subprocess, time, datetime, shutil
def log(msg):
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %T")
    print('[-- {} --] {}'.format(timestamp, msg), file=sys.stderr)
def touch(fname, times=None):
@indraniel
indraniel / download-1000-genomes-vcf.py
Created May 26, 2017 23:08
Python example to download files from an anonymous FTP server (example case from 1000 Genomes)
#!/usr/bin/env python
from __future__ import print_function
import ftplib, datetime, sys
# ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr14.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
def log(msg):
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %T")
    print('[-- {} --] {}'.format(timestamp, msg), file=sys.stderr)
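The preview above stops before the download logic, so what follows is only a minimal sketch of how an anonymous FTP download can be done with the standard-library `ftplib`. It is not the gist's actual code. The helper names `split_ftp_url` and `download`, the `dest` and `blocksize` parameters, and the Python 3 `urllib.parse` import are assumptions made for illustration; the URL is the one quoted in the snippet's comment.

```python
import ftplib
import posixpath
from urllib.parse import urlparse

# URL taken from the comment in the snippet above.
URL = ("ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/"
       "ALL.chr14.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz")


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote directory, file name).

    Hypothetical helper, not part of the original gist.
    """
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError("expected an ftp:// URL, got: {}".format(url))
    directory, filename = posixpath.split(parts.path)
    return parts.hostname, directory, filename


def download(url, dest=None, blocksize=1024 * 1024):
    """Fetch one file over anonymous FTP and return the local path.

    Hypothetical helper; `dest` defaults to the remote file name.
    """
    host, directory, filename = split_ftp_url(url)
    dest = dest or filename
    with ftplib.FTP(host) as ftp:      # connects on construction
        ftp.login()                    # no arguments means anonymous login
        ftp.cwd(directory)
        with open(dest, "wb") as out:
            # RETR streams the file in binary mode; each chunk goes to out.write
            ftp.retrbinary("RETR " + filename, out.write, blocksize)
    return dest
```

Calling `download(URL)` would write the chr14 VCF into the current directory. The file is large, so the call is left out of the block on purpose.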
@indraniel
indraniel / b37-annotate-w-gnomAD.sh
Last active June 30, 2023 03:45
A bash script to annotate a large VCF (i.e. containing many samples) with gnomAD (build 37 edition)
#!/bin/bash
# Usage:
# bash b37-annotate-w-gnomAD.sh </path/to/input.vcf.gz> </path/to/output.vcf.gz>
#
# Note:
# 1. This script will create a "scratch" directory to hold intermediate file
#    states created during the annotation process. The scratch directory will
#    be created next to the /path/to/output.vcf.gz .
#
#!/bin/bash
BASE=/gscmnt/gc2802/halllab/idas/laboratory/yaps2-cadd-vep-test
VIRTUALENV=${BASE}/test-venv
source ${VIRTUALENV}/bin/activate
# see confluence
export BMETRICA_DSN="mysql://USER:PASSWORD@hostname:port/database"
#!/bin/bash
AWK=/usr/bin/awk
CADDpath="`dirname \"$0\"`"
CADDpath="`( cd \"$CADDpath/..\" && pwd )`"
if [ -z "$VEPpath" ] ; then source ${CADDpath}/bin/config.sh; fi
${CADDpath}/bin/src/annotateVEP.py \
--all \
--vep=<( zcat ${1} \
| ${CADDpath}/bin/src/VCF2vepVCF.py \
| sort -k1,1 -k2,2n -k3,3 -k4,4 \
@indraniel
indraniel / sample-vcfs.sh
Created April 14, 2017 18:25
sample a large vcf for testing purposes
#!/bin/bash
BASE=/gscmnt/gc2802/halllab/idas/laboratory/yaps2-cadd-vep-test
BCFTOOLS=/gscmnt/gc2719/halllab/bin/bcftools
BGZIP=/gscmnt/gc2802/halllab/idas/software/vep/local/htslib-1.3.2/bin/bgzip
TABIX=/gscmnt/gc2802/halllab/idas/software/vep/local/htslib-1.3.2/bin/tabix
mkdir -p ${BASE}/data/derived/1-create-test-postvqsr-input-data-file/test-vcfs
# cat /gscmnt/gc2802/halllab/idas/jira/BIO-2197/data/derived/7-create-postvqsr-input-data-file/dataset.tsv \
#!/bin/bash
OPAM_VERSION="1.2.2"
OCAML_VERSION="4.04.0"
BASE=/home/archive/ocaml-${OCAML_VERSION}
ARCH=$(uname -m)
OS=$(uname -s)
INSTALL_ROOT=${BASE}/root # generally this value is "$HOME/.opam"
@indraniel
indraniel / application.conf
Last active August 19, 2024 12:48
Example Cromwell / WDL usage at the MGI
webservice {
  port = 8000
  interface = 0.0.0.0
  instance.name = "reference"
}
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  actor {
    default-dispatcher {
@indraniel
indraniel / test.cpp
Created December 12, 2016 02:46
Example Simple C++ logging (w/o C++11 or later)
#include <iostream>
#include <string>
#include <ctime>
void logMsg( const std::string& msg ) {
    time_t rawtime;
    struct tm * timeinfo;
    char buffer[80];
    time (&rawtime);
    timeinfo = localtime(&rawtime);
@indraniel
indraniel / simple-tsv-parsing.clj
Created November 28, 2016 23:49
simple tsv parsing experiments in clojure
user=> (->> (line-seq (io/reader "test-data.tsv"))
#_=> (map #(string/split % #"\t"))
#_=> (map #(nth % 0))
#_=> (map #(Integer/parseInt %))
#_=> (+))
ClassCastException Cannot cast clojure.lang.LazySeq to java.lang.Number java.lang.Class.cast (Class.java:3369)
user=> (->> (line-seq (io/reader "test-data.tsv"))
#_=> (map #(string/split % #"\t"))
#_=> (map #(nth % 0))