Skip to content

Instantly share code, notes, and snippets.

View johandahlberg's full-sized avatar

Johan Dahlberg johandahlberg

View GitHub Profile
@johandahlberg
johandahlberg / ReplaceSampleNameInReadGroup.java
Created September 11, 2012 12:46
A Picard class for switching read names according to a translation table in a white space separated file.
package net.sf.picard.sam;
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
@johandahlberg
johandahlberg / gist:4058394
Created November 12, 2012 09:42
Working document for Scala/GATK/Queue setup guide.
*----------------------------------
* Setting up git
*----------------------------------
Clone gatk git repo
Add the main gatk repo as a upstream remote repo:
git remote add upstream https://github.com/broadgsa/gatk.git
Change the remote of the master branch to upstream. .git/config should look like this.
#!/usr/bin/python
import os
def get_immediate_subdirectories(dir):
return [name for name in os.listdir(dir)
if os.path.isdir(os.path.join(dir, name))]
# Get runfolder name from file and return as list.
def get_runfolders_already_run(file):
@johandahlberg
johandahlberg / SimpleBWT
Last active December 12, 2015 04:49
Naive functional implementation of Burrows-Wheels transform in scala
import scala.math.Ordering
object SimpleBW extends App {
def bwt(s: String): String = {
def tranformString(string: String): List[String] = { transformString(string, string.length) }
def transformString(string: String, splitIndex: Int): List[String] = {
if (splitIndex == 0) {
// Recursion base case
@johandahlberg
johandahlberg / PlayWithURLs.scala
Created April 12, 2013 08:28
Playing with getting source from URL, to get gene names from refseq-ids and converting them into R if-statements. Mostly an experiment in using Scalas parallel collections. NOTE: This should under no circumstances be used in production, then you need to be using some API instead. You have been warned.
object PlayWithURLs extends App {
// Some ids for the BRCA1 gene
val refseqids = List("uc010whl.1", "uc002icp.3", "uc010whp.1")
def queryRefSeqId(refSeqId: String): String = {
def read(url: String): String = io.Source.fromURL(url).mkString
val document = read("http://genome.ucsc.edu/cgi-bin/hgGene?hgg_gene=" + refSeqId + "&org=human")
val regexp = """.*Genetic Association Database:\s\<A HREF=.*\>(\w+)\<.*""".r
@johandahlberg
johandahlberg / gist:5459321
Last active December 16, 2015 15:58
Ugly code example for slides on Best practices in scientific computing.
object UglyCodeExample extends App {
def m(stringToTransform: String): String = {
def t(string: String): List[String] = {
def th(string: String, si: Int): List[String] = {
if (si == 0) Nil
else {
val (firstString, secondString) = string.splitAt(si); val newString = secondString + firstString
newString :: th(string, si - 1)}}
th(string, string.length)
}
@johandahlberg
johandahlberg / BetterCodeExample.scala
Last active December 16, 2015 15:59
A better code example for slides on best practice in scientific computing.
object BetterCodeExample extends App {
/**
* Burrows-Wheeler transform of string
*
* Perform the Burrows-Wheeler transform on a string. See http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
* for a introduction to the algorithm.
*
* This particular implementation is recursive.
*
@johandahlberg
johandahlberg / BetterCodeExampleUnitTests.scala
Created April 25, 2013 13:33
Unit test example for the BetterCodeExample burrows wheeler functions
import org.scalatest.FunSuite
import BetterCodeExample._
class BetterCodeExampleUnitTests extends FunSuite {
test("Burrow wheelers transform of ^BANANA|") {
assert(BetterCodeExample.burrowsWheelersTransform("^BANANA|") === """BNN^AA|A""")
}
@johandahlberg
johandahlberg / GCCounter.scala
Last active December 17, 2015 12:39
GCCounter in Scala inspired by http://saml.rilspace.org/moar-languagez-gc-content-in-python-d-fpc-c-and-c. This is no match performance wise for the C/D/C++ etc solutions, but I think that it makes a nice point of showing how Scalas parallel collections can be used to with very little extra effort parallelize operations. Test file available for …
import scala.io.Source
import java.io.File
object GCCounter extends App {
val file = new File("Homo_sapiens.GRCh37.67.dna_rm.chromosome.Y.fa")
// The actual GC counting function
def countGCOnLine(line: String): (Long, Long) = {
if (line.startsWith(">"))
@johandahlberg
johandahlberg / plotCoverageProportions.R
Created June 27, 2013 16:35
A R script to plot cumulative coverage based on GATK DepthOfCoverage output files. Not very pretty, but perhaps helpful to somebody.
library(ggplot2)
setwd("<my working directory>")
removeExtraLine <- function(x) {
x[,2:length(x)]
}
createInitialDataset <- function(x, sampleName) {
extraLineRemoved <- t(removeExtraLine(x))