John Blischak jdblischak

Comparing speed for yeast RNA-seq analysis - kallisto vs. Subread

Introduction

[kallisto][] is a new method for processing RNA-seq data. Because it pseudoaligns reads to a transcriptome instead of aligning them to a genome, the quantification step is much faster. The computational speedup will be huge for projects with many samples or for organisms with large genomes, but I was curious how much time [kallisto][] would save on a small RNA-seq project for an organism with a small genome. To perform this comparison, I downloaded 6 fastq files from a recent yeast RNA-seq study on GEO. I chose [Subread][subread] as the comparison method because it performs read alignment but is optimized for quickly obtaining gene counts (it soft clips reads instead of trying to map exact exon-exon boundaries).
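A minimal sketch of how the wall-clock time of each tool could be measured, assuming each one is run as a single command. The helper name `time_command` is hypothetical, and the command shown is only a stand-in: substitute the actual kallisto and Subread invocations.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command (assumption): replace with the real quantification
    # or alignment command for each fastq file.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"elapsed: {elapsed:.2f} s")
```

Timing each of the 6 fastq files separately, rather than the whole batch, makes it easier to see whether the difference between the tools is consistent across samples.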

Testing Snakemake virtual memory usage

John Blischak 2014-05-14

Multiple users have observed that submitting jobs via Snakemake requires much more memory than is necessary to run the command (e.g. mailing list post, [Bitbucket issue][issue]).

Differential expression analysis with edgeR

This is a tutorial I have presented for the class Genomics and Systems Biology at the University of Chicago. In this course the students learn about study design, normalization, and statistical testing for genomic studies. This is meant to introduce them to how these ideas are implemented in practice. The specific example is a differential expression analysis with edgeR starting with a table of counts and ending with a list of differentially expressed genes.

Past versions:

Tuesday, April 26, 2016

	#!/bin/bash

	# Install local copy of R from source.

	# To run:
	# bash build-r.sh >& log.txt

	# Version of R to install (must be part of the R 3.x series)
	VERSION=3.3.2
	# Directory to install R. If not already present, creates bin, lib and share

	#!/bin/bash

	# Install each Python version, then build pysam against it.
	for VERSION in 3.3.6 3.4.0 3.4.1 3.4.2 3.4.3
	do
	  bash install-python.sh "$VERSION"
	  bash install-pysam.sh "$VERSION"
	done

	# Copies each file to a new name built from its directory name (originals are kept).
	# e.g. sample_name_1/s_1_sequence.txt.gz -> sample_name_1_s_1_sequence.txt.gz

	import glob
	import shutil

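	# Match every sequence file one directory level down.
	filepaths = glob.glob("sample_name_*/s_*_sequence.txt.gz")
	# Build the flattened name by replacing the path separator with "_".
	new_names = [x.replace("/", "_") for x in filepaths]
	for old_path, new_path in zip(filepaths, new_names):
	    shutil.copyfile(old_path, new_path)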