[kallisto][] is a new method for processing RNA-seq data. Because it pseudoaligns reads to a transcriptome instead of aligning them to a genome, the quantification step is much faster. While the computational speedup will be huge for projects with many samples and/or organisms with large genomes, I was curious how much time [kallisto][] would save on a small RNA-seq project for an organism with a smaller genome. To perform this comparison, I downloaded 6 fastq files from a recent yeast RNA-seq study on GEO. I chose [Subread][subread] as the comparison method because it performs read alignment but is optimized for quickly obtaining gene counts (it soft clips reads instead of trying to map exact exon-exon boundaries).
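A comparison like this comes down to measuring the wall-clock time of each pipeline on the same input. The following is only a minimal sketch of such a timing harness, not the script used for the comparison: the helper `time_command` and the labels `method_a` and `method_b` are hypothetical, and the placeholder commands would need to be replaced with the actual kallisto and Subread invocations.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return its wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises an error if the command exits with a non-zero status.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


# Placeholder commands (assumptions): substitute the real invocations
# of the two quantification methods being compared.
commands = {
    "method_a": [sys.executable, "-c", "pass"],
    "method_b": [sys.executable, "-c", "pass"],
}

timings = {name: time_command(cmd) for name, cmd in commands.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.2f} s")
```

Timing each method several times and on each fastq file separately would give a fairer picture than a single run, since disk caching can favor whichever command runs second.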
#!/bin/bash
# Install local copy of R from source.
# To run:
# bash build-r.sh >& log.txt
# Version of R to install (must be part of the R 3.x series)
VERSION=3.3.2
# Directory to install R. If not already present, creates bin, lib and share
#!/bin/bash
for VERSION in 3.3.6 3.4.0 3.4.1 3.4.2 3.4.3
do
  bash install-python.sh "$VERSION"
  bash install-pysam.sh "$VERSION"
done
# Copies each file to a new name prefixed with its directory name.
# e.g. sample_name_1/s_1_sequence.txt.gz -> sample_name_1_s_1_sequence.txt.gz
# The original files are left in place.
import glob
import shutil

filepaths = glob.glob("sample_name_*/s_*_sequence.txt.gz")
new_names = [x.replace("/", "_") for x in filepaths]
for old_path, new_path in zip(filepaths, new_names):
    shutil.copyfile(old_path, new_path)
John Blischak 2014-05-14
Multiple users have observed that submitting jobs via Snakemake requires much more memory than is necessary to run the command (e.g. mailing list post, [Bitbucket issue][issue]).
This is a tutorial I have presented for the class Genomics and Systems Biology at the University of Chicago. In this course the students learn about study design, normalization, and statistical testing for genomic studies. This is meant to introduce them to how these ideas are implemented in practice. The specific example is a differential expression analysis with edgeR starting with a table of counts and ending with a list of differentially expressed genes.
Past versions: