Skip to content

Instantly share code, notes, and snippets.

View standage's full-sized avatar

Daniel Standage standage

View GitHub Profile
@standage
standage / tdc-diff.py
Last active August 29, 2015 13:59
Given TransDecoder output from two samples, identify instances where CDS starts and stops match but translation is different.
#!/usr/bin/env python
import getopt, os, re, sys
class Seq:
def __init__(self, defline, seq):
self.defline = defline
self.seq = seq
match = re.match("^>(\S+)", defline)
self.id = match.group(1)
@standage
standage / tdc-driver.py
Last active August 29, 2015 14:00
Minimal Python script for running TransDecoder on a Cufflinks assembly.
#!/usr/bin/env python
import getopt, os, subprocess, sys
def print_usage(outstream):
usage = ("Usage: tdc-driver [options] cufflinks.gtf refseq.fasta\n"
" Options:\n"
" -h|--help: print this help message and exit\n"
" -o|--out-dir: PATH output directory; default is 'tdc.$pid'\n"
" -T|--tdc-dir: PATH TransDecoder directory; default is '/usr/local/src/transdecoder'\n")
print >> outstream, usage
@standage
standage / tdc-extract.py
Last active August 29, 2015 14:00
Given output of tdc-diff.py (https://gist.github.com/standage/10765999), extract sequences warranting further exploration.
#!/usr/bin/env python
import getopt, os, re, sys
class Seq:
def __init__(self, defline, seq):
self.defline = defline
self.seq = seq
match = re.match("^>(\S+)", defline)
self.id = match.group(1)
@standage
standage / vrlpr.py
Last active August 29, 2015 14:00
Find overlapping genes in a GTF file
#!/usr/bin/env python
import re, sys
# vrlpr.py: find overlapping genes in a GTF file
# Usage: python vrlpr.py < genes.gtf > overlaps.txt
def overlap(range1, range2):
return range1[0] == range2[0] and range1[2] >= range2[1] and \
range1[1] <= range2[2]
==> Downloading http://genometools.org/pub/genometools-1.5.2.tar.gz
Already downloaded: /Library/Caches/Homebrew/genometools-1.5.2.tar.gz
==> Verifying genometools-1.5.2.tar.gz checksum
tar xf /Library/Caches/Homebrew/genometools-1.5.2.tar.gz
==> make prefix=/usr/local/Cellar/genometools/1.5.2 64bit=yes
[create obj/gt_config.h]
[compile sqlite3.o]
[compile alphabet.o]
[compile array.o]
[compile array2dim.o]
@standage
standage / skexon.c
Last active August 29, 2015 14:01
Simulate exon skipping events
/*
Copyright (c) 2014, Daniel S. Standage <daniel.standage@gmail.com>
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
@standage
standage / fq-strip-contam.sh
Last active November 26, 2019 17:55
Procedure for removing contaminants from paired-end sequence data. The bwa-mem algorithm is used to map reads against a database of contaminants, a small Perl one-liner is used to filter out reads that map to the contaminants, the SAM data is converted to BAM format, which is then fed (via process substitution) to Tophat's bam2fastx to convert b…
# -q: output in Fastq format
# -Q: ignore BAM quality flags
# -P: paired-end data
bam2fastx -qQP -o clean.fq <(bwa mem contaminants.fasta reads.1.fq reads.2.fq | \
perl -ne '@v = split(/\t/); print if(m/^@/ or ($v[1] & 4 and $v[1] & 8))' | \
samtools view -bhS -)
#!/usr/bin/env bash
for file in `ls -1 *.results`
do
base_name=`basename $file .results`
sed -i.bak -e "1 s/^/$base_name./" -e $"1 s/\t/\t$base_name./g" $file
done
@standage
standage / maker-fix-viga.sh
Created May 21, 2014 19:09
Maker input included a set of precomputed annotations. For some reason, a handful of these had mangled coordinates in the final Maker output. This little procedure was used to fix that problem.
perl -ne 'next unless(/^PdomSCFr1.1/); @v = split(/\t/); if($v[3] eq "" or $v[4] eq ""){ $two = <STDIN>; @v2 = split(/\t/, $two); $v[3] = $v2[3]; $v[4] = $v2[4]; print join("\t", @v), $two; } else{ print }' \
< pdom-annot-p1.2-prealign.gff3 \
> pdom-annot-p1.2-prealign-fixed.gff3
@standage
standage / annot-ids.py
Created May 21, 2014 19:32
Script used in procedure for minting "final" IDs for Maker annotations.
import re, sys
if __name__ == "__main__":
"""
Input: GFF3 file containing gene (mRNA and tRNA) annotations
Output: text files with new IDs for genes (written to stdout) and RNAs (written to stderr)
"""
prefix = sys.argv[1]
release = sys.argv[2]
genecount = 0