Skip to content

Instantly share code, notes, and snippets.

View standage's full-sized avatar

Daniel Standage standage

View GitHub Profile
@standage
standage / gt-bug-138.c
Created September 26, 2013 04:22
Simple GenomeTools program. Compiles fine, dies with segfault at runtime. Running under valgrind, it runs to completion but reports a lot of invalid access errors.
#define WITHOUT_CAIRO
#include "genometools.h"
GtFeatureNode *eden()
{
GtGenomeNode *gene, *mrna1, *mrna2, *mrna3, *feature;
GtFeatureNode *genefn, *mrna1fn, *mrna2fn, *mrna3fn, *featurefn;
GtGenomeNode *f1, *f2, *f3, *f4;
GtFeatureNode *fn1, *fn2, *fn3, *fn4;
GtStr *seqid = gt_str_new_cstr("ctg123");
#!/usr/bin/env perl
use strict;
use warnings;
# Daniel S. Standage <daniel.standage@gmail.com>
# Input: sequences cleaned up and annotated by mRNAmarkup
# Output: each sequence ID along with its annotation, if any
my $usage = "perl $0 < cln-seqs.fas > seqids-and-funcs.txt";
while(<STDIN>)
#!/usr/bin/env perl
use strict;
my %mapping;
while(<STDIN>)
{
print;
chomp;
my($tsaid, @vals) = split(/,/);
$mapping{ $tsaid } = 1;
#!/usr/bin/env perl
use strict;
my %mapping;
while(<STDIN>)
{
print;
chomp;
my($tsaid, @vals) = split(/\t/);
$mapping{ $tsaid } = 1;
@standage
standage / seq-regions.pl
Created November 6, 2013 15:19
Given a Fasta file, print out GFF3-compatible '##sequence-region' pragmas. Often when GFF3 files are produced, the sequence-region pragmas indicate the coordinates directly bounding and sequence annotations rather than the entire sequence length. In some cases, having the entire sequence length is useful, and this script facilitates this.
#!/usr/bin/env perl
use strict;
use Bio::SeqIO;
my $instream = Bio::SeqIO->new( -fh => \*STDIN, -format => "Fasta" );
while(my $seq = $instream->next_seq)
{
printf("##sequence-region %s %lu %lu\n", $seq->id, 1, $seq->length);
}
@standage
standage / selex
Created November 19, 2013 21:13
This script is motivated by a task that commonly comes up in my day-to-day research--a task that more often than not I code from scratch each time. The details vary, but the basics are the same. I have a table or mapping of important data, and I want to use a list of identifiers to pull out a subset of entries from that table or mapping. I've wr…
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;
# Copyright (c) 2013, Daniel S. Standage <daniel.standage@gmail.com>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
@standage
standage / streamtest-data.gff3
Created January 30, 2014 19:39
Working through issues with GenomeTools feature streams and memory mangement.
##gff-version 3
##sequence-region seq01 1 900
##sequence-region seq02 1 900
##sequence-region seq03 1 900
##sequence-region seq04 1 900
##sequence-region seq05 1 900
##sequence-region seq06 1 900
##sequence-region seq07 1 2000
##sequence-region seq08 1 2000
##sequence-region seq09 1 2000
PdomSCFr1.1-0001 Cufflinks transcript 2872997 2880927 500 . . gene_id "CUFF.181"; transcript_id "CUFF.181.1"; FPKM "0.5000000000"; frac "0.000000"; conf_lo "0.500000"; conf_hi "0.500000"; cov "0.225344";
PdomSCFr1.1-0001 Cufflinks exon 2872997 2880927 500 . . gene_id "CUFF.181"; transcript_id "CUFF.181.1"; exon_number "1"; FPKM "0.5000000000"; frac "0.000000"; conf_lo "0.500000"; conf_hi "0.500000"; cov "0.225344";
PdomSCFr1.1-0001 Cufflinks transcript 2873500 2875236 1000 . . gene_id "CUFF.181"; transcript_id "CUFF.181.2"; FPKM "1.0000000000"; frac "0.000000"; conf_lo "1.000000"; conf_hi "1.000000"; cov "0.450687";
PdomSCFr1.1-0001 Cufflinks exon 2873500 2875236 1000 . . gene_id "CUFF.181"; transcript_id "CUFF.181.2"; exon_number "1"; FPKM "1.0000000000"; frac "0.000000"; conf_lo "1.000000"; conf_hi "1.000000"; cov "0.450687";
PdomSCFr1.1-0001 Cufflinks transcript 2875367 2880803 1000 . . gene_id "CUFF.181"; transcript_id "CUFF.181.3"; FPKM "1.0000000000"; frac "0.000000"; conf_lo "1.000000"; conf_hi "1.0000
[create obj/gt_config.h]
[compile sqlite3.o]
[compile alphabet.o]
[compile array.o]
[compile array2dim.o]
[compile array2dim_sparse.o]
[compile array3dim.o]
[compile basename.o]
[compile bioseq.o]
[compile bioseq_col.o]
@standage
standage / gsq2gff3.py
Created April 11, 2014 16:59
Minimal GeneSeqer to GFF3 converter
#!/usr/bin/env python
import re, sys
# Usage: gsq2gff3 < in.gsq > out.gff3
print "##gff-version 3"
for line in sys.stdin:
line = line.rstrip()
matches = re.search("hqPGS_(.+)[+-]_(.+)([+-])\s\((.+)\)", line)
if not matches: