Created March 20, 2011 15:47
Use filo's "groupBy" to create cDNA of transcripts.
###################################################################
# Assume we have a file of BED exons for every gene and transcript.
# The exons are listed in genomic order for each gene/transcript.
###################################################################
$ head -n 5 exons.bed
chr1 1337462 1337636 MRPL20 exon1 -
chr1 1340996 1341266 MRPL20 exon2 -
chr1 1341188 1341266 MRPL20 exon3 -
chr1 1342288 1342399 MRPL20 exon4 -
chr1 1342510 1342597 MRPL20 exon5 -
#######################################################################
# We can use BEDTools' fastaFromBed to grab the relevant sequence
# for each exon and place the sequence on a single, tab separated line.
#
# I've truncated the sequences for email clarity...
#######################################################################
$ fastaFromBed -tab -name -fi hg19.fa -bed exons.bed -fo exons.tab.fa -s
MRPL20 TGCCAGGTGGAGCTCAACAGGAAAGTCCTAGCGGATCTGGCCATCTACGAGCC...
MRPL20 CTCTGGATTAATCGAATTACAGCTGCTAGCCAGGAACATGGACTGAAGTATCCA...
MRPL20 CTCTGGATTAATCGAATTACAGCTGCTAGCCAGGAACATGGACTGAAGTATCCA...
MRPL20 CACTTCCGGGGAAGGAAAAATCGCTGCTACAGGTTGGCGGTCAGAACCGTGAT...
MRPL20 ATGGTCTTCCTCACCGCGCAGCTCTGGCTGCGGAATCGCGTCACCGACCGCTA...
#######################################################################
# Now we use filo's groupBy to concatenate the exon sequences into a
# single transcript.
#######################################################################
# In this case, we group on column 1 (the gene name) and concatenate
# column 2 (the exon sequence).
$ groupBy -i exons.tab.fa -g 1 -o concat -c 2
MRPL20 TGCCAGGTGGAGCTCAACAGGAAAGTCCTAGCGGATCTGGCCATCTACGAGCCTCTGGATTAATCGAATTACAGCTGCTAGCCAGGAACATGGACTGAAGTATCCACTCTGGATTAATCGAATTA...
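#######################################################################
# If filo is not available, the grouping step can be approximated with
# awk. This is only a minimal sketch, not filo's implementation: the
# function name concat_by_key is hypothetical, and it assumes
# tab-separated input in which rows for the same gene are already
# adjacent, as they are in exons.tab.fa above.
#######################################################################

```shell
# Hypothetical helper: concat_by_key is not part of filo or BEDTools.
# Joins column 2 across consecutive rows that share the same column 1.
# Assumes tab-separated input already grouped by column 1.
concat_by_key() {
  awk -F'\t' -v OFS='\t' '
    NR > 1 && $1 != key { print key, seq; seq = "" }  # key changed: emit the finished group
    { key = $1; seq = seq $2 }                         # append this row to the running sequence
    END { if (NR > 0) print key, seq }                 # emit the last group
  '
}

# Usage with the file from the previous step:
#   concat_by_key < exons.tab.fa
```

# Like groupBy, this relies on the input order: rows for one gene that
# are not adjacent would be emitted as separate groups.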
#######################################################################
# We can do all of this in a single command by piping fastaFromBed's
# output straight into groupBy.
#######################################################################
$ fastaFromBed -tab -name -s -fi hg19.fa -bed exons.bed -fo stdout | \
  groupBy -i stdin -g 1 -o concat -c 2
MRPL20 TGCCAGGTGGAGCTCAACAGGAAAGTCCTAGCGGATCTGGCCATCTACGAGCCTCTGGATTAATCGAATTACAGCTGCTAGCCAGGAACATGGACTGAAGTATCCACTCTGGATTAATCGAATTA... |