@fomightez
Last active August 29, 2015 14:17
Regular expression to replace the description lines in a FASTA file from SGD with a simple 'chr' followed by the chromosome number (a Roman numeral) or 'mt'

REGEX for replacing SGD fasta description line with chromosome number

This recreates the steps probably used in the process described in the ChIP-Seq example at the NUCwave site, which says:

"S. cerevisiae reference genome was downloaded from SGD and FASTA headers for chromosome names were replaced with chrI-chrXVI."

FIND:

>.*chromosome=(\w+)\]

REPLACE:

>chr\1
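
For anyone who would rather script this than use an editor's find-and-replace, the same substitution can be sketched in Python with the standard `re` module. This is only an illustration of the FIND/REPLACE pair above, not necessarily how the original files were produced; the function name `rename_chromosome_header` is made up for this example.

```python
import re

# Same pattern as the FIND above, compiled once for reuse.
CHROMOSOME_HEADER = re.compile(r">.*chromosome=(\w+)\]")


def rename_chromosome_header(line):
    """Reduce an SGD chromosome description line to '>chr' plus the captured text.

    Lines that do not match the pattern (sequence lines, or a header with
    no 'chromosome=' field) are returned unchanged.
    """
    return CHROMOSOME_HEADER.sub(r">chr\1", line)


header = (
    ">ref|NC_001133| [org=Saccharomyces cerevisiae] [strain=S288C] "
    "[moltype=genomic] [chromosome=I]"
)
print(rename_chromosome_header(header))  # prints ">chrI"
```

Because the pattern requires `chromosome=`, any header without that field passes through untouched.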

ALSO TRY WITH $ at the right-hand end of the pattern. Sublime Text matches with it, but other flavors of regular expressions, such as the one at Regular Expressions 101, didn't like this. (The g global modifier also needs to be on at Regular Expressions 101 to see the same result as in Sublime Text.) I think Regular Expressions 101 treats $ as the end of the whole string and not the end of each line, as Sublime Text does. If so, turning on the m (multiline) modifier there should make $ match at the end of each line.
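
The difference in how `$` is treated can be reproduced in Python, which may help when deciding whether to include it. By default Python's `re` module lets `$` match only at the end of the whole string (or just before a final newline); with the `re.MULTILINE` flag it matches at the end of every line. The two-record text below is a shortened, made-up sample used only for this demonstration.

```python
import re

# Shortened, made-up two-record sample (not real SGD output).
text = (
    ">ref|NC_001133| [moltype=genomic] [chromosome=I]\n"
    "CCACACCACACC\n"
    ">ref|NC_001134| [moltype=genomic] [chromosome=II]\n"
    "AAATAGCCCTCA\n"
)

# The FIND pattern with $ added at the right-hand end.
pattern = r">.*chromosome=(\w+)\]$"

# Default mode: $ is anchored to the end of the whole text, and no header
# sits there, so nothing is replaced.
no_flag = re.sub(pattern, r">chr\1", text)

# MULTILINE mode: $ matches at the end of each line, so both headers change.
with_flag = re.sub(pattern, r">chr\1", text, flags=re.MULTILINE)

print(no_flag == text)  # prints True
print(with_flag)
```

Note that `re.sub` replaces every match by default, so nothing like a separate global modifier is needed here.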

The text after chromosome= (up to the closing bracket) is what is captured and used in the REPLACE.

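They don't mention it in the description, but they changed the mitochondrion description line to be very succinct as well. The mitochondrial entry has to be done separately, because its description line has [location=mitochondrion] instead of a [chromosome=...] field, so the first pattern never matches it.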

FIND:

>.*mitochondrion.*

REPLACE:

>chrmt
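
A rough Python equivalent of this second pass, again using only the standard `re` module. The helper name `rename_mito_header` is hypothetical, and this is a sketch of the FIND/REPLACE pair above rather than a record of how the NUCwave files were made.

```python
import re

# Same pattern as the FIND above: any header line mentioning "mitochondrion".
MITO_HEADER = re.compile(r">.*mitochondrion.*")


def rename_mito_header(line):
    """Replace a description line that mentions 'mitochondrion' with '>chrmt'.

    '.' does not match a newline, so a trailing newline on the line is kept.
    """
    return MITO_HEADER.sub(">chrmt", line)


header = (
    ">ref|NC_001224| [org=Saccharomyces cerevisiae] [strain=S288C] "
    "[moltype=genomic] [location=mitochondrion] [top=circular]\n"
)
print(repr(rename_mito_header(header)))  # prints '>chrmt\n'
```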

##EXAMPLE:

###INPUT:

>ref|NC_001133| [org=Saccharomyces cerevisiae] [strain=S288C] [moltype=genomic] [chromosome=I]
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACA
CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTT
ACCCTCCATTACCC.....
>ref|NC_001134| [org=Saccharomyces cerevisiae] [strain=S288C] [moltype=genomic] [chromosome=II]
AAATAGCCCTCATGTACGTCTCCTCCAAGCCCTGTTGTCTCTTACCCGGATGTTCAACCA
AAAGCTACTTACTACCTTTATTTTATGTTTACTTTTTATAGGTTGTCTTTTTATCCCACT
TCTTCGCACTTGTCTCTCGCTACTGCCGTGCAACAAACACTAAATCAAAACAATGAAATA
CTACTACATCAAAACGCATTTTCCCTAGAAAAAAAATTTTCTTACAATATACTATACTAC
ACAATACATAATCACTGACTTTCGTAACAACAATTTCCTTCACTCTCCAACTTCTCTGCT
CGAATCTCTACATAGTAATATTATATCAAATCTACCGTCTGGAACATCATC...
>ref|NC_001224| [org=Saccharomyces cerevisiae] [strain=S288C] [moltype=genomic] [location=mitochondrion] [top=circular]
TTCATAATTAATTTTTTATATATATATTATATTATAATATTAATTTATATTATAAAAATA
ATATTTATTATTAAAATATTTATTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCC
GGAATTATTAATTAATAATAAATTATTATTAATAATTATTTATTATTTTATCATTAAAAT
ATATAAATAAAAAATATTAAAAAGATAAAAAAAATAATGTTTATTCTTTATATAAATTAT
ATATATATATATAATTAATTAATTAATTAATTAATTAATAATA...

###FINAL OUTPUT:

>chrI
CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACA
CATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTGGCCAACCTGTCTCTCAACTT
ACCCTCCATTACCC.....
>chrII
AAATAGCCCTCATGTACGTCTCCTCCAAGCCCTGTTGTCTCTTACCCGGATGTTCAACCA
AAAGCTACTTACTACCTTTATTTTATGTTTACTTTTTATAGGTTGTCTTTTTATCCCACT
TCTTCGCACTTGTCTCTCGCTACTGCCGTGCAACAAACACTAAATCAAAACAATGAAATA
CTACTACATCAAAACGCATTTTCCCTAGAAAAAAAATTTTCTTACAATATACTATACTAC
ACAATACATAATCACTGACTTTCGTAACAACAATTTCCTTCACTCTCCAACTTCTCTGCT
CGAATCTCTACATAGTAATATTATATCAAATCTACCGTCTGGAACATCATC...
>chrmt
TTCATAATTAATTTTTTATATATATATTATATTATAATATTAATTTATATTATAAAAATA
ATATTTATTATTAAAATATTTATTCTCCTTTCGGGGTTCCGGCTCCCGTGGCCGGGCCCC
GGAATTATTAATTAATAATAAATTATTATTAATAATTATTTATTATTTTATCATTAAAAT
ATATAAATAAAAAATATTAAAAAGATAAAAAAAATAATGTTTATTCTTTATATAAATTAT
ATATATATATATAATTAATTAATTAATTAATTAATTAATAATA...
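
To go from the INPUT to the FINAL OUTPUT in one pass over a file, both replacements can be applied line by line. The sketch below is one way to do that in Python and assumes the header layout shown in the example; the name `simplify_headers` and the file names in the closing comment are hypothetical.

```python
import re

CHROMOSOME = re.compile(r">.*chromosome=(\w+)\]")
MITOCHONDRION = re.compile(r">.*mitochondrion.*")


def simplify_headers(lines):
    """Yield lines with description lines shortened; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            # Only one of the two patterns can match a given header in the
            # example layout, so applying both in turn is safe.
            line = CHROMOSOME.sub(r">chr\1", line)
            line = MITOCHONDRION.sub(">chrmt", line)
        yield line


# First header and first sequence line of two records from the INPUT above.
sample = [
    ">ref|NC_001133| [org=Saccharomyces cerevisiae] [strain=S288C] [moltype=genomic] [chromosome=I]\n",
    "CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACA\n",
    ">ref|NC_001224| [org=Saccharomyces cerevisiae] [strain=S288C] [moltype=genomic] [location=mitochondrion] [top=circular]\n",
    "TTCATAATTAATTTTTTATATATATATTATATTATAATATTAATTTATATTATAAAAATA\n",
]
result = list(simplify_headers(sample))
print("".join(result), end="")

# For a real file (hypothetical names), an open file handle can be passed in:
# with open("genome.fasta") as src, open("genome_chr.fasta", "w") as dst:
#     dst.writelines(simplify_headers(src))
```

Checking `startswith(">")` first keeps the regular expressions away from the sequence lines, which make up nearly all of a genome file.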