@fomightez
Last active June 5, 2018 17:28
Regex to make genus species as one word in fasta entries.

REGEX to make genus and species one word (genus initial, a period, then the species, e.g. S.cerevisiae) in modified FASTA entries.

NOTE: These FASTA entries were first run through my namerv.1.py Python program, which puts the scientific name at the start of each header line in place of the many codes that lead the headers in records returned by Batch Entrez.

FIND:

(>\w)\w+ (\w+)

REPLACE:

\1.\2
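
The same find and replace can also be applied from a script rather than a text editor. The following is a minimal Python sketch, not part of the original note: the function name `abbreviate_header` is hypothetical, and it assumes each header line already starts with the scientific name, as described in the NOTE above. The pattern captures `>` plus the first letter of the genus, skips the rest of the genus and the space, and captures the species; the replacement joins the two captures with a period.

```python
import re

# Same pattern as FIND above: group 1 is ">" plus the genus initial,
# group 2 is the species word that follows the single space.
HEADER_PATTERN = re.compile(r"(>\w)\w+ (\w+)")


def abbreviate_header(line):
    """Return a FASTA header with 'Genus species' rewritten as 'G.species'.

    Hypothetical helper for illustration. Lines that do not start with
    '>' (sequence lines) are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    # count=1 limits the substitution to the first match in the header.
    return HEADER_PATTERN.sub(r"\1.\2", line, count=1)


if __name__ == "__main__":
    header = ">Ashbya gossypii ATCC 10895 |gi|45200937|ref|NP_986507.1| AGL160Wp"
    print(abbreviate_header(header))
    # prints ">A.gossypii ATCC 10895 |gi|45200937|ref|NP_986507.1| AGL160Wp"
```

Because the pattern has to begin at a `>`, the bracketed organism name at the end of each header is left as it is, which matches the OUTPUT shown in the example below.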

EXAMPLE:

INPUT:

>Saccharomyces cerevisiae S288c |gi|6323174|ref|NP_013246.1| Rmp1p [Saccharomyces cerevisiae S288c]
MDEMDNVIRSLEQEYRLILLLNHRNKNQHRAASWYGSFNEMKRNCGQIITLFSSRRLQAKRLKDVEWVKL
HRLLQRALFRQLKRWYWQFNGVIALGQFVTLGCTLVTLLANVRALYMRLWEINETEFIRCGCLIKNLPRT
KAKSVVNDVEELGEIIDEDIGNNVQENELVITSIPKPLTENCKKKKKRKKKNKSAIDGIFG
>Schizosaccharomyces pombe 972h- |gi|19115290|ref|NP_594378.1| ribonuclease MRP complex subunit (predicted) [Schizosaccharomyces pombe 972h-]
MQELQYDVVLLQKIVYRNRNQHRLSVWWRHVRMLLRRLKQSLDGNEKAKIAILEQLPKSYFYFTNLIAHG
QYPALGLVLLGILARVWFVMGGIEYEAKIQSEIVFSQKEQKKLELQSQDDIDTGTVVARDELLATEPISL
SINPASTSYEKLTVSSPNSFLKNQDESLFLSSSPITVSQGTKRKSKNSNSTVKKKKKRARKGRDEIDDIF
G
>Ashbya gossypii ATCC 10895 |gi|45200937|ref|NP_986507.1| AGL160Wp [Ashbya gossypii ATCC 10895]
MSDKALRAGEDGTEIRNALRSLQQELRVIHILYHRNKNQHRVATWWKQLNSLKRSVSQVVTVTSKPVRTE
ADLEALAGLLRRFAVRQAPAMYYEFNGVIALGQFVTLGVVLVAALARVWALYGQLREALGLLPVRAAQAE
RECDVAPTEEIGEEVAVAVAASPPGAAALPGGKRIKKKSKSKRSAIDDIFG

OUTPUT:

>S.cerevisiae S288c |gi|6323174|ref|NP_013246.1| Rmp1p [Saccharomyces cerevisiae S288c]
MDEMDNVIRSLEQEYRLILLLNHRNKNQHRAASWYGSFNEMKRNCGQIITLFSSRRLQAKRLKDVEWVKL
HRLLQRALFRQLKRWYWQFNGVIALGQFVTLGCTLVTLLANVRALYMRLWEINETEFIRCGCLIKNLPRT
KAKSVVNDVEELGEIIDEDIGNNVQENELVITSIPKPLTENCKKKKKRKKKNKSAIDGIFG
>S.pombe 972h- |gi|19115290|ref|NP_594378.1| ribonuclease MRP complex subunit (predicted) [Schizosaccharomyces pombe 972h-]
MQELQYDVVLLQKIVYRNRNQHRLSVWWRHVRMLLRRLKQSLDGNEKAKIAILEQLPKSYFYFTNLIAHG
QYPALGLVLLGILARVWFVMGGIEYEAKIQSEIVFSQKEQKKLELQSQDDIDTGTVVARDELLATEPISL
SINPASTSYEKLTVSSPNSFLKNQDESLFLSSSPITVSQGTKRKSKNSNSTVKKKKKRARKGRDEIDDIF
G
>A.gossypii ATCC 10895 |gi|45200937|ref|NP_986507.1| AGL160Wp [Ashbya gossypii ATCC 10895]
MSDKALRAGEDGTEIRNALRSLQQELRVIHILYHRNKNQHRVATWWKQLNSLKRSVSQVVTVTSKPVRTE
ADLEALAGLLRRFAVRQAPAMYYEFNGVIALGQFVTLGVVLVAALARVWALYGQLREALGLLPVRAAQAE
RECDVAPTEEIGEEVAVAVAASPPGAAALPGGKRIKKKSKSKRSAIDDIFG