@fomightez
Last active August 2, 2019 20:17
Stv1p vs Vph1p MUSCLE alignment
CLUSTAL multiple sequence alignment by MUSCLE (3.8)
STV1 -MNQEEAIFRSADMTYVQLYIPLEVIREVTFLLGKMSVFMVMDLNKDLTAFQRGYVNQLR
VPH1 MAEKEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIR
::********:*: **:*** *: *: :: **::.:. . ***..: **** :**::*
STV1 RFDEVERMVGFLNEVVEKHAAETW-----KYILHIDDEGNDIAQPDMADLINTMEPLSLE
VPH1 RLDNVERQYRYFYSLLKKHDIKLYEGDTDKYL----DGSGELYVPPSGSVI---------
*:*:*** :: .:::** : : **: * ..:: * ..:*
STV1 NVNDMVKEITDCESRARQLDESLDSLRSKLNDLLEQRQVIFECSKFIEVNPGIAGRATNP
VPH1 --DDYVRNASYLEERLIQMEDATDQIEVQKNDLEQYRFILQSGDEFF-----LKGDNTDS
:* *.: : *.* *:::: *.: : *** : * :: . .:*: : * *:.
STV1 EIEQEERDVDEFRMTPDDISETLSDAFSFDDETPQDRGALGNDLTRNQSVEDLSFLEQGY
VPH1 TSYMDEDMIDA---NGENIAAAIGASVNY-------------------------------
:* :* . ::*: ::. :..:
STV1 QHRYMITGSIRRTKVDILNRILWRLLRGNLIFQNFPIEEPL--LEGKEKVEKDCFIIFTH
VPH1 -----VTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSH
:** * * ** *:.****:*****:*:.. **:*: :: .* *:.**:*:*
STV1 GETLLKKVKRVIDSLNGKIVSLNTRS---SELVDTLNRQIDDLQRILDTTEQTLHTELLV
VPH1 GDLIIKRIRKIAESLDANLYDVDSSNEGRSQQLAKVNKNLSDLYTVLKTTSTTLESELYA
*: ::*.:..: :**:.:: .::: . *: : .:*.::.** :*.**. ** :** .
STV1 IHDQLPVWSAMTKREKYVYTTLNK--FQQESQGLIAEGWVPSTELIHLQDSLKDYIETLG
VPH1 IAKELDSWFQDVTREKAIFEILNKSNYDTNRKILIAEGWIPRDELATLQARLGEMIARLG
* .:* * ..*** :: *** :: : : ******:* ** ** * : * **
STV1 SEYSTVFNVILTNKLPPTYHRTNKFTQAFQSIVDAYGIATYKEINAGLATVVTFPFMFAI
VPH1 IDVPSIIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFAI
: .::::*: **: ***:******* .**** *.**** *.******.*:*********
STV1 MFGDMGHGFILFLMALFLVLNERKFGAMHRDEIFDMAFTGRYVLLLMGAFSVYTGLLYND
VPH1 MFGDMGHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYND
*********:: * ** *****.*:. *:*.***********::****.**:***:****
STV1 IFSKSMTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLFSNSYKMKLSI
VPH1 IFSKTMTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLSI
****:********:**. :.***** *...*.**:***:*****:*.*************
STV1 LMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKD
VPH1 LMGFIHMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVKD
***: *****::** *: ** :**********:***.****** .*****: **:**
STV1 DKPAPGLLNMLINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNK
VPH1 GKPAPGLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLHFKFTHK
.***************:******:**. ***:**.*** ****:***** *** :. :*
STV1 NGGGGRPHGYQSVGNIEHEEQIAQQRHSAEGFQGMIISDVASVADSINESVGGGEQGPFN
VPH1 K------KSHEPLPSTEADA-------SSEDLEAQQLISAMDADDAEEEEVGSGSHGE-D
: :.::.: . * : *:*.::. : .. .. *: :*.**.*.:* :
STV1 FGDVMIHQVIHTIEFCLNCISHTASYLRLWALSLAHAQLSSVLWDMTISNAFSSKNSGSP
VPH1 FGDIMIHQVIHTIEFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGFRGF---
***:***************:************************ ***. **. ..
STV1 LAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSKFFEGEGYAYEPFSFR-
VPH1 VGVFMTVALFAMWFALTCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEY
:.*: .* ******.** .:**:*********:*******:***** *** .****:*
STV1 -----AIIE-------
VPH1 KDMEVAVASASSSASS
*: .

When I compared the results taken directly from EMBL-EBI's MUSCLE with what my script for adding the consensus symbols produces, I noticed some discrepancies.

The third line of the block below shows that the consensus symbols in the MUSCLE alignment are defined differently than in other places I have seen:

STV1            NVNDMVKEITDCESRARQLDESLDSLRSKLNDLLEQRQVIFECSKFIEVNPGIAGRATNP
VPH1            --DDYVRNASYLEERLIQMEDATDQIEVQKNDLEQYRFILQSGDEFF-----LKGDNTDS
                  :* *.: :  *.*  *:::: *.:  : *** : * :: . .:*:     : *  *:.
                      ^                   ^            

I put an upward arrowhead (caret character) under each of the two symbols that don't match what I have seen elsewhere for conserved residues.

  • Why is the K and R substitution not marked as strongly similar? MUSCLE gives it '.', but it should be ':' according to here and here and here.
  • Why is the E and R substitution not marked as weakly similar? MUSCLE leaves it blank, but it should be '.' according to here and here and here.
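For reference, the rule I would expect behind these symbols can be sketched in Python. This is only a minimal sketch: it is not the code of calculate_cons_for_clustal_protein.py and not MUSCLE's implementation. It assumes the strong and weak residue groups documented for Clustal W/X, and the names `consensus_symbol` and `consensus_line` are made up for illustration.

```python
# Residue groups as documented for Clustal W/X conservation symbols
# (assumption: these are the definitions the expected symbols follow).
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW")
WEAK_GROUPS = ("CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY")


def consensus_symbol(column):
    """Return '*', ':', '.', or ' ' for one alignment column."""
    residues = {c.upper() for c in column}
    if "-" in residues:
        return " "          # any gap: no symbol
    if len(residues) == 1:
        return "*"          # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"          # all residues share one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."          # all residues share one weak group
    return " "


def consensus_line(sequences):
    """Build the symbol line for equal-length aligned sequences."""
    return "".join(consensus_symbol(col) for col in zip(*sequences))


if __name__ == "__main__":
    # First ten columns of the block shown above.
    print(repr(consensus_line(["NVNDMVKEIT", "--DDYVRNAS"])))
    print(repr(consensus_symbol("KR")), repr(consensus_symbol("RE")))
```

Under these group definitions K and R share the strong group QHRK, and R and E share the weak group NEQHRK, which is why I expected ':' and '.' in the two marked columns.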

My script calculate_cons_for_clustal_protein.py annotates these as I expected. I only noticed the difference while testing the script, when its output did not perfectly match the symbols that MUSCLE adds.
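A comparison like this can be automated by diffing two symbol lines column by column. The helper below is a hypothetical sketch (`symbol_mismatches` is not part of my script or of MUSCLE); it assumes both symbol lines have already had the sequence-name prefix removed, and it pads them with spaces because trailing blanks are often stripped from the symbol line.

```python
def symbol_mismatches(seq_a, seq_b, symbols_expected, symbols_observed):
    """List the columns where two consensus-symbol lines disagree.

    Returns tuples of (column index, residue in seq_a, residue in seq_b,
    expected symbol, observed symbol). Column indices are 0-based.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    width = len(seq_a)
    # Pad both symbol lines so stripped trailing spaces do not matter.
    expected = symbols_expected.ljust(width)
    observed = symbols_observed.ljust(width)
    return [
        (i, seq_a[i], seq_b[i], expected[i], observed[i])
        for i in range(width)
        if expected[i] != observed[i]
    ]


if __name__ == "__main__":
    # First ten columns of the block above: the expected symbols versus
    # the symbols in the MUSCLE output.
    for hit in symbol_mismatches("NVNDMVKEIT", "--DDYVRNAS",
                                 "  :* *:: :", "  :* *.: :"):
        print(hit)
```

For those ten columns the only reported difference is column 6, the K/R pair.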

CLUSTAL multiple sequence alignment by MUSCLE (3.8)
STV1 -MNQEEAIFRSADMTYVQLYIPLEVIREVTFLLGKMSVFMVMDLNKDLTAFQRGYVNQLRRFDEVER
VPH1 MAEKEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIRRLDNVER
::********:*: **:*** *: *: :: **::.:. . ***..: **** :**::**:*:***
STV1 MVGFLNEVVEKHAAETW-----KYILHIDDEGNDIAQPDMADLINTMEPLSLE
VPH1 QYRYFYSLLKKHDIKLYEGDTDKYL----DGSGELYVPPSGSVI---------
:: .:::** : : **: * ..:: * ..:*
STV1 NVNDMVKEITDCESRARQLDESLDSLRSKLNDLLEQRQVIFECSKFIEVNPGIAGRATNP
VPH1 --DDYVRNASYLEERLIQMEDATDQIEVQKNDLEQYRFILQSGDEFF-----LKGDNTDS
:* *.: : *.* *:::: *.: : *** : * :: . .:*: : * *:.
STV1 EIEQEERDVDEFRMTPDDISETLSDAFSFDDETPQDRGALGNDLTRNQSVEDLSFLEQGY
VPH1 TSYMDEDMIDA---NGENIAAAIGASVNY-------------------------------
:* :* . ::*: ::. :..:
STV1 QHRYMITGSIRRTKVDILNRILWRLLRGNLIFQNFPIEEPL--LEGKEKVEKDCFIIFTH
VPH1 -----VTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSH
:** * * ** *:.****:*****:*:.. **:*: :: .* *:.**:*:*
STV1 GETLLKKVKRVIDSLNGKIVSLNTRS---SELVDTLNRQIDDLQRILDTTEQTLHTELLV
VPH1 GDLIIKRIRKIAESLDANLYDVDSSNEGRSQQLAKVNKNLSDLYTVLKTTSTTLESELYA
*: ::*.:..: :**:.:: .::: . *: : .:*.::.** :*.**. ** :** .
STV1 IHDQLPVWSAMTKREKYVYTTLNK--FQQESQGLIAEGWVPSTELIHLQDSLKDYIETLG
VPH1 IAKELDSWFQDVTREKAIFEILNKSNYDTNRKILIAEGWIPRDELATLQARLGEMIARLG
* .:* * ..*** :: *** :: : : ******:* ** ** * : * **
STV1 SEYSTVFNVILTNKLPPTYHRTNKFTQAFQSIVDAYGIATYKEINAGLATVVTFPFMFAI
VPH1 IDVPSIIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFAI
: .::::*: **: ***:******* .**** *.**** *.******.*:*********
STV1 MFGDMGHGFILFLMALFLVLNERKFGAMHRDEIFDMAFTGRYVLLLMGAFSVYTGLLYND
VPH1 MFGDMGHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYND
*********:: * ** *****.*:. *:*.***********::****.**:***:****
STV1 IFSKSMTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLFSNSYKMKLSI
VPH1 IFSKTMTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLSI
****:********:**. :.***** *...*.**:***:*****:*.*************
STV1 LMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKD
VPH1 LMGFIHMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVKD
***: *****::** *: ** :**********:***.****** .*****: **:**
STV1 DKPAPGLLNMLINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNK
VPH1 GKPAPGLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLHFKFTHK
.***************:******:**. ***:**.*** ****:***** *** :. :*
STV1 NGGGGRPHGYQSVGNIEHEEQIAQQRHSAEGFQGMIISDVASVADSINESVGGGEQGPFN
VPH1 K------KSHEPLPSTEADA-------SSEDLEAQQLISAMDADDAEEEEVGSGSHGE-D
: :.::.: . * : *:*.::. : .. .. *: :*.**.*.:* :
STV1 FGDVMIHQVIHTIEFCLNCISHTASYLRLWALSLAHAQLSSVLWDMTISNAFSSKNSGSP
VPH1 FGDIMIHQVIHTIEFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGFRGF---
***:***************:************************ ***. **. ..
STV1 LAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSKFFEGEGYAYEPFSFR-
VPH1 VGVFMTVALFAMWFALTCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEY
:.*: .* ******.** .:**:*********:*******:***** *** .****:*
STV1 -----AIIE-------
VPH1 KDMEVAVASASSSASS
*: .
Reference sequence (1): STV1 Identities normalised by aligned length.
1 STV1 100.0% 100.0% -MNQEEAIFRSADMTYVQLYIPLEVIREVTFLLGKMSVFMVMDLNKDLTAFQRGYVNQLR
2 VPH1 91.5% 47.2% MAEKEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIR
1 STV1 100.0% 100.0% RFDEVERMVGFLNEVVEKHAAETW-----KYILHIDDEGNDIAQPDMADLINTMEPLSLE
2 VPH1 91.5% 47.2% RLDNVERQYRYFYSLLKKHDIKLYEGDTDKYL----DGSGELYVPPSGSVI---------
1 STV1 100.0% 100.0% NVNDMVKEITDCESRARQLDESLDSLRSKLNDLLEQRQVIFECSKFIEVNPGIAGRATNP
2 VPH1 91.5% 47.2% --DDYVRNASYLEERLIQMEDATDQIEVQKNDLEQYRFILQSGDEFF-----LKGDNTDS
1 STV1 100.0% 100.0% EIEQEERDVDEFRMTPDDISETLSDAFSFDDETPQDRGALGNDLTRNQSVEDLSFLEQGY
2 VPH1 91.5% 47.2% TSYMDEDMIDA---NGENIAAAIGASVNY-------------------------------
1 STV1 100.0% 100.0% QHRYMITGSIRRTKVDILNRILWRLLRGNLIFQNFPIEEPL--LEGKEKVEKDCFIIFTH
2 VPH1 91.5% 47.2% -----VTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSH
1 STV1 100.0% 100.0% GETLLKKVKRVIDSLNGKIVSLNTRS---SELVDTLNRQIDDLQRILDTTEQTLHTELLV
2 VPH1 91.5% 47.2% GDLIIKRIRKIAESLDANLYDVDSSNEGRSQQLAKVNKNLSDLYTVLKTTSTTLESELYA
1 STV1 100.0% 100.0% IHDQLPVWSAMTKREKYVYTTLNK--FQQESQGLIAEGWVPSTELIHLQDSLKDYIETLG
2 VPH1 91.5% 47.2% IAKELDSWFQDVTREKAIFEILNKSNYDTNRKILIAEGWIPRDELATLQARLGEMIARLG
1 STV1 100.0% 100.0% SEYSTVFNVILTNKLPPTYHRTNKFTQAFQSIVDAYGIATYKEINAGLATVVTFPFMFAI
2 VPH1 91.5% 47.2% IDVPSIIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFAI
1 STV1 100.0% 100.0% MFGDMGHGFILFLMALFLVLNERKFGAMHRDEIFDMAFTGRYVLLLMGAFSVYTGLLYND
2 VPH1 91.5% 47.2% MFGDMGHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYND
1 STV1 100.0% 100.0% IFSKSMTIFKSGWQWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLFSNSYKMKLSI
2 VPH1 91.5% 47.2% IFSKTMTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLSI
1 STV1 100.0% 100.0% LMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKD
2 VPH1 91.5% 47.2% LMGFIHMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVKD
1 STV1 100.0% 100.0% DKPAPGLLNMLINMFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNK
2 VPH1 91.5% 47.2% GKPAPGLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLHFKFTHK
1 STV1 100.0% 100.0% NGGGGRPHGYQSVGNIEHEEQIAQQRHSAEGFQGMIISDVASVADSINESVGGGEQGPFN
2 VPH1 91.5% 47.2% K------KSHEPLPSTEADA-------SSEDLEAQQLISAMDADDAEEEEVGSGSHGE-D
1 STV1 100.0% 100.0% FGDVMIHQVIHTIEFCLNCISHTASYLRLWALSLAHAQLSSVLWDMTISNAFSSKNSGSP
2 VPH1 91.5% 47.2% FGDIMIHQVIHTIEFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGFRGF---
1 STV1 100.0% 100.0% LAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLHWVEAMSKFFEGEGYAYEPFSFR-
2 VPH1 91.5% 47.2% VGVFMTVALFAMWFALTCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEY
1 STV1 100.0% 100.0% -----AIIE-------
2 VPH1 91.5% 47.2% KDMEVAVASASSSASS
MView 1.63, Copyright (C) 1997-2018 Nigel P. Brown
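The two percentage columns in the MView output above are coverage and identity relative to the reference sequence (STV1). The sketch below shows one plausible way to compute such numbers from a pair of aligned rows. It is an assumption about what "normalised by aligned length" means (identical positions divided by the number of columns where both sequences have a residue), not MView's actual code, and `coverage_and_identity` is a hypothetical name.

```python
def coverage_and_identity(reference, other, gap="-"):
    """Return (coverage %, identity %) of `other` against `reference`.

    coverage: share of reference residues aligned to a residue in `other`.
    identity: identical pairs divided by the number of columns in which
              both sequences have a residue (assumed "aligned length").
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have the same length")
    reference_length = sum(1 for r in reference if r != gap)
    # Keep only columns where neither sequence has a gap.
    aligned = [(r, o) for r, o in zip(reference, other)
               if r != gap and o != gap]
    identical = sum(1 for r, o in aligned if r == o)
    coverage = 100.0 * len(aligned) / reference_length if reference_length else 0.0
    identity = 100.0 * identical / len(aligned) if aligned else 0.0
    return coverage, identity


if __name__ == "__main__":
    # Toy rows, not taken from the alignment above.
    print(coverage_and_identity("ACDE-F", "AC-EGF"))
```

Whether this reproduces the 91.5% and 47.2% reported for VPH1 depends on MView's exact definitions, which I have not checked here.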
CLUSTAL multiple sequence alignment by MUSCLE (3.8)
BAH13127.1 FRSEEMTLA--QLFLQSEAAYCCVSELGELGKVQFRDLNPDVNVFQRKFVNEVRRCEEMD
EAW98433.1 MTATEMRCVGRSFYIHG---------------LSIIKLNQNVSSFQRKFVGEVKRCEELE
VPH1 FRSAEMALV--QFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIRRLDNVE
STV1 FRSADMTYV--QLYIPLEVIREVTFLLGKMSVFMVMDLNKDLTAFQRGYVNQLRRFDEVE
: : :* . .::: . . .** .: *** :*.::.* ::::
BAH13127.1 RKLRFVEKEIRKANIPIM------------DTGENPEVPFPRDMI---------------
EAW98433.1 RILVYLVQEINRADIPLP------------EGEASPPAPPLKQVL------------EMQ
VPH1 RQYRYFYSLLKKHDIKLYEGDTDKYL----DGSGELYVPPSGSVI-----------DDYV
STV1 RMVGFLNEVVEKHAAETW-----KYILHIDDEGNDIAQPDMADLINTMEPLSLENVNDMV
* :. . : . : . * .::
BAH13127.1 ------------------DLEMADPDLLE----------------------------ESS
EAW98433.1 EQLQKLEVELREVTKNKEKLRKNLLELIEYTHMLRVTKTFVKRNVEFEPTYEEFPSLESD
VPH1 RNASYLEERLIQMEDATDQIEVQKNDLEQYRFILQSGDEFF-----LKGDNTDSTSYMDE
STV1 KEITDCESRARQLDESLDSLRSKLNDLLEQRQVIFECSKFIEVNPGIAGRATNPEIEQEE
.: :* : ..
BAH13127.1 SLLE-----PSEMGRGTPLRLGF------------------------------------V
EAW98433.1 SLLD-----YSCMQR-LGAKLGF------------------------------------V
VPH1 DMIDA---NGENIAAAIGASVNY------------------------------------V
STV1 RDVDEFRMTPDDISETLSDAFSFDDETPQDRGALGNDLTRNQSVEDLSFLEQGYQHRYMI
:: . : ..: :
BAH13127.1 AGVINRERIPTFERMLWRVCRGNVFLRQAEIENPLEDPVTGDYVHKSVFIIFFQGDQLKN
EAW98433.1 SGLINQGKVEAFEKMLWRVCKGYTIVSYAELDESLEDPETGEVIKWYVFLISFWGEQIGH
VPH1 TGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFSHGDLIIK
STV1 TGSIRRTKVDILNRILWRLLRGNLIFQNFPIEEPL--LEGKEKVEKDCFIIFTHGETLLK
:* * . .: ::.:***: .* :. :::.: : . *:: .*: : :
BAH13127.1 RVKKICEGFRASLYPCPETPQERKEMASGVNTRIDDLQMVLNQTEDHRQRVLQAAAKNIR
EAW98433.1 KVKKICDCYHCHVYPYPNTAEERREIQEGLNTRIQDLYTVLHKTEDYLRQVLCKAAESVY
VPH1 RIRKIAESLDANLYDVDSSNEGRSQQLAKVNKNLSDLYTVLKTTSTTLESELYAIAKELD
STV1 KVKRVIDSLNGKIVSLNTRS---SELVDTLNRQIDDLQRILDTTEQTLHTELLVIHDQLP
.:..: : : : :* .:.** :* *. * ..:
BAH13127.1 VWFIKVRKMKAIYHTLNLCNIDVTQKCLIAEVWCPVTDLDSIQFALRRGTEHSGSTVPSI
EAW98433.1 SRVIQVKKMKAIYHMLNMCSFDVTNKCLIAEVWCPEADLQDLRRALEEGSRESGATIPSF
VPH1 SWFQDVTREKAIFEILNKSNYDTNRKILIAEGWIPRDELATLQARLGEMIARLGIDVPSI
STV1 VWSAMTKREKYVYTTLNK--FQQESQGLIAEGWVPSTELIHLQDSLKDYIETLGSEYSTV
. . . * :: ** : : **** * * :* :. * * .:.
BAH13127.1 LNRMQTNQTPPTYNKTNKFTYGFQNIVDAYGIGTYREINPAPYTIITFPFLFAVMFGDFG
EAW98433.1 MNIIPTKETPPTRIRTNKFTEGFQNIVDAYGVGSYREVNPALFTIITFPFLFAVMFGDFG
VPH1 IQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFAIMFGDMG
STV1 FNVILTNKLPPTYHRTNKFTQAFQSIVDAYGIATYKEINAGLATVVTFPFMFAIMFGDMG
:: : *: *** .***** .**.* *.**:. *.*:*.. *::****:**:****:*
BAH13127.1 HGILMTLFAVWMVLRESRILSQKNENEMFSTVFSGRYIILLMGVFSMYTGLIYNDCFSKS
EAW98433.1 HGFVMFLFALLLVLNENHPRLNQSQ-EIMRMFFNGRYILLLMGLFSVYTGLIYNDCFSKS
VPH1 HGFLMTLAALSLVLNEKKINKMKRG-EIFDMAFTGRYIILLMGVFSMYTGFLYNDIFSKT
STV1 HGFILFLMALFLVLNERKFGAMHRD-EIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKS
**::: * *: :**.* . : *:: *.***::**** **:***::*** ***:
BAH13127.1 LNIFGSSWSVRPMFTYN-----------WTEETLRGNPVLQLNPALPGVFGGPYPFGIDP
EAW98433.1 VNLFGSGWNVSAMYSSSHPPAEHKKMVLWNDSVVRHNSILQLDPSIPGVFRGPYPLGIDP
VPH1 MTIFKSGWK-------------------WPDHWKKGESITATSV-------GTYPIGLDW
STV1 MTIFKSGWQ-------------------WPSTFRKGESIEAKKT-------GVYPFGLDF
:.:* *.*. * . . :.: . * **:*:*
BAH13127.1 IWNIATNKLTFLNSFKMKMSVILGIIHMLFGVSLSLFNHIYFKKPLNIYFGFIPEIIFMT
EAW98433.1 IWNLATNRLTFLNSFKMKMSVILGIIHMTFGVILGIFNHLHFRKKFNIYLVSIPELLFML
VPH1 AWHGTENALLFSNSYKMKLSILMGFIHMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQ
STV1 AWHGTDNGLLFSNSYKMKLSILMGYAHMTYSFMFSYINYRAKNSKVDIIGNFIPGLVFMQ
*: : * * * **:***:*:::* ** :. :. *: .. .:* ** ::**
BAH13127.1 SLFGYLVILIFYKWTAYDAHTSENAPSLLIHFINMFLFSYPESGYSMLYSGQKGIQCFLV
EAW98433.1 CIFGYLIFMIFYKWLVFSAETSRVAPSILIEFINMFLF--PASKTSGLYTGQEYVQRVLL
VPH1 GIFGYLSVCIVYKWAVDWVKDGKPAPGLLNMLINMFLS--PGTIDDELYPHQAKVQVFLL
STV1 SIFGYLSWAIVYKWSKDWIKDDKPAPGLLNMLINMFLA--PGTIDDQLYSGQAKLQVVLL
:**** *.*** . **.:* :***** * : . **. * :* .*:
BAH13127.1 VVALLCVPWMLLFKPLVLRRQY--------LRRKHLGTLNFGGIRVGNGPTEEDAEIIQH
EAW98433.1 VVTALSVPVLFLGKPLFLLWLH--------NGRSCFG-VNRSGYTLIRKDSEEEVSLLGS
VPH1 LMALVCIPWLLLVKPLHFKFTH---------KKKSHEPLPSTEADA----SSEDLEAQQL
STV1 LAALVCVPWLLLYKPLTLRRLNKNGGGGRPHGYQSVGNIEHEEQIAQQRHSAEGFQGMII
: : :.:* ::* *** : . : : * .
BAH13127.1 DQLSTHSEDADE---------FDFGDTMVHQAIHTIEYCLGCISNTASYLRLWALSLAHA
EAW98433.1 QDIEEGNHQVEDGCREMACEEFNFGEILMTQVIHSIEYCLGCISNTASYLRLWALSLAHA
VPH1 ISAMDADDAEEEEVGSGSHGE-DFGDIMIHQVIHTIEFCLNCVSHTASYLRLWALSLAHA
STV1 SDVASVADSINESVGGGEQGPFNFGDVMIHQVIHTIEFCLNCISHTASYLRLWALSLAHA
. :: :**: :: *.**:**:**.*:*:***************
BAH13127.1 QLSEVLWTMVIHIGLSVKSL---AGGLVLFFFFTAFATLTVAILLIMEGLSAFLHALRLH
EAW98433.1 QLSDVLWAMLMRVGLRVDTT---YGVLLLLPVIALFAVLTIFILLIMEGLSAFLHAIRLH
VPH1 QLSSVLWTMTIQIAFGFRGF---VGVFMTVALFAMWFALTCAVLVLMEGTSAMLHSLRLH
STV1 QLSSVLWDMTISNAFSSKNSGSPLAVMKVVFLFAMWFVLTVCILVFMEGTSAMLHALRLH
***.*** * : .: . : . .:: : .** :*::*** **:**::***
BAH13127.1 WVEFQNKFYSGTGFKFLPFSFE------HIREGKFEE-----
EAW98433.1 WVEFQNKFYVGAGTKFVPFSF-------SLLSSKFNNDDSVA
VPH1 WVESMSKFFVGEGLPYEPFAFEYKDMEVAVASASSSASS---
STV1 WVEAMSKFFEGEGYAYEPFSFR------AIIE----------
*** .**: * * : **:* : .
>VPH1 YOR270C SGDID:S000005796
MAEKEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIR
RLDNVERQYRYFYSLLKKHDIKLYEGDTDKYLDGSGELYVPPSGSVIDDYVRNASYLEER
LIQMEDATDQIEVQKNDLEQYRFILQSGDEFFLKGDNTDSTSYMDEDMIDANGENIAAAI
GASVNYVTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFS
HGDLIIKRIRKIAESLDANLYDVDSSNEGRSQQLAKVNKNLSDLYTVLKTTSTTLESELY
AIAKELDSWFQDVTREKAIFEILNKSNYDTNRKILIAEGWIPRDELATLQARLGEMIARL
GIDVPSIIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFA
IMFGDMGHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYN
DIFSKTMTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLS
ILMGFIHMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVK
DGKPAPGLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLHFKFTH
KKKSHEPLPSTEADASSEDLEAQQLISAMDADDAEEEEVGSGSHGEDFGDIMIHQVIHTI
EFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGFRGFVGVFMTVALFAMWFAL
TCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEYKDMEVAVASASSSASS
*
CLUSTAL plus hand-editing
Snm1 MNKDQAEKYQERSLRQKYNLLHVLP-------TLNSRALSGLYYKNFHNS-VKRYQIMLP
1x0t.1.A -----------ERIDTLFTLAERV--------ARYSPDLAKRYVELALEI-QKKAKVKIP
Rpr2 -------------LNYLYQISAYQTRARQKARTDAHTPLA-RNYIKSMDLISKKTKTSLL
Rpa12 ------------------------------------------------------------
Snm1 EQLKSGKFCSHCGCVYVPNFNASLQLTTNTEQGDSDELGGESMEGPKKCIQVNCLNCEKS
1x0t.1.A RKWK-RRYCKRCHTFLIPGVNARVRLRTKR----------------MPHVVITCLECGYI
Rpr2 PTIK-RTICKKCHRLLWTPKKLEITSD--------------------GALSVMC-GCGTV
Rpa12 -----KEKCPQCGNEEM------NYHTLQ--LRSADE---------GATVFYTCTSCGYK
Snm1 KLFEWKSEFVVPTFGQDVSPMINSTSSGKVSYAVKKPQKSKTSTGKERSKKRKLNSLTNL
1x0t.1.A MRYPYLREVK--------------------------------------------------
Rpr2 KRFNIGADPNYRTYSEREGNLLNS------------------------------------
Rpa12 FRTNN-------------------------------------------------------
Snm1 LSKRNQEKKMEKKKSSSLSLESFMKS
1x0t.1.A --------------------------
Rpr2 --------------------------
Rpa12 --------------------------
>VPH1 YOR270C SGDID:S000005796
ATGGCAGAGAAGGAGGAAGCGATTTTTCGCTCTGCTGAAATGGCTTTAGTCCAATTCTAT
ATTCCTCAAGAAATTTCAAGAGACTCTGCTTACACTTTAGGTCAATTGGGTCTTGTTCAA
TTCCGTGACTTGAACTCTAAGGTGCGTGCGTTTCAAAGAACTTTCGTGAACGAAATTAGA
AGACTGGATAATGTAGAAAGACAGTATCGTTACTTTTATTCTCTTTTGAAGAAACACGAT
ATTAAGCTCTACGAAGGAGACACGGACAAATATTTGGACGGCTCAGGTGAATTGTACGTT
CCACCAAGCGGTTCAGTGATAGATGATTATGTCCGGAACGCTTCATATTTGGAAGAAAGA
TTGATTCAAATGGAGGATGCAACCGATCAAATTGAAGTCCAGAAAAATGACTTGGAACAG
TATCGCTTTATTTTGCAGTCAGGTGATGAATTTTTCTTGAAGGGTGATAATACCGACAGC
ACTTCCTATATGGATGAAGACATGATCGACGCTAATGGGGAAAACATTGCTGCTGCTATC
GGTGCTTCTGTAAACTATGTCACTGGTGTCATTGCTAGAGACAAAGTTGCCACCTTAGAA
CAAATTCTTTGGAGAGTATTAAGAGGTAACCTTTTCTTCAAAACTGTTGAAATTGAACAA
CCTGTTTATGATGTCAAAACCAGGGAGTATAAACATAAAAATGCTTTTATCGTATTTTCT
CACGGTGATCTGATTATTAAAAGAATCAGAAAGATTGCGGAATCATTGGATGCCAATCTT
TACGATGTTGACTCTTCCAACGAGGGTAGATCACAACAATTGGCCAAGGTCAACAAGAAT
TTGAGTGATTTGTACACAGTTTTGAAAACCACTTCTACCACTTTAGAAAGTGAATTATAT
GCCATTGCCAAAGAATTGGACTCTTGGTTCCAAGATGTTACCCGTGAAAAGGCGATTTTT
GAAATTTTGAACAAGTCTAACTATGATACCAATAGAAAGATTTTGATTGCTGAAGGTTGG
ATACCAAGAGACGAATTGGCTACTTTGCAAGCTCGTCTTGGTGAAATGATCGCAAGATTG
GGTATTGATGTCCCATCCATTATCCAAGTCCTGGATACAAACCACACTCCACCTACCTTC
CACAGAACTAACAAGTTTACTGCTGGTTTCCAAAGTATCTGTGACTGTTACGGTATTGCT
CAGTACAGAGAAATCAATGCTGGTTTACCCACAATTGTCACTTTCCCTTTCATGTTTGCC
ATCATGTTTGGTGATATGGGTCACGGGTTCTTAATGACCTTAGCCGCATTGTCTCTTGTA
TTGAATGAAAAGAAAATCAACAAAATGAAAAGAGGCGAAATTTTCGATATGGCCTTCACT
GGTAGATACATTATTTTGTTGATGGGTGTCTTTTCCATGTACACAGGTTTCCTTTACAAC
GATATCTTCTCTAAAACTATGACTATTTTCAAGTCTGGTTGGAAATGGCCTGATCATTGG
AAAAAAGGTGAGAGTATTACTGCTACATCGGTGGGTACATACCCTATCGGTTTAGATTGG
GCTTGGCATGGAACTGAAAATGCTTTGTTATTTTCTAATTCTTACAAAATGAAACTATCA
ATTTTAATGGGGTTCATCCACATGACCTATTCTTATTTCTTTTCGTTGGCTAACCACCTA
TACTTTAACTCTATGATTGATATCATCGGTAACTTTATTCCTGGTTTGCTATTTATGCAA
GGTATCTTTGGTTATCTTTCCGTTTGTATTGTTTACAAATGGGCTGTTGATTGGGTTAAG
GACGGAAAGCCTGCTCCAGGTTTGTTAAATATGTTGATCAACATGTTTTTATCACCAGGA
ACTATTGACGATGAATTATACCCTCATCAAGCAAAGGTCCAAGTGTTTTTGTTGTTGATG
GCCTTGGTTTGTATTCCTTGGTTGCTATTGGTGAAGCCATTACATTTCAAATTCACTCAT
AAAAAGAAATCTCACGAACCACTGCCATCGACTGAAGCAGATGCTAGTTCTGAAGATTTG
GAAGCACAACAATTAATTTCCGCGATGGACGCCGATGACGCTGAAGAAGAAGAAGTTGGT
TCTGGATCTCATGGTGAAGACTTTGGTGATATTATGATTCATCAAGTTATTCATACAATT
GAATTCTGTTTGAATTGTGTTTCGCACACTGCATCCTATTTACGTTTATGGGCCTTATCA
TTGGCACATGCTCAATTGTCTAGTGTTTTATGGACAATGACAATTCAAATTGCCTTTGGA
TTTAGAGGATTTGTGGGTGTGTTTATGACGGTTGCACTTTTTGCCATGTGGTTCGCACTA
ACATGTGCAGTTCTTGTTTTGATGGAAGGTACATCTGCCATGCTTCATTCCTTACGTTTG
CACTGGGTTGAATCTATGTCCAAGTTTTTCGTGGGTGAAGGTTTACCATACGAACCATTC
GCATTTGAGTATAAAGACATGGAAGTCGCTGTTGCTAGTGCAAGCTCTTCCGCTTCAAGC
TAA
>ScSTV1 YMR054W SGDID:S000004658
MNQEEAIFRSADMTYVQLYIPLEVIREVTFLLGKMSVFMVMDLNKDLTAFQRGYVNQLRR
FDEVERMVGFLNEVVEKHAAETWKYILHIDDEGNDIAQPDMADLINTMEPLSLENVNDMV
KEITDCESRARQLDESLDSLRSKLNDLLEQRQVIFECSKFIEVNPGIAGRATNPEIEQEE
RDVDEFRMTPDDISETLSDAFSFDDETPQDRGALGNDLTRNQSVEDLSFLEQGYQHRYMI
TGSIRRTKVDILNRILWRLLRGNLIFQNFPIEEPLLEGKEKVEKDCFIIFTHGETLLKKV
KRVIDSLNGKIVSLNTRSSELVDTLNRQIDDLQRILDTTEQTLHTELLVIHDQLPVWSAM
TKREKYVYTTLNKFQQESQGLIAEGWVPSTELIHLQDSLKDYIETLGSEYSTVFNVILTN
KLPPTYHRTNKFTQAFQSIVDAYGIATYKEINAGLATVVTFPFMFAIMFGDMGHGFILFL
MALFLVLNERKFGAMHRDEIFDMAFTGRYVLLLMGAFSVYTGLLYNDIFSKSMTIFKSGW
QWPSTFRKGESIEAKKTGVYPFGLDFAWHGTDNGLLFSNSYKMKLSILMGYAHMTYSFMF
SYINYRAKNSKVDIIGNFIPGLVFMQSIFGYLSWAIVYKWSKDWIKDDKPAPGLLNMLIN
MFLAPGTIDDQLYSGQAKLQVVLLLAALVCVPWLLLYKPLTLRRLNKNGGGGRPHGYQSV
GNIEHEEQIAQQRHSAEGFQGMIISDVASVADSINESVGGGEQGPFNFGDVMIHQVIHTI
EFCLNCISHTASYLRLWALSLAHAQLSSVLWDMTISNAFSSKNSGSPLAVMKVVFLFAMW
FVLTVCILVFMEGTSAMLHALRLHWVEAMSKFFEGEGYAYEPFSFRAIIE*
>TdSTV1 XP_003680050.1 hypothetical protein TDEL_0B07100 [Torulaspora delbrueckii]
MTYVQLYIPLEISREVVCLLGNLGNLMFRDLNRDLTAFQRAYVDQVRKFDDVERLVLHMREVADKHAEATWKYILHTDDE
GNDLQRPSLAQLVSTMHTHSHDSIHEMVEDITSFEGRVRQMDQSLINLRERLNGLLEQRCVIFECSRFLEGNPGIFGRVA
REQRELMDVDEFSLAGDEVSENLSDTFSFDDGIEGAGLYEQAQNNSRRDSGSSGNFDLLERGFHNRFMIAGSIKRDKIDV
LNRIIWRLLRGNLFFQNFAINEPLLEDGERVEKDCFVVFTHGDTLLQKVRRVVDSLGGKVFSLDQQSHESLQRLNDKISD
LQQIVLTTEQTLHTELLVVTDQLPMWNAMVKREKYIFATLNLFKQESHGLVAEGWIPSSDLTTVSNSLKDYSDSVGSEYS
TVVSVIHTNKLPPTYHRTNKFTQAFQSIVDAYGIATYKEINPGLATVVTFPFMFAIMFGDLGHGFILFLVGLVLWLNENK
FETMTRGEIFDMAYTGRYVIVLMGAFSMYTGLMYNDIFSKSMTLFKSGWQWPSTFKIGETLEATKVGVYPFGLDFAWHST
DNGLLFSNSYKMKLSILMGFIHMTYSFMFSYINYKNRHSTVDIIGNFVPGLIFMQSIFGYLSWAIIYKWSKDWIKDERPA
PALLNMLINMFLAPGTVDEQLYRGQAFLQTVLLIAALVCVPWLLLYKPLTLRRQNKHAIDNGYQSVSDQQHTESLIDSQQ
DAGDDMVVTDFGNEEEHKQFNFGDVMIHQVIHTIEFCLNCISHTASYLRLWALSLAHAQLSSVLWSMTIQNAFSSDDSGS
PLAVTKVVVLFGMWFVLTVCILVAMEGTSAMLHSLRLHWVEAMSKFFEGEGYAYEPFSFENIIE*
>Yl AOW05203.1 hypothetical protein YALI1_E12482g [Yarrowia lipolytica]
METAAMEREAIFRSAEMSLVQLYVASEIGRDVVAALGELGVVMFRDLNTSVNVFQRSFIKEIRRVDGVERQLRGLRAHID
KHGVAIDEQPEGVAAPTLDEVDNMCHQVGALEERVGHLDTTWNELVDKRALILERREMVQTAGIFFADARENRHEIRASL
EGDRAGLLYDLDDPQPDVEAATVTWNSVAGLSFVTGVIPSTKTAIFERILWRSLRGNLYFRHQAIEKPLAGVRKDVFIVF
GHGESLLAKIKRIALTLDATLYPVSEDFDTRREQVEELNIKLADVDNVLGSTNNALMTELALAANTLPHWEVLANKEKAI
YHTLNMFNYDQTRRCLIAEGWIPKADFRAVQEVLRDVTLSSGVAINSILNEIKTSKTPPTFHRTNKFTAAFQLIVDAYGI
ASYQEINPGLATVVTFPFMFAIMFGDLGHGVILALAGLVMVLKEKSILKMRNRDEIFDMAFSGRYIVLLMGIFSLYTGLM
YNDIFSKSMTLFRSGWAWPESWEEKERITAHQTGVYPFGLDPAWHGTDNNLLFTNSYKMKLSILMGFTHMSYSFFFSFLN
YKFFNSQIDIWGNFVPGLLFMQSIFGYLSLTIVYKWCVDWIAKDKTPPGLLNMLINMFLSPGTIDAPLYPGQKFVQIILV
LIALVCVPWLLLLKPLYLRRQHKQTQYDAIRQPNAYHIGDTDDDADSFDMTIEEFEEEGEGHEQFEFGEVMIHQVIHTIE
FCLNCVSHTASYLRLWALSLAHAQLSTVLWDMTIQGAFGPTGPAGVAMVVIMFAMWFVLTVVILVMMEGTSAMLHSLRLH
WVEAMSKFFEGEGYAYAPFNFKDQQ*
>Cl OVF08817.1 putative V-type proton ATPase subunit [Clavispora lusitaniae]
MPEKQEAVFRSAEMSLVQLYVPTEVARDVIHKVGSLNLVQFRDLNKGVNEFQRAFVQELRKLDNVERQYTYLKAELDKRG
IPSKIYPYDQASNCPQSDIDMYAESANFLESRVVELTDSCETLYKKRKELKQFKYTVDAVENFFSANSAPGHDTIGSDAL
LSELETGGTEFHAEFLSGVIDRRKVFTLQQILWRTLRGNLFYYTEELPEKIYDAKSNSYVEKNAFIIFSHGSLIYQRIKK
IAESLDADLYKVDSTSDLRSEQVKGLQSDLNDLKTVLDETENALNSELVVISRDLSKWWREIAREKAVYKTMNLCDYDNS
RKTLIAEGWIPTDEIDDLSSQVKSLSASDTVPTIVNILETTKTPPTFHRTNKFTAAFQSICDTYGVASYREINPGLPTII
TFPFMFAIMFGDLGHGFIMFLAALVLVLKEKKIQAMKRDEIFDMAFSGRYILLLMGLFSMYTGFLYNDVFSKSMDFFKSG
WEWPETFQPGETIHATKVGVYPIGLDPAWHGAENGLLFSNSYKMKLSVLMGYLHMTYSYFFSLANAIFFNSPIDIIGNFI
PGLLFMQGIFGYLSLCIVYKWTVNWFAVGKQPPGLLNMLISMFLAPGEVAEPLYNGQATVQLYLVVVALICVPWLILVKP
LYLKRQIDRAAKEHSYERLTESESPQTITEDEEEHGNEEDDEEAHDDHNFGDIMIHQIIHTIEFCLNCVSHTASYLRLWA
LSLAHAQLSTVLWTMTISNAFGVTGIIGVIMTVFLFAMWLVLTVVILVIMEGTSAMLHSLRLHWVESMSKFFEGEGTAYE
PFGFNDLLTDVF*
>VPH1 YOR270C SGDID:S000005796
MAEKEEAIFRSAEMALVQFYIPQEISRDSAYTLGQLGLVQFRDLNSKVRAFQRTFVNEIR
RLDNVERQYRYFYSLLKKHDIKLYEGDTDKYLDGSGELYVPPSGSVIDDYVRNASYLEER
LIQMEDATDQIEVQKNDLEQYRFILQSGDEFFLKGDNTDSTSYMDEDMIDANGENIAAAI
GASVNYVTGVIARDKVATLEQILWRVLRGNLFFKTVEIEQPVYDVKTREYKHKNAFIVFS
HGDLIIKRIRKIAESLDANLYDVDSSNEGRSQQLAKVNKNLSDLYTVLKTTSTTLESELY
AIAKELDSWFQDVTREKAIFEILNKSNYDTNRKILIAEGWIPRDELATLQARLGEMIARL
GIDVPSIIQVLDTNHTPPTFHRTNKFTAGFQSICDCYGIAQYREINAGLPTIVTFPFMFA
IMFGDMGHGFLMTLAALSLVLNEKKINKMKRGEIFDMAFTGRYIILLMGVFSMYTGFLYN
DIFSKTMTIFKSGWKWPDHWKKGESITATSVGTYPIGLDWAWHGTENALLFSNSYKMKLS
ILMGFIHMTYSYFFSLANHLYFNSMIDIIGNFIPGLLFMQGIFGYLSVCIVYKWAVDWVK
DGKPAPGLLNMLINMFLSPGTIDDELYPHQAKVQVFLLLMALVCIPWLLLVKPLHFKFTH
KKKSHEPLPSTEADASSEDLEAQQLISAMDADDAEEEEVGSGSHGEDFGDIMIHQVIHTI
EFCLNCVSHTASYLRLWALSLAHAQLSSVLWTMTIQIAFGFRGFVGVFMTVALFAMWFAL
TCAVLVLMEGTSAMLHSLRLHWVESMSKFFVGEGLPYEPFAFEYKDMEVAVASASSSASS
*