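```python
import sys

# Usage: python pdb2fasta.py file.pdb > file.fasta
if len(sys.argv) <= 1:
    print('usage: python pdb2fasta.py file.pdb > file.fasta')
    sys.exit()

# Three-letter to one-letter codes for the 20 standard amino acids.
letters = {'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
           'GLU': 'E', 'GLN': 'Q', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
           'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
           'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V'}

print('>' + sys.argv[1])  # FASTA header: the input file name
prev = '-1'
with open(sys.argv[1]) as input_file:
    for line in input_file:
        toks = line.split()
        # Only ATOM records carry residue information.
        if not toks or toks[0] != 'ATOM':
            continue
        # toks[3] is the residue name. toks[4] is the residue number only when the
        # chain ID column is blank; otherwise it is the chain ID (see comments below).
        if toks[4] != prev:
            # Non-standard residue names are written as 'X' instead of raising KeyError.
            sys.stdout.write(letters.get(toks[3], 'X'))
            prev = toks[4]
sys.stdout.write('\n')
```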
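Splitting on whitespace is fragile because PDB is a fixed-column format. For example, when a chain ID is followed by a four-digit residue number (`A1000`), the two fields touch and become a single token, so every later index shifts. The sketch below reads the residue fields by column instead, assuming the standard ATOM layout (1-based columns 18-20 resName, 22 chainID, 23-26 resSeq, 27 iCode). `Residue`, `parse_atom_record` and `make_atom_line` are hypothetical helpers written for this illustration, not part of the gist.

```python
from typing import NamedTuple, Optional


class Residue(NamedTuple):
    """Residue-identifying fields of one ATOM record (hypothetical helper type)."""
    chain: str
    resseq: int
    icode: str
    resname: str


def parse_atom_record(line: str) -> Optional[Residue]:
    """Read residue fields by fixed column; return None for non-ATOM lines.

    1-based PDB columns: 18-20 resName, 22 chainID, 23-26 resSeq, 27 iCode.
    """
    if not line.startswith("ATOM  "):
        return None
    line = line.rstrip("\n").ljust(27)  # tolerate lines trimmed after resSeq
    return Residue(chain=line[21], resseq=int(line[22:26]),
                   icode=line[26], resname=line[17:20].strip())


def make_atom_line(serial: int, atom: str, resname: str, chain: str,
                   resseq: int, icode: str = " ") -> str:
    """Build a minimal ATOM line in the standard column layout (coordinates zeroed)."""
    return (f"ATOM  {serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}{icode}"
            f"   {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


if __name__ == "__main__":
    # Chain A, residue 1000: chain ID and residue number occupy adjacent columns.
    line = make_atom_line(7001, "CA", "GLY", "A", 1000)
    print(line.split()[3:6])        # ['GLY', 'A1000', '0.000']
    print(parse_atom_record(line))  # Residue(chain='A', resseq=1000, icode=' ', resname='GLY')
```

Reading by column costs nothing extra and does not depend on which neighbouring fields happen to be blank or full.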
Hello, it's the same for me: I changed toks[4] in lines 19 and 21 to toks[3].
Best,
Guillaume
Lines 19 and 21 should use toks[5] rather than toks[3], because the toks[3] fix skips any amino acid that is identical to the previous one in the sequence.
Lines 18 and 20 ("prev=toks[4]") should be changed to use toks[5] rather than toks[4].
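The toks[5] suggestion amounts to starting a new residue whenever the residue number changes. A minimal sketch of that idea is below, with a few assumptions beyond the gist: residues are identified by (chainID, resSeq, iCode) read from fixed PDB columns rather than by token position, one FASTA record is written per chain, parsing stops at the first `ENDMDL` so multi-model files are read once, and unknown residue names become `X`. The function name `pdb_to_sequences` and the `>file_chain` header format are hypothetical choices for this illustration.

```python
import sys
from typing import Dict, Iterable, List, Tuple

# Three-letter to one-letter codes for the 20 standard amino acids (same table as the gist).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLU": "E", "GLN": "Q", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_sequences(lines: Iterable[str]) -> Dict[str, str]:
    """Return {chain ID: one-letter sequence} built from ATOM records.

    A new residue starts whenever (chainID, resSeq, iCode) changes, read from the
    fixed PDB columns 22, 23-26 and 27, so consecutive identical residues such as
    "II" are both kept. Only the first model is read (parsing stops at ENDMDL).
    """
    letters: Dict[str, List[str]] = {}
    prev: Tuple[str, str, str] = ("", "", "")
    for raw in lines:
        if raw.startswith("ENDMDL"):
            break
        if not raw.startswith("ATOM  "):
            continue
        line = raw.rstrip("\n").ljust(27)
        chain, key = line[21], (line[21], line[22:26], line[26])
        if key != prev:
            # Unknown residue names (e.g. MSE) become 'X' instead of raising KeyError.
            letters.setdefault(chain, []).append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
            prev = key
    return {chain: "".join(seq) for chain, seq in letters.items()}


if __name__ == "__main__":
    # Usage: python pdb2fasta.py file.pdb [more.pdb ...] > out.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            for chain, seq in pdb_to_sequences(handle).items():
                # Hypothetical header format: file name plus chain ID ('_' if blank).
                print(f">{path}_{chain.strip() or '_'}")
                print(seq)
```

Keying on the residue number (plus chain and insertion code) rather than the residue name is what keeps repeated residues; the chain ID is part of the key so that residue 1 of chain B is not mistaken for a continuation of chain A.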
Hello, I developed a program based on yours:
https://github.com/dongshuyan/pdb2fasta
Hi,
This script does not work for sequences with repeated residues.
After changing [5] to [3], the script runs but gives the wrong output. For example, if the original PDB has the sequence "IIPLEES", the script generates "IPLES": it omits repeated residues.
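The behaviour described above can be reproduced without a PDB file. In this sketch each tuple stands in for one ATOM line as (residue number, residue name), with two atoms per residue; `sequence` and `ONE_LETTER` are hypothetical names used only for the demonstration. Emitting a letter whenever the residue name changes (the toks[3] fix) merges neighbouring identical residues, while keying on the residue number (the toks[5] fix) keeps them.

```python
# Subset of the three-letter to one-letter table, enough for this peptide.
ONE_LETTER = {"ILE": "I", "PRO": "P", "LEU": "L", "GLU": "E", "SER": "S"}


def sequence(atoms, key):
    """Emit one letter each time key(atom) differs from the previous atom's key."""
    out, prev = [], None
    for atom in atoms:
        k = key(atom)
        if k != prev:
            out.append(ONE_LETTER[atom[1]])
            prev = k
    return "".join(out)


# Two atoms (e.g. N and CA) per residue of the peptide IIPLEES, numbered 1-7.
atoms = [(n, name)
         for n, name in enumerate(["ILE", "ILE", "PRO", "LEU", "GLU", "GLU", "SER"], 1)
         for _ in range(2)]

print(sequence(atoms, key=lambda a: a[1]))  # keyed on name, like toks[3]: IPLES
print(sequence(atoms, key=lambda a: a[0]))  # keyed on number, like toks[5]: IIPLEES
```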
Download Rosetta and use the script "get_fasta_from_pdb.py" from the folder "rosetta_bin_linux_2019.35.60890_bundle/tools/protein_tools/scripts".
Syntax: python get_fasta_from_pdb.py PDB chainID outputname
Hello, I tried your pdb2fasta.py and the output file contains only one letter. After I changed toks[4] in lines 19 and 21 to toks[3], the result seems correct.
It's a really great help to me.
Best,
Jiim
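The one-letter output comes from how whitespace splitting interacts with the chain ID column. When a file fills in the chain ID, toks[4] is that chain ID, which stays the same for the whole chain, so the original comparison fires only once; when the chain column is blank, toks[4] falls on the residue number and the script works. The sketch below reproduces both cases on synthetic lines; `atom_line` and `original_loop` are hypothetical helpers that mimic the gist's logic, not code from it.

```python
def atom_line(serial, resname, chain, resseq):
    """Build a minimal N-atom ATOM line in the standard PDB column layout."""
    return f"ATOM  {serial:>5}  N   {resname} {chain}{resseq:>4}    "


def original_loop(lines):
    """The gist's logic: emit a letter whenever toks[4] changes."""
    letters = {"ILE": "I", "PRO": "P", "LEU": "L"}
    out, prev = [], "-1"
    for line in lines:
        toks = line.split()
        if toks and toks[0] == "ATOM" and toks[4] != prev:
            out.append(letters[toks[3]])
            prev = toks[4]
    return "".join(out)


residues = ["ILE", "PRO", "LEU"]
with_chain = [atom_line(i, r, "A", i) for i, r in enumerate(residues, 1)]
no_chain = [atom_line(i, r, " ", i) for i, r in enumerate(residues, 1)]

print(with_chain[0].split()[3:6])  # ['ILE', 'A', '1']
print(original_loop(with_chain))   # I   (toks[4] is always the chain ID 'A')
print(original_loop(no_chain))     # IPL (toks[4] is the residue number)
```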