@armanbilge
Created July 30, 2013 03:56
A script to deinterleave PHYLIP-formatted sequence files.
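#!/usr/bin/env python
import sys

# Read from the file named on the command line, or fall back to stdin.
try:
    stream = open(sys.argv[1], 'r')
except (IndexError, IOError):
    stream = sys.stdin

is_header = True
is_first_row = True
lines = []
for l in stream:
    l = l.rstrip('\r\n')
    if is_header:
        # The first line (taxon and site counts) is passed through unchanged.
        print(l)
        is_header = False
    elif l.lstrip() == '':
        # A blank line ends a block; restart the row counter.
        is_first_row = False
        i = 0
    elif is_first_row:
        # First block: each row carries the sequence name and first chunk.
        lines.append(l)
    else:
        # Later blocks: append each chunk to the matching sequence.
        lines[i] += ' ' + l.lstrip()
        i += 1

if stream is not sys.stdin:
    stream.close()

for l in lines:
    print(l)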
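
To make the behaviour concrete, here is a minimal sketch of the same logic wrapped in a function and run on a tiny made-up alignment. The function name `deinterleave` and the sample data are assumptions for illustration only; they are not part of the gist.

```python
def deinterleave(text):
    """Return sequential PHYLIP text from interleaved PHYLIP text.

    Hypothetical helper mirroring the script: the first line is the header,
    the first block holds names plus the first sequence chunk, and later
    blocks (separated by blank lines) are appended row by row.
    """
    rows = text.splitlines()
    header, body = rows[0], rows[1:]
    records = []
    first_block = True
    i = 0
    for row in body:
        if row.strip() == '':
            # Blank line: the next row belongs to the first sequence again.
            first_block = False
            i = 0
        elif first_block:
            records.append(row)
        else:
            records[i] += ' ' + row.lstrip()
            i += 1
    return '\n'.join([header] + records)


# Made-up two-taxon alignment split into two interleaved blocks.
sample = """ 2 20
seq_a     ACGTACGTAC
seq_b     TTGTACGAAC

          GGGGCCCCAA
          GGGACCCCAT
"""
print(deinterleave(sample))
```

Each output row keeps its name and carries the chunks from every block, separated by single spaces.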

dysh commented May 3, 2014

Hi! Some programs like TCS do not tolerate spaces in sequences. Therefore the last line may be substituted with something like:

for l in lines: print(l[:10] + l[10:].replace(" ", ""))

cheers
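
The suggestion above could also be written as a small helper. This is only a sketch, assuming strict PHYLIP where the name occupies the first 10 columns; `strip_sequence_spaces` and its `name_width` parameter are hypothetical names, not part of the gist.

```python
def strip_sequence_spaces(record, name_width=10):
    """Keep the fixed-width name field and remove spaces from the sequence.

    Assumes the name fills exactly `name_width` columns (10 in strict PHYLIP).
    """
    name, sequence = record[:name_width], record[name_width:]
    return name + sequence.replace(' ', '')


# Made-up record: a 10-column name followed by two space-separated chunks.
record = 'seq_a'.ljust(10) + 'ACGTACGTAC GGGGCCCCAA'
print(strip_sequence_spaces(record))
```

Files with relaxed, variable-width names would need a different split, for example on the first run of whitespace.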
