Created December 3, 2012 07:25
Save moorepants/4193369 to your computer and use it in GitHub Desktop.
Combine two gene files
#!/usr/bin/env python

# first open the two files and read their lines into lists
f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'r')
lines1 = f1.readlines()
lines2 = f2.readlines()
f1.close()
f2.close()

# now create a dictionary (keyword mapping) to map the gene name to the
# count column from the second file. This allows you to type mapping['aaaD']
# and you will get the count value back
mapping = {}
for line in lines2:
    a = line.strip().split('\t')
    mapping[a[1]] = a[3]

# Now walk through the lines from the first file and append the count column
# from the second file, which you have stored in the mapping dictionary.
newlines = ''
for line in lines1:
    gene = line.strip().split('\t')[1]
    # the try/except statement handles the case where the gene is in the first
    # file but not in the second. It just puts a zero for the count.
    try:
        newlines += line.strip() + '\t' + mapping[gene] + '\n'
    except KeyError:
        newlines += line.strip() + '\t' + '0' + '\n'

# Finally save a new file with the new column written.
# newlines is one string, so write() it in a single call
f3 = open('file3.txt', 'w')
f3.write(newlines)
f3.close()

# the short version :)
with open('file2.txt') as f:
    mapping = {x.strip().split('\t')[1]: x.strip().split('\t')[3]
               for x in f.readlines()}
with open('file1.txt') as f:
    newlines = ''
    for line in f.readlines():
        # .get() falls back to '0' when the gene is missing from file2.txt,
        # matching the try/except in the long version
        newlines += (line.strip() + '\t' +
                     mapping.get(line.strip().split('\t')[1], '0') + '\n')
with open('file3.txt', 'w') as f:
    f.write(newlines)
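
The same merge can also be wrapped in a function so it can be tried without touching any files. This is only a sketch, not part of the original script: the name `merge_counts` and its parameters `gene_col`, `count_col`, and `default` are hypothetical, and it assumes tab-separated lines like the ones the script reads. Unlike the script, it skips blank or short lines instead of raising an `IndexError`, and it collects rows in a list rather than growing one string.

```python
def merge_counts(lines1, lines2, gene_col=1, count_col=3, default='0'):
    """Append the count column of lines2 to each row of lines1, matched on gene name.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Build the gene -> count lookup from the second set of lines.
    mapping = {}
    for line in lines2:
        fields = line.rstrip('\r\n').split('\t')
        if len(fields) > max(gene_col, count_col):
            mapping[fields[gene_col]] = fields[count_col]

    # Walk the first set of lines and append the matching count (or the default).
    merged = []
    for line in lines1:
        row = line.rstrip('\r\n')
        fields = row.split('\t')
        if len(fields) <= gene_col:
            continue  # blank or short line: nothing to look up
        merged.append(row + '\t' + mapping.get(fields[gene_col], default))
    return merged


if __name__ == '__main__':
    first = ['2\taaaD\ttesting1\t2\n', '3\tzzzZ\ttesting9\t5\n']
    second = ['2\taaaD\ttesting1\t3\n']
    for row in merge_counts(first, second):
        print(row)
```

Reading the two files with `readlines()` and writing `'\n'.join(rows) + '\n'` to `file3.txt` would give the same kind of output as the script above.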

1 gene descriptor count
2 aaaD testing1 2
3 aaaA testing2 2

1 gene descriptor count
2 aaaD testing1 3
3 aaaA testing2 4
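
To see what the merge produces on the two samples, here is a small in-memory walk-through. It rests on assumptions the page does not confirm: that the first listing is `file1.txt` and the second is `file2.txt`, and that the columns are separated by tabs, as the script's `split('\t')` implies. The variable names `file1_text` and `file2_text` are made up for this example.

```python
# Assumed contents of file1.txt and file2.txt (tab-separated; the assignment
# of listings to file names is a guess).
file1_text = "1\tgene\tdescriptor\tcount\n2\taaaD\ttesting1\t2\n3\taaaA\ttesting2\t2\n"
file2_text = "1\tgene\tdescriptor\tcount\n2\taaaD\ttesting1\t3\n3\taaaA\ttesting2\t4\n"

# Map gene name (column 1) to count (column 3) from the second file.
mapping = {}
for line in file2_text.splitlines():
    a = line.split('\t')
    mapping[a[1]] = a[3]

# Append the looked-up count to every row of the first file.
merged = [line + '\t' + mapping.get(line.split('\t')[1], '0')
          for line in file1_text.splitlines()]

print('\n'.join(merged))
```

Under those assumptions the header row gains a second `count` column, because the header line is treated like any other row, and the two gene rows end in `2	3` and `2	4` respectively.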