@moorepants
Created December 3, 2012 07:25
Gist: moorepants/4193369
Combine two gene files
```python
#!/usr/bin/env python
# First open the two files and read their lines into lists.
f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'r')
lines1 = f1.readlines()
lines2 = f2.readlines()
f1.close()
f2.close()

# Now create a dictionary (a key -> value mapping) from the gene name to the
# count column of the second file. This lets you type mapping['aaaD'] and
# get the count value back.
mapping = {}
for line in lines2:
    # Split the tab-separated row; column 1 holds the gene name and column 3
    # holds the count (counting from 0).
    a = line.strip().split('\t')
    mapping[a[1]] = a[3]

# Now walk through the lines from the first file and append the count column
# from the second file, which is stored in the mapping dictionary.
newlines = ''
for line in lines1:
    gene = line.strip().split('\t')[1]
    # The try/except statement handles a gene that is in the first file but
    # not in the second: it writes a count of zero instead.
    try:
        newlines += line.strip() + '\t' + mapping[gene] + '\n'
    except KeyError:
        newlines += line.strip() + '\t' + '0' + '\n'

# Finally, save a new file with the extra column written.
f3 = open('file3.txt', 'w')
f3.write(newlines)  # newlines is one string, so write() rather than writelines()
f3.close()
```
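The try/except lookup in the script can also be written with `dict.get`, which returns a default value when the key is missing. Below is a minimal sketch that wraps the same logic in a pure function working on lists of lines, so it can be tried without touching the disk. The function name `merge_counts` and its parameters are hypothetical, and it assumes the same layout as the script: tab-separated rows, gene name in column 1 and count in column 3 (counting from 0).

```python
def merge_counts(lines1, lines2, gene_col=1, count_col=3, default='0'):
    """Hypothetical helper: append the count from lines2 to each row of
    lines1, matching rows on the gene name. Inputs are lists of
    tab-separated strings; returns a list of rows without newlines."""
    # Build the gene -> count lookup from the second file's lines.
    mapping = {}
    for line in lines2:
        cols = line.strip().split('\t')
        mapping[cols[gene_col]] = cols[count_col]

    # Append the matching count, or the default if the gene is missing.
    merged = []
    for line in lines1:
        row = line.strip()
        gene = row.split('\t')[gene_col]
        merged.append(row + '\t' + mapping.get(gene, default))
    return merged


# 'aaaB' is a made-up gene that does not appear in the second list.
first = ['2\taaaD\ttesting1\t2\n', '3\taaaB\ttesting3\t5\n']
second = ['2\taaaD\ttesting1\t3\n']
for row in merge_counts(first, second):
    print(row)
```

`mapping.get(gene, default)` does in one expression what the try/except does in four lines; both produce `'0'` for a gene that only appears in the first file.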
```python
# The short version :)
with open('file2.txt') as f:
    mapping = {cols[1]: cols[3]
               for cols in (line.strip().split('\t') for line in f)}
with open('file1.txt') as f:
    newlines = ''
    for line in f:
        gene = line.strip().split('\t')[1]
        # .get() writes '0' for genes missing from file2, like the try/except
        # in the long version (plain mapping[gene] would raise KeyError).
        newlines += line.strip() + '\t' + mapping.get(gene, '0') + '\n'
with open('file3.txt', 'w') as f:
    f.write(newlines)
```
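Sample input files (tab-separated; the leading number on each row appears to be an index column, which is consistent with the script reading the gene name from column 1 and the count from column 3):

```
1 gene descriptor count
2 aaaD testing1 2
3 aaaA testing2 2
```

```
1 gene descriptor count
2 aaaD testing1 3
3 aaaA testing2 4
```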
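To check the approach against the two sample files, here is a sketch that writes them to a temporary directory, merges them, and reads the result back. It assumes the first sample is `file1.txt` and the second is `file2.txt`, and that the columns are separated by tabs. It swaps the manual `split('\t')` for the standard `csv` module with `delimiter='\t'`; `merge_files` is a hypothetical name, not part of the original gist.

```python
import csv
import os
import tempfile


def merge_files(path1, path2, out_path, default='0'):
    """Hypothetical helper: append column 3 of path2 to each row of path1,
    matching rows on the gene name in column 1 (counting from 0)."""
    with open(path2, newline='') as f:
        mapping = {row[1]: row[3] for row in csv.reader(f, delimiter='\t')}
    with open(path1, newline='') as f_in, open(out_path, 'w', newline='') as f_out:
        writer = csv.writer(f_out, delimiter='\t', lineterminator='\n')
        for row in csv.reader(f_in, delimiter='\t'):
            writer.writerow(row + [mapping.get(row[1], default)])


# The gist's sample data, assumed to be tab-separated.
file1 = '1\tgene\tdescriptor\tcount\n2\taaaD\ttesting1\t2\n3\taaaA\ttesting2\t2\n'
file2 = '1\tgene\tdescriptor\tcount\n2\taaaD\ttesting1\t3\n3\taaaA\ttesting2\t4\n'

with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, name) for name in ('file1.txt', 'file2.txt', 'file3.txt')]
    # Write the two inputs; the third path is the output file.
    for path, text in zip(paths, (file1, file2)):
        with open(path, 'w') as f:
            f.write(text)
    merge_files(*paths)
    with open(paths[2]) as f:
        merged = f.read()

print(merged)
```

With this data every row gains the count from the second file (3 for aaaD, 4 for aaaA). The header row also gains a second `count` column, because its `gene` entry is looked up like any other row; the original script behaves the same way.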