@philippmuench
Created October 3, 2017 20:53
Code to parse a CRISPR XML file and print the padded coordinates of each non-hypothetical CRISPR
import xml.etree.ElementTree as ET

tree = ET.parse('all_crispr.xml')
root = tree.getroot()
offset = 1000  # number of flanking positions added on each side of a CRISPR

for sequence in root.findall("./Taxons/Taxon/Sequences/Sequence"):
    refseq = sequence.find('RefSeq').text
    for crisprs in sequence.findall("CRISPRs"):
        crispr_count = int(crisprs.find('CRISPRCount').text)
        if crispr_count > 0:
            for crispr in crisprs.findall("CRISPR"):
                # Pad the region by `offset`, clamping the start at 0.
                start = max(0, int(crispr.find('BeginningPosition').text) - offset)
                end = int(crispr.find('EndingPosition').text) + offset
                hyp = crispr.find('Hypothetical').text
                # Report only CRISPRs that are not flagged as hypothetical.
                if hyp == "No":
                    # ".1" is appended to the RefSeq ID as a fixed version suffix.
                    print(refseq + ".1 " + str(start) + " " + str(end))
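The script above reads `all_crispr.xml` from disk, so it cannot be tried without that file. The following is a minimal sketch of the same logic wrapped in a function and run against a small inline document. The element paths are taken from the script. The root tag name (`Root`), the accession `NC_999999`, and all coordinates are made up for illustration, and the function name `crispr_regions` is hypothetical. The sketch also drops the `CRISPRCount` check, assuming that a count of 0 simply means no `CRISPR` elements are present.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root tag name and every value here are invented.
SAMPLE_XML = """
<Root>
  <Taxons>
    <Taxon>
      <Sequences>
        <Sequence>
          <RefSeq>NC_999999</RefSeq>
          <CRISPRs>
            <CRISPRCount>2</CRISPRCount>
            <CRISPR>
              <BeginningPosition>500</BeginningPosition>
              <EndingPosition>900</EndingPosition>
              <Hypothetical>No</Hypothetical>
            </CRISPR>
            <CRISPR>
              <BeginningPosition>5000</BeginningPosition>
              <EndingPosition>5400</EndingPosition>
              <Hypothetical>Yes</Hypothetical>
            </CRISPR>
          </CRISPRs>
        </Sequence>
      </Sequences>
    </Taxon>
  </Taxons>
</Root>
"""


def crispr_regions(root, offset=1000):
    """Yield (refseq, start, end) for each non-hypothetical CRISPR."""
    for sequence in root.findall("./Taxons/Taxon/Sequences/Sequence"):
        refseq = sequence.find("RefSeq").text
        for crispr in sequence.findall("CRISPRs/CRISPR"):
            # Skip entries flagged as hypothetical.
            if crispr.find("Hypothetical").text != "No":
                continue
            # Pad the region by `offset`, clamping the start at 0.
            start = max(0, int(crispr.find("BeginningPosition").text) - offset)
            end = int(crispr.find("EndingPosition").text) + offset
            yield refseq, start, end


root = ET.fromstring(SAMPLE_XML.strip())
regions = list(crispr_regions(root))
for refseq, start, end in regions:
    # Same output format as the original script, including the ".1" suffix.
    print(f"{refseq}.1 {start} {end}")
```

With this sample, only the first CRISPR is reported, and its start is clamped to 0 because 500 minus the offset of 1000 would be negative.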