@MichelleDalalJian
Created November 24, 2017 16:40
Extracting Data from XML: the program prompts for a URL, reads the XML data from that URL using urllib, parses it, extracts the comment counts, and computes the sum of those numbers.
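from urllib import request
import xml.etree.ElementTree as ET

# The URL is hardcoded here; use input('Enter location: ') to prompt for it instead.
url = 'http://python-data.dr-chuck.net/comments_24966.xml'
print("Retrieving", url)
response = request.urlopen(url)
data = response.read()
print("Retrieved", len(data), "characters")

# Parse the XML and collect every <comment> element inside <comments>
tree = ET.fromstring(data)
results = tree.findall('comments/comment')
icount = len(results)

# Add up the <count> value of each comment (the counts are integers)
isum = 0
for result in results:
    isum += int(result.find('count').text)
print(icount)
print(isum)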
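To try the parsing and summing logic without fetching anything, here is a minimal offline sketch. The inline SAMPLE_XML string and the sum_comment_counts helper are hypothetical; the sample only assumes the layout that the path 'comments/comment' and find('count') above rely on (a root element holding <comments>, whose <comment> children each have a <name> and a <count>).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the assumed layout:
# root -> <comments> -> <comment> -> <name>, <count>
SAMPLE_XML = """<commentinfo>
  <comments>
    <comment><name>Ann</name><count>97</count></comment>
    <comment><name>Bo</name><count>3</count></comment>
    <comment><name>Cy</name><count>40</count></comment>
  </comments>
</commentinfo>"""


def sum_comment_counts(xml_text):
    """Return (number of comments, sum of their <count> values)."""
    root = ET.fromstring(xml_text)
    # The path is relative to the root element, so 'comments/comment'
    # matches <comment> children of the root's <comments> child.
    comments = root.findall('comments/comment')
    # .text is a string, so convert each count to int before summing.
    total = sum(int(c.find('count').text) for c in comments)
    return len(comments), total


if __name__ == "__main__":
    count, total = sum_comment_counts(SAMPLE_XML)
    print(count)  # 3
    print(total)  # 140
```

Swapping SAMPLE_XML for the bytes returned by urlopen(...).read() gives the same computation as the script above.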
@Perziver

I am getting this error:

Traceback (most recent call last):
  File "/Users/shantanusoni/Documents/ImpDoc/shekhar/studie_related/Coding_Culture/Python/python-coursera/example.py", line 2, in <module>
    import xml.etree.ElementTree as ET
  File "/Users/shantanusoni/Documents/ImpDoc/shekhar/studie_related/Coding_Culture/Python/python-coursera/xml.py", line 2, in <module>
    import xml.etree.ElementTree as ET
ModuleNotFoundError: No module named 'xml.etree'; 'xml' is not a package

Please help me.
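The traceback points at the likely cause: its second frame is a file named xml.py in the same folder as example.py. Python searches the script's own directory first, so import xml picks up that file, which is a plain module rather than the standard-library xml package, hence "'xml' is not a package". Renaming xml.py to something else usually fixes it. The check below is a small sketch for confirming which file import xml resolves to; the helper name describe_xml_import is hypothetical.

```python
import importlib.util


def describe_xml_import():
    """Report where `import xml` would load from, and whether it is a package."""
    spec = importlib.util.find_spec("xml")
    if spec is None:
        return "xml not found", False
    # The standard-library xml is a package, so its spec lists
    # submodule search locations; a stray xml.py module has none.
    is_package = spec.submodule_search_locations is not None
    return spec.origin, is_package


if __name__ == "__main__":
    origin, is_package = describe_xml_import()
    print("import xml resolves to:", origin)
    if not is_package:
        print("This plain module shadows the standard-library package; rename it.")
```

Run it from the same folder as example.py: if the printed path ends in your own xml.py, that file is shadowing the standard library.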

@manam1000

My Answer
import urllib.request
import xml.etree.ElementTree as ET

url = input('Enter location: ')
if len(url) < 1:
    url = 'http://py4e-data.dr-chuck.net/comments_2198633.xml'

op = urllib.request.urlopen(url).read()
print('Retrieved', len(op), 'characters')

tree = ET.fromstring(op)
# './/count' finds every <count> element at any depth
count = tree.findall('.//count')
print(f'Count: {len(count)}')
total = 0
for items in count:
    # .text is a string, so convert it before adding
    total += int(items.text)
print(f'Sum: {total}')
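The './/count' path in this answer matches <count> elements at any depth, whereas the original gist walks 'comments/comment' and then looks up each count. For the assumed comments/comment/count layout both select the same values; './/count' would also pick up any <count> that sits outside a <comment>, so the explicit path is stricter. The sketch below compares the two on a small hypothetical sample (the SAMPLE string and count_texts helper are made up for illustration).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed comments/comment/count layout.
SAMPLE = """<commentinfo>
  <note>sample data</note>
  <comments>
    <comment><name>Ann</name><count>12</count></comment>
    <comment><name>Bo</name><count>30</count></comment>
  </comments>
</commentinfo>"""


def count_texts(xml_text):
    """Return the <count> texts found by './/count' and by the explicit path."""
    root = ET.fromstring(xml_text)
    # './/count' matches <count> elements at any depth below the root.
    anywhere = [elem.text for elem in root.findall('.//count')]
    # The explicit path only follows comments -> comment -> count.
    explicit = [elem.text for elem in root.findall('comments/comment/count')]
    return anywhere, explicit


if __name__ == "__main__":
    anywhere, explicit = count_texts(SAMPLE)
    print(anywhere == explicit)  # True
    # .text is a str, so convert each value before adding.
    print(sum(int(text) for text in anywhere))  # 42
```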

@manam1000

Why do I keep getting this error? ('NoneType' object has no attribute 'text')

import urllib.request, urllib.parse, urllib.error
import ssl
import xml.etree.ElementTree as ET

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

Value = input('Enter location: ')
print('Retrieving', Value)
uh = urllib.request.urlopen(Value, context=ctx)
data = uh.read()
data = data.decode()
tree = ET.fromstring(data)
counts = tree.findall('.//count')
print('Retrieved', len(data), 'characters')
counter = 0
sum = 0
for elements in counts:
    counter += 1
    sum = (elements.find('count').text) + sum
print(counter)
print(sum)

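On the line sum = (elements.find('count').text) + sum, write sum = int(elements.text) + sum instead, inside the for loop. Each item returned by tree.findall('.//count') is already a <count> element, so elements.find('count') searches for a nested <count> child, finds none, and returns None; reading .text from None raises the error. The int() call is needed as well, because .text is a string and cannot be added to the integer sum.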
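A minimal sketch of why the original line fails and why the fix works, using a hypothetical one-comment sample: findall('.//count') returns the <count> elements themselves, so calling .find('count') on one of them looks for a child that does not exist.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-comment sample.
tree = ET.fromstring("<c><comments><comment><count>7</count></comment></comments></c>")

counts = tree.findall('.//count')
first = counts[0]

# `first` is already a <count> element with no children,
# so searching it for a nested <count> yields None.
print(first.find('count'))  # None

# Reading .text directly works; int() turns the "7" string into a number.
total = 0
for element in counts:
    total = int(element.text) + total
print(total)  # 7
```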
