# Actual data: http://py4e-data.dr-chuck.net/comments_24964.html (Sum ends with 73)
from urllib import request
from bs4 import BeautifulSoup

# Download the page and parse it with Python's built-in HTML parser
html = request.urlopen('http://python-data.dr-chuck.net/comments_24964.html').read()
soup = BeautifulSoup(html, 'html.parser')

# Every number on the page sits inside a <span> tag
tags = soup('span')
total = 0
for tag in tags:
    total = total + int(tag.contents[0])
print(total)
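To check the summing logic without fetching the page, here is a minimal sketch that runs the same loop over an inline HTML string. The markup is made up for illustration; it only assumes, as the script above does, that each number is the sole text inside a `span` tag.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the real comments page.
SAMPLE_HTML = """
<table>
<tr><td>Alice</td><td><span class="comments">97</span></td></tr>
<tr><td>Bob</td><td><span class="comments">97</span></td></tr>
<tr><td>Carol</td><td><span class="comments">90</span></td></tr>
</table>
"""

def sum_spans(html):
    """Return the sum of the integers inside every <span> tag."""
    soup = BeautifulSoup(html, "html.parser")
    total = 0
    for tag in soup("span"):
        # contents[0] is the span's first child: here the text node "97", etc.
        total += int(tag.contents[0])
    return total

print(sum_spans(SAMPLE_HTML))  # 284
```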
Notes Regarding the Use of BeautifulSoup
The sample code for this course and textbook examples use BeautifulSoup to parse HTML.
Using BeautifulSoup 4 with Python 3.10 or Python 3.11
Instructions for Windows 10:
- Run this command: pip install beautifulsoup4
- If the bs4.zip file was downloaded, delete it.
Instructions for MacOS:
- Run this command: pip3 install beautifulsoup4
- If the bs4.zip file was downloaded or you have a bs4 folder, delete it.
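To confirm which bs4 Python is actually importing (a leftover bs4 folder from the zip can shadow the pip-installed package), you can print the package's location and version. A small check; the version is read with a fallback in case an older copy does not define `__version__`:

```python
import bs4

# A path inside site-packages (or dist-packages) is the pip-installed copy;
# a path inside your own project folder means a leftover local bs4 folder
# is being imported instead.
print("bs4 loaded from:", bs4.__file__)

# BeautifulSoup 4 defines bs4.__version__; the fallback covers copies that don't.
version = getattr(bs4, "__version__", "unknown")
print("bs4 version:", version)
```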
Using BeautifulSoup 3 (only for Python 3.8 or Python 3.9)
If you want to use our samples "as is", download our Python 3 version of BeautifulSoup 3 from
http://www.py4e.com/code3/bs4.zip
You must unzip this into a "bs4" folder and have that folder as a sub-folder of the folder where you put our sample code like:
Hello, I tried this for the same question:
# Scraping Numbers from HTML using BeautifulSoup
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl
import re

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")

# Retrieve all of the span tags
counts = dict()
my_list = list()
tags = soup('span')
for tag in tags:
    # Look at the parts of a tag
    num = str(tag)
    number = re.findall('[0-9]+', num)
    if len(number) != 1:
        continue
    for integer in number:
        integer = int(integer)
        my_list.append(integer)
        # counts maps each number to how many spans contain it
        counts[integer] = counts.get(integer, 0) + 1

print('Count ', counts)
# or you can say
# print('Count ', len(my_list))
print('Sum ', sum(my_list))
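One caveat with running the regex over `str(tag)`: that string includes the tag's attributes, so a digit in a class name (the `c2` below is a made-up example) is matched too, and the `len(number) != 1` check then silently skips the span. A sketch that searches only the tag's text with `get_text()`:

```python
import re
from bs4 import BeautifulSoup

# Made-up markup: the class name "c2" contains a digit.
SAMPLE_HTML = '<span class="c2">5</span><span class="c2">37</span>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tag = soup("span")[0]
print(re.findall(r"[0-9]+", str(tag)))        # ['2', '5'] (attribute digit included)
print(re.findall(r"[0-9]+", tag.get_text()))  # ['5']

def count_and_sum(html):
    """Return (count, total) of the integers found in <span> text."""
    soup = BeautifulSoup(html, "html.parser")
    numbers = []
    for span in soup("span"):
        found = re.findall(r"[0-9]+", span.get_text())
        if len(found) != 1:
            continue  # skip spans with no number or more than one
        numbers.append(int(found[0]))
    return len(numbers), sum(numbers)

print(count_and_sum(SAMPLE_HTML))  # (2, 42)
```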
Windows users: follow the instructions given by the instructor in the discussion forum, and then even the first code sample above will work for you.
https://www.coursera.org/learn/python-network-data/discussions/forums/G0TMJ6G0EeqqMhL7huUnrQ/threads/Fi07MzG0EeymZRIVts3h3w
import urllib.request
from bs4 import BeautifulSoup
# Prompt user for URL
url = input('Enter URL: ')
# Read HTML from URL
html = urllib.request.urlopen(url).read()
#Parse the HTML using BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Find all span tags
tags = soup('span')
# Sum up the numbers
sum = 0
for tag in tags:
sum += int(tag.contents[0])
# Print the sum
print(sum)
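`int(tag.contents[0])` assumes every span holds exactly one number: an empty span raises `IndexError` and non-numeric text raises `ValueError`. If a page might contain other spans, a more defensive variant (a sketch, not something the assignment pages are known to need) skips them:

```python
from bs4 import BeautifulSoup

def sum_numeric_spans(html):
    """Sum the <span> tags whose text is an integer; ignore the rest."""
    soup = BeautifulSoup(html, "html.parser")
    total = 0
    for tag in soup("span"):
        text = tag.string  # None when the span does not have exactly one child
        if text is None:
            continue
        try:
            total += int(text.strip())
        except ValueError:
            continue  # e.g. a span containing a word instead of a number
    return total

# Made-up markup mixing numeric, empty and non-numeric spans.
print(sum_numeric_spans("<span>10</span><span></span><span>n/a</span><span> 5 </span>"))  # 15
```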
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import re

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('ENTER URL:')  # http://py4e-data.dr-chuck.net/comments_1692181.html
fhand = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(fhand, 'html.parser')
# print(soup)

# Retrieve all of the span tags
tags = soup('span')
lst = list()
for tag in tags:
    tag = str(tag)
    # print(tag)
    tag2 = re.findall('[0-9]+', tag)  # list of digit strings in the tag
    tag3 = int(tag2[0])               # first number found
    lst.append(tag3)
# print(lst)
total = sum(lst)
print(total)
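Instead of converting each tag to a string and searching it with a regular expression, BeautifulSoup can filter tags by attribute and return their text directly. The sketch below assumes the number spans carry `class="comments"` (check the page source to confirm) and uses `find_all` with the `class_` keyword:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the assumed class name "comments" should be checked
# against the actual page source.
SAMPLE_HTML = """
<span class="comments">8</span>
<span class="comments">13</span>
<span class="other">999</span>
"""

def sum_comment_spans(html):
    """Sum the integers in <span class="comments"> tags only."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with the trailing underscore) filters on the CSS class,
    # because "class" is a reserved word in Python.
    spans = soup.find_all("span", class_="comments")
    return sum(int(span.get_text()) for span in spans)

print(sum_comment_spans(SAMPLE_HTML))  # 21
```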
Thank you for your help, it works.
I tried it in VS Code and got a traceback error, and now in Jupyter the result is always 0.
Delete both the bs4 zip file and the extracted bs4 folder, then install BeautifulSoup from your command prompt by typing:
pip install beautifulsoup4
Here is how you can solve this, with working code below 👍. READ ME: copy the actual data URL, run the file from cmd/terminal, and then paste the URL into the terminal or CMD prompt, like so:
#! /bin/python3
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# Leave the URL empty for now. Paste the URL after running the file in cmd or terminal.
url = input("")
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")

spans = soup('span')
numbers = []

for span in spans:
    numbers.append(int(span.string))

print(sum(numbers))
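Instead of pasting the URL at an empty `input` prompt, the script could take it as a command-line argument. A sketch; the file name `sum_spans.py` and the helper names are hypothetical, and the SSL settings are the same certificate-skipping ones used in the script above:

```python
import ssl
import sys
from urllib.request import urlopen

from bs4 import BeautifulSoup


def sum_spans_in(html):
    """Return the sum of the integers held by the <span> tags in html."""
    soup = BeautifulSoup(html, "html.parser")
    return sum(int(span.string) for span in soup("span"))


def sum_spans_at(url):
    """Download url and sum its <span> numbers."""
    # Certificate checks are disabled, so only use this for trusted pages
    # such as the course data pages.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urlopen(url, context=ctx) as response:
        return sum_spans_in(response.read())


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python3 sum_spans.py URL")
    else:
        print(sum_spans_at(sys.argv[1]))
```

Usage: `python3 sum_spans.py http://py4e-data.dr-chuck.net/comments_24964.html` (on Windows, `python` or `py` instead of `python3`).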
This is the best way to answer. Run it in the terminal. Thank you.
This is the error I am getting. Can anybody help?