Last active September 23, 2022 17:38
Scrape the number of pages in a book from Amazon.com
# Add links to urllist for more pages.
# Code can be expanded to scrape more.
import requests
from bs4 import BeautifulSoup

urllist = [
    'http://www.amazon.com/Flash-Boys-Wall-Street-Revolt/dp/0393244660',
    'http://www.amazon.com/The-Big-Short-Doomsday-Machine/dp/0393338827'
]

for url in urllist:
    r = requests.get(url)
    # Name the parser explicitly so the result does not depend on which parsers are installed.
    soup = BeautifulSoup(r.text, 'html.parser')
    tmp = ''  # previous whitespace-separated token
    for line in soup.get_text().split():
        # A numeric token immediately followed by "pages" is taken as the page count.
        if line.lower() == 'pages' and tmp.isdigit():
            print(tmp, line, ' - ', soup.html.head.title.text)
        else:
            tmp = line
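
The token scan in the loop above can be tried without a network request. The following is only a minimal sketch of that matching logic as a standalone function. The name `find_page_counts` and the sample string are hypothetical and not part of the original gist. It uses `isdecimal()` instead of `isdigit()` so that the conversion to `int` cannot fail on characters such as superscript digits.

```python
def find_page_counts(text):
    """Return each integer token that is immediately followed by the word 'pages'.

    Hypothetical helper: mirrors the gist's loop, but works on plain text.
    """
    counts = []
    previous = ''  # last token seen that was not a matched "pages"
    for token in text.split():
        if token.lower() == 'pages' and previous.isdecimal():
            counts.append(int(previous))
        else:
            previous = token
    return counts


# Made-up sample text, standing in for the output of soup.get_text().
sample = "Hardcover: 288 pages Publisher: Example Press"
print(find_page_counts(sample))
```

Returning a list rather than printing makes it easier to check the matching rule on small strings before pointing the script at live pages, whose layout may change.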