Created
September 12, 2021 13:59
Scrape the web. Works on sites whose DOM items are created by JavaScript. Ubuntu Linux version.
sudo pip3 install requests
> sudo: pip3: command not found
sudo apt install python3-pip
pip3 install requests-html
> pyppeteer.errors.BrowserError: Browser closed unexpectedly:
sudo apt install -y gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
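The "command not found" error above can be caught before running any install step. A minimal sketch using only the standard library; the helper name `missing_commands` is made up for illustration and is not part of requests-html:

```python
import shutil


def missing_commands(names):
    """Return the commands from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # pip3 is the command that was missing in the session above.
    missing = missing_commands(["python3", "pip3"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("python3 and pip3 are both available")
```

If pip3 is reported missing, `sudo apt install python3-pip` is the fix used above.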
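python.py
from requests_html import HTMLSession

session = HTMLSession()
# Placeholder: replace with the full URL (scheme included) of a page that builds its content with JavaScript.
URL = 'https://url-which-creates-content-dynamically-with-js.com'
r = session.get(URL)
# Run the page's JavaScript in headless Chromium: scroll down once, wait 2 seconds, keep the page open.
r.html.render(sleep=2, keep_page=True, scrolldown=1)
# 'span.class' is a placeholder CSS selector; replace it with the one matching the items you want.
items = r.html.find('span.class')
for item in items:
    print(item.text)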
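The script above can also be wrapped in a reusable function. This is only a sketch under a few assumptions: the names `collect_texts` and `scrape_texts` are hypothetical, `keep_page` is left at its default because the page is not used after rendering, and the session is closed explicitly so the headless browser does not stay running.

```python
def collect_texts(elements):
    """Return the stripped, non-empty text of each element, in order."""
    texts = []
    for element in elements:
        text = (element.text or "").strip()
        if text:
            texts.append(text)
    return texts


def scrape_texts(url, selector, sleep=2, scrolldown=1):
    """Render `url` with requests-html and return the text of elements matching `selector`."""
    # Imported here so collect_texts can be used without requests-html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # render() runs the page's JavaScript in headless Chromium.
        response.html.render(sleep=sleep, scrolldown=scrolldown)
        return collect_texts(response.html.find(selector))
    finally:
        # Closes the HTTP session and the browser that render() started.
        session.close()
```

Usage would look like `scrape_texts(URL, 'span.class')`, with the same placeholder URL and selector as in python.py.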