Created
July 2, 2015 06:26
Use selenium with phantomjs (with custom capabilities) for screen scraping
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from bs4 import BeautifulSoup

# copy the default PhantomJS capabilities, then customize them
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
    "(KHTML, like Gecko) Chrome/15.0.87"
)
# skip image downloads to speed up page loads
dcap["phantomjs.page.settings.loadImages"] = False

driver = webdriver.PhantomJS('/path/to/bin/phantomjs', desired_capabilities=dcap)
driver.get('http://scrape.this.url/please')
soup = BeautifulSoup(driver.page_source, 'html.parser')  # name the parser explicitly
driver.quit()  # shut down the PhantomJS process once the page source is captured

# do stuff with your soup

# useful links
# https://coderwall.com/p/9jgaeq/set-phantomjs-user-agent-string
# http://stackoverflow.com/a/15699761 # phantomjs + selenium example
# http://stackoverflow.com/a/6300672 # link for using selenium with xvfb (virtual display)
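
The capability keys above are plain strings, so a misspelled prefix is silently ignored rather than rejected. One way to reduce that risk is to build the keys from a single constant. The sketch below is only an illustration: `with_page_settings` and `PAGE_SETTINGS_PREFIX` are hypothetical names, not part of selenium, and a small dict stands in for `DesiredCapabilities.PHANTOMJS` so the snippet runs without selenium installed.

```python
# Hypothetical helper (not part of selenium): builds the
# "phantomjs.page.settings.*" keys from one constant so the prefix
# is only typed once.
PAGE_SETTINGS_PREFIX = "phantomjs.page.settings."


def with_page_settings(base_caps, **settings):
    """Return a copy of base_caps with PhantomJS page settings added."""
    caps = dict(base_caps)  # copy so the caller's dict is not mutated
    for name, value in settings.items():
        caps[PAGE_SETTINGS_PREFIX + name] = value
    return caps


# Stand-in for DesiredCapabilities.PHANTOMJS (an assumption made so the
# sketch is self-contained).
base = {"browserName": "phantomjs"}

dcap = with_page_settings(
    base,
    userAgent="Mozilla/5.0 (X11; Linux x86_64)",
    loadImages=False,
)
```

In the script above, the resulting `dcap` would be passed as `desired_capabilities=dcap` exactly as before, with `DesiredCapabilities.PHANTOMJS` in place of `base`.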