As websites become more JavaScript-heavy, it's harder to automate things like screenshotting for archival purposes. I've seen examples and suggestions to use PhantomJS for visual testing and archiving of websites, but have run into issues such as webfonts not rendering. I had never tried Selenium until today, and while I'm not thinking about performance implications yet, Selenium seems far more accurate than PhantomJS, which makes sense since it actually opens a real browser. It's also not too hard to script complex interactions: here's an example of how to log in to Twitter, write a tweet, upload an image, and send the tweet via Selenium and DOM element selection. Obviously, you shouldn't be automating Twitter via the browser when the API and tweepy are so much better for that, though it is fun to watch the browser go through the steps without you touching a thing.
The example snippet below, which is not particularly well coded, opens up YouTube's homepage and clunkily scrolls to the bottom, triggering the AJAX events that load video previews below the browser fold. It then "clicks" the Load more button, scrolls to the bottom, then scrolls back up before taking a screenshot of the entire page:
(note: I realize my arithmetic is crap. oh well)
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

driver = webdriver.Firefox()
driver.get("https://www.youtube.com")
# scroll down in steps so the lazy-loaded previews get triggered
for isec in (4, 3, 2, 1):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight / %s);" % isec)
    sleep(1)
# load more
sleep(2)
print("push Load more...")
# find_element_by_css_selector was removed in Selenium 4; find_element(By.CSS_SELECTOR, ...) is the current form
driver.find_element(By.CSS_SELECTOR, 'button.load-more-button').click()
print("wait a bit...")
sleep(2)
print("Jump to the bottom, work our way back up")
for isec in (1, 2, 3, 4, 5):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight / %s);" % isec)
    sleep(1)
driver.execute_script("window.scrollTo(0, 0)")
print("Pausin a bit...")
sleep(2)
print("Scrollin to the top so that the nav bar isn't funny looking")
driver.execute_script("window.scrollTo(0, 0);")
sleep(1)
print("Screenshotting...")
# screenshot (Selenium writes PNG data, so the file gets a .png extension)
driver.save_screenshot("/tmp/youtube.com.png")
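The hardcoded divisors and fixed sleeps above could be swapped for a small helper that steps down the page in even fractions and repeats until the page height stops growing. This is only a sketch: `scroll_fractions` and `scroll_until_stable` are made-up names, the step count and pause are arbitrary, and the helper assumes nothing about the driver beyond `execute_script`.

```python
from time import sleep


def scroll_fractions(steps):
    """Return evenly spaced fractions of the page height, ending at 1.0."""
    return [(i + 1) / steps for i in range(steps)]


def scroll_until_stable(driver, steps=4, pause=1.0, max_rounds=10):
    """Scroll down in steps, repeating while lazy-loaded content grows the page.

    Returns the last measured document height in pixels.
    """
    last_height = driver.execute_script("return document.body.scrollHeight;")
    for _ in range(max_rounds):
        for frac in scroll_fractions(steps):
            # extra arguments to execute_script show up as arguments[n] in the page
            driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight * arguments[0]);", frac
            )
            sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight;")
        if new_height == last_height:
            break  # nothing new was loaded during this pass
        last_height = new_height
    # back to the top so the nav bar isn't funny looking
    driver.execute_script("window.scrollTo(0, 0);")
    return last_height
```

With the Firefox driver from the snippet above, `scroll_until_stable(driver)` would stand in for both scrolling loops. The "Load more" click would still need to happen separately.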
Firefox crashes when trying to screenshot a page as big as Bloomberg's "What is Code?". Installing chromedriver to run Chrome mitigates part of the issue. However, Chrome only captures the viewport:
(partial code in progress)
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
driver.implicitly_wait(5)  # implicit wait: element lookups retry for up to 5 seconds (nicer than fixed sleeps)
driver.get("http://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/")
driver.save_screenshot("/tmp/bloomberg-what-is-code.com.png")
# # http://stackoverflow.com/questions/30648765/screen-capture-error-what-does-it-mean
# # brew install chromedriver
# # scroll some more
# for n in range(30):
#     inc = round((n + 1) / 30, 2)
#     driver.execute_script("window.scrollTo(0, document.body.scrollHeight * %s);" % inc)
#     sleep(0.2)
# # work our way up
# for n in range(5):
#     inc = round((5 - (n + 1)) / 5, 2)
#     driver.execute_script("window.scrollTo(0, document.body.scrollHeight * %s);" % inc)
#     sleep(0.2)
# sleep(1)
# print("Screenshotting...")
# # screenshot
# driver.save_screenshot("/tmp/bloomberg-what-is-code.com.png")
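One commonly suggested workaround for the viewport-only capture is to resize the browser window to the full height of the document before taking the screenshot. The sketch below assumes that trick works for the page in question; in my understanding it tends to be reliable only in headless mode, and very tall pages may still hit browser limits. `page_dimensions`, `save_full_page`, and the `min_width` default are hypothetical names and values, not part of Selenium.

```python
def page_dimensions(driver):
    """Ask the page for its full scrollable width and height in pixels."""
    width = driver.execute_script(
        "return Math.max(document.body.scrollWidth,"
        " document.documentElement.scrollWidth);"
    )
    height = driver.execute_script(
        "return Math.max(document.body.scrollHeight,"
        " document.documentElement.scrollHeight);"
    )
    return int(width), int(height)


def save_full_page(driver, path, min_width=1280):
    """Grow the window to the document's size, then take a normal screenshot.

    Returns whatever driver.save_screenshot returns (True on success).
    """
    width, height = page_dimensions(driver)
    # assumption: the browser honors a window this tall (headless mode usually does)
    driver.set_window_size(max(width, min_width), height)
    return driver.save_screenshot(path)
```

With a Chrome driver started in headless mode, calling `save_full_page(driver, "/tmp/bloomberg-what-is-code.com.png")` after `driver.get(...)` would replace the plain `save_screenshot` call. Scrolling through the page first still matters if content is lazy-loaded.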