
@ivan3bx
Created February 9, 2024 17:33
Ruby script to generate screenshots for a series of URLs crawled from a sitemap.xml
```ruby
require "selenium-webdriver"
require "nokogiri"
require "net/http"
require "uri"

BASE_URL = "http://localhost:1313"

#
# This expects several things to be true:
#
# 1. A local Hugo instance is running at localhost:1313.
# 2. The sitemap is served at http://localhost:1313/sitemap.xml.
# 3. Posts have an alternate output URL ending in "/small.html" (e.g. "<post_url>/small.html").
# 4. Each post has a single, top-level <article> element, which is used to calculate the screenshot height.
# 5. The local directory "./images" exists.
#
def generate!
  urls = extract_urls(BASE_URL, %r{/posts/})

  urls.each do |url|
    # Post URLs are expected to end in "/", so "small.html" is appended
    # rather than replacing the last path segment.
    url = URI.join(url, "small.html").to_s
    driver.get url

    # Skip pages that have no <article> element.
    begin
      driver.find_element(css: "article")
    rescue Selenium::WebDriver::Error::NoSuchElementError
      next
    end

    # Hide any scrollbars when taking the screenshot.
    driver.execute_script("document.body.style.overflow = 'hidden';")

    # Article height plus its top offset, counted twice so that the space
    # below the article matches the space above it.
    article_height = driver.execute_script <<~JS
      var article = document.querySelector("article");
      return article.clientHeight + (2 * article.offsetTop);
    JS

    driver.manage.window.resize_to(1000, article_height)

    # "/posts/foo/small.html" becomes "posts_foo".
    basename = URI.parse(url).path.split("/")[1..-2].join("_")
    driver.save_screenshot("./images/#{basename}.png")

    puts "Processed #{url}"
  end
ensure
  # Close the browser even if a page fails part-way through
  # (but don't start one just to quit it).
  @driver&.quit
end

# Returns the <loc> URLs listed in the sitemap that match the given pattern.
def extract_urls(base_url, pattern)
  sitemap_url = URI.join(base_url, "/sitemap.xml")
  body = Net::HTTP.get(sitemap_url)
  Nokogiri::XML(body).css("url loc").map(&:text).grep(pattern)
end

# Lazily starts a single headless Chrome session shared by all pages.
def driver
  @driver ||= begin
    options = Selenium::WebDriver::Chrome::Options.new(args: ["headless"])
    Selenium::WebDriver.for(:chrome, options: options)
  end
end

# Only needed when running with a visible (non-headless) browser window.
# In that case, add this value (the height taken up by the browser's toolbars
# and other window chrome) to the article height passed to resize_to.
def height_offset(driver)
  driver.execute_script <<~JS
    return (window.outerHeight - window.innerHeight)
  JS
end

# Generates all previews
generate!
```
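The loop derives two values from each sitemap URL using only the standard `URI` library: the alternate `small.html` URL and the screenshot filename. The sketch below pulls both steps out into plain functions so they can be checked without a browser. The names `small_url` and `screenshot_basename` are hypothetical and not part of the script. It also shows why post URLs need a trailing slash: `URI.join` resolves relative to the last `/`, so without one the final path segment is replaced instead of extended.

```ruby
require "uri"

# Hypothetical helper: mirrors `URI.join(url, "small.html")` from generate!.
# Reference resolution replaces everything after the last "/" in the base path.
def small_url(post_url)
  URI.join(post_url, "small.html").to_s
end

# Hypothetical helper: mirrors the basename expression in generate!.
# "/posts/foo/small.html".split("/") => ["", "posts", "foo", "small.html"];
# [1..-2] drops the leading "" and the trailing "small.html".
def screenshot_basename(url)
  URI.parse(url).path.split("/")[1..-2].join("_")
end

with_slash    = small_url("http://localhost:1313/posts/hello-world/")
without_slash = small_url("http://localhost:1313/posts/hello-world")

puts with_slash                       # http://localhost:1313/posts/hello-world/small.html
puts without_slash                    # http://localhost:1313/posts/small.html
puts screenshot_basename(with_slash)  # posts_hello-world
```

If a theme emits post URLs without a trailing slash, appending one before the join (for example `url.end_with?("/") ? url : "#{url}/"`) would keep the `small.html` lookup pointing at the right page.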
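`extract_urls` relies on Nokogiri to pull the `<loc>` entries out of the sitemap. If you would rather avoid that gem, the same filtering can be sketched with REXML, which ships with Ruby (as a bundled gem since Ruby 3.0, so a Gemfile may need to list it). This is a swapped-in technique, not what the script uses. The function name `extract_urls_from_xml` is hypothetical, and it takes the XML as a string rather than fetching it, so it runs offline; the sample sitemap is made up.

```ruby
require "rexml/document"

# Hypothetical alternative to the Nokogiri-based extract_urls. Fetching the
# XML (e.g. with Net::HTTP.get) is left to the caller.
def extract_urls_from_xml(xml, pattern)
  doc = REXML::Document.new(xml)
  locs = []
  # Walk <urlset>/<url>/<loc> by local element name, which sidesteps the
  # sitemap's default namespace (http://www.sitemaps.org/schemas/sitemap/0.9).
  doc.root.each_element do |url|
    next unless url.name == "url"

    url.each_element do |child|
      locs << child.text.strip if child.name == "loc" && child.text
    end
  end
  locs.grep(pattern)
end

sample = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://localhost:1313/posts/first-post/</loc></url>
    <url><loc>http://localhost:1313/about/</loc></url>
    <url><loc>http://localhost:1313/posts/second-post/</loc></url>
  </urlset>
XML

p extract_urls_from_xml(sample, %r{/posts/})
# => ["http://localhost:1313/posts/first-post/", "http://localhost:1313/posts/second-post/"]
```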
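The height passed to `resize_to` is the article's `clientHeight` plus its `offsetTop` counted twice, so the space above the article is mirrored below it. `height_offset` is meant to be added on top of that only when a visible browser window is used. The sketch below restates that arithmetic as a pure function so it can be checked without a browser; `window_height` and its keyword names are hypothetical, and the numbers are illustrative only.

```ruby
# Hypothetical helper reproducing the height arithmetic from generate!.
#   client_height: article.clientHeight
#   offset_top:    article.offsetTop (space above the article)
#   chrome_offset: window.outerHeight - window.innerHeight, i.e. the value
#                  height_offset returns; leave at 0 in headless mode.
def window_height(client_height:, offset_top:, chrome_offset: 0)
  # offset_top is added twice so the bottom margin matches the top margin.
  client_height + (2 * offset_top) + chrome_offset
end

puts window_height(client_height: 1200, offset_top: 40)                     # 1280
puts window_height(client_height: 1200, offset_top: 40, chrome_offset: 85)  # 1365
```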
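Requirement 5 (the `./images` directory must already exist) could be relaxed by creating the directory before saving. The sketch below assumes that change; the script as written does not do this. The helper `screenshot_path` and its `dir:` keyword are hypothetical, and `FileUtils.mkdir_p` is a no-op when the directory is already there. The helper could replace the hard-coded `"./images/#{basename}.png"` string in `generate!`.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: returns the PNG path for a basename, creating the
# output directory (and any missing parents) first.
def screenshot_path(basename, dir: "./images")
  FileUtils.mkdir_p(dir)
  File.join(dir, "#{basename}.png")
end

# Usage against a throwaway directory so nothing is left behind.
Dir.mktmpdir do |tmp|
  out_dir = File.join(tmp, "images")
  path = screenshot_path("posts_hello-world", dir: out_dir)
  puts path.end_with?("images/posts_hello-world.png") # true
  puts Dir.exist?(out_dir)                            # true
end
```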