@schappim
Last active July 10, 2018 00:38
download_the_image_files.rb
#!/usr/bin/env ruby
require 'rubygems'
require 'httparty'
require 'nokogiri'
# Set the URL of the page we want to scrape
url = "https://www.kitronik.co.uk/5632-klip-halo-for-the-bbc-microbit.html"
# Download the webpage
response = HTTParty.get url
# Make the raw HTML response a Nokogiri Document
doc = Nokogiri::HTML response.body
# Get the title
title = doc.search('title').inner_text
# Print out the title
puts title
puts
# Get the description (nil if the element is not on the page)
description_html = doc.at_css('#product_tabs_description_tabbed_contents')&.inner_html&.strip
# Print out the description HTML
# puts description_html
# Get the links
link_nodes = doc.search('.ig_lightbox2')
# This returns a Nokogiri::XML::NodeSet of link nodes
# puts link_nodes
# Extract the href attribute from each of the link_nodes...
link_nodes.each do |node|
  href = node.attr('href')
  next unless href # Skip nodes that have no href attribute
  puts href # Print out the URL
  # Download the image content and put it into a variable
  image_content = HTTParty.get href
  # Get the filename
  file_name = href.split('/').last
  # Print the file name
  puts file_name
  # Save the raw response body to disk in binary mode
  File.open("./#{file_name}", 'wb') { |file| file.write(image_content.body) }
end
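
The script above takes each `href` as-is and uses the text after the last `/` as the file name. That works only when the links are absolute URLs with no query string. Below is a minimal sketch of one way to make that step more robust, using only Ruby's standard library. `image_target` is a hypothetical helper, not part of the original script, and the example image path is made up for illustration.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): resolve an href against
# the page URL and derive a local file name from the URL path.
def image_target(page_url, href)
  # URI.join leaves absolute hrefs alone and resolves relative ones
  absolute = URI.join(page_url, href)
  # Using only the path drops any "?query" suffix from the file name
  file_name = File.basename(absolute.path)
  [absolute.to_s, file_name]
end

page = "https://www.kitronik.co.uk/5632-klip-halo-for-the-bbc-microbit.html"
# Made-up relative href, used purely for illustration
url, name = image_target(page, "/media/catalog/product/example.jpg?v=2")
puts url
puts name
```

Inside the loop, the two return values could stand in for `href` and `file_name`, so `HTTParty.get` receives a full URL and the saved file name carries no query string.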