Last active
July 10, 2018 00:38
download_the_image_files.rb
#!/usr/bin/env ruby
require 'httparty'
require 'nokogiri'
# Set the URL of the page we want to scrape
url = "https://www.kitronik.co.uk/5632-klip-halo-for-the-bbc-microbit.html"
# Download the webpage
response = HTTParty.get(url)
# Parse the raw HTML response into a Nokogiri document
doc = Nokogiri::HTML(response.body)
# Get the page title
title = doc.search('title').inner_text
# Print the title, followed by a blank line
puts title
puts
# Get the description HTML (nil if the page has no matching element)
description_node = doc.search('#product_tabs_description_tabbed_contents').first
description_html = description_node&.inner_html&.strip
# Uncomment to print the description HTML
# puts description_html
# Get the image links
link_nodes = doc.search('.ig_lightbox2')
# This returns a Nokogiri::XML::NodeSet of link nodes
# puts link_nodes
# Read the href attribute of each link node and download the image it points to
link_nodes.each do |node|
  href = node.attr('href')
  puts href # Print out the URL
  # Download the image and keep the raw response body in a variable
  image_content = HTTParty.get(href).body
  # Get the file name (the last path segment of the URL)
  file_name = href.split('/').last
  # Print the file name
  puts file_name
  # Save the file to disk in binary mode
  File.open("./#{file_name}", 'wb') { |file| file.write(image_content) }
end
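The script above assumes every `href` is an absolute URL with no query string. If the page ever used relative links or cache-busting parameters, the download and the file name could both go wrong. Below is a minimal sketch of one way to handle that using only Ruby's standard library. The helper names `absolute_image_url` and `local_file_name` and the sample path `/media/example/halo.jpg?v=2` are hypothetical and not taken from the original script or the scraped page.

```ruby
require 'uri'

# Hypothetical helper: resolve an href (absolute or relative) against the page URL.
def absolute_image_url(page_url, href)
  URI.join(page_url, href).to_s
end

# Hypothetical helper: derive a local file name from an image URL,
# dropping any query string and directory components.
def local_file_name(image_url)
  File.basename(URI.parse(image_url).path)
end

page_url = "https://www.kitronik.co.uk/5632-klip-halo-for-the-bbc-microbit.html"
# Made-up relative href, used purely for illustration
image_url = absolute_image_url(page_url, "/media/example/halo.jpg?v=2")
puts image_url
puts local_file_name(image_url)
```

Inside the download loop, these helpers could stand in for the direct `node.attr('href')` and `split('/').last` calls. Whether that is needed depends on how the target page writes its links.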