@denpatin
Created June 3, 2023 06:24
Download all MS StyleGuides from https://www.microsoft.com/en-us/language/StyleGuides
# frozen_string_literal: true

source 'https://rubygems.org'

gem 'nokogiri'
gem 'rubocop', require: false

#!/usr/bin/env ruby
# frozen_string_literal: true

require 'nokogiri'
require 'open-uri'

URL = 'https://www.microsoft.com/en-us/language/StyleGuides'
PDF_FOLDER = 'pdf_files'

# Fetch the landing page and parse it.
html = URI.parse(URL).open.read
doc = Nokogiri::HTML(html)

# The language drop-down holds one <option> per style guide; options whose
# value ends in .pdf are treated as download URLs.
pdf_list = doc.at_css('#ddlStyleGuideLanguage')
abort 'Could not find #ddlStyleGuideLanguage; the page layout may have changed.' if pdf_list.nil?

options = pdf_list.css('option')

Dir.mkdir(PDF_FOLDER) unless Dir.exist?(PDF_FOLDER)

threads = []

options.each do |option|
  value = option['value']
  next if value.nil? || !value.end_with?('.pdf')

  file_name = "#{option.text.strip}.pdf"
  file_path = File.join(PDF_FOLDER, file_name)

  # One thread per PDF. Each thread writes to its own file, so no lock is
  # needed; wrapping the download in a shared mutex would serialize them.
  threads << Thread.new do
    URI.parse(value).open do |pdf_file|
      File.binwrite(file_path, pdf_file.read)
    end
    puts "#{file_name} downloaded successfully."
  rescue StandardError => e
    warn "Failed to download #{file_name}: #{e.message}"
  end
end

threads.each(&:join)
puts 'All PDF files downloaded.'
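
The script above starts one thread per PDF and builds file names directly from the option text. If the list is long, or a label contains a character such as `/`, a bounded worker pool and a small sanitizing step may be gentler on both the server and the file system. The following is only a sketch: `safe_file_name` and `run_in_pool` are hypothetical helper names that are not part of the gist, and the default of four workers is an arbitrary assumption.

```ruby
# frozen_string_literal: true

# Hypothetical helper (not in the gist): replace characters that are
# commonly rejected in file names, then append the .pdf extension.
def safe_file_name(label)
  cleaned = label.strip.gsub(%r{[\\/:*?"<>|]}, '_')
  "#{cleaned}.pdf"
end

# Hypothetical helper (not in the gist): call the block once per item using
# at most `workers` threads, and return the results in input order.
def run_in_pool(items, workers: 4, &block)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once closed and drained, pop returns nil and workers stop

  results = Array.new(items.size)
  pool = Array.new(workers) do
    Thread.new do
      while (job = queue.pop)
        item, index = job
        # Each job writes to its own slot, so no lock is needed here.
        results[index] = block.call(item)
      end
    end
  end
  pool.each(&:join)
  results
end

# Usage with plain strings, so nothing is downloaded:
names = run_in_pool([' English (US) ', 'Français/Canada'], workers: 2) do |label|
  safe_file_name(label)
end
puts names.inspect # ["English (US).pdf", "Français_Canada.pdf"]
```

In the script, the block passed to `run_in_pool` would hold the download-and-write step for one option, with its own `rescue` so that one failed PDF does not stop a worker.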