Gist jystewart/d1f3a5aa5fa9c7526cdce0875c119962
Created March 27, 2019 19:27
require 'fileutils'

url_file = "urlfile.txt"

# Reduce a string to something safe to use as a filename.
def sanitize_filename(filename)
  name = filename.strip
  # NOTE: File.basename doesn't work right with Windows paths on Unix,
  # so drop everything up to the last / or \ by hand
  name = name.gsub(/^.*(\\|\/)/, '')
  # Replace every character other than letters, digits, '.' and '-' with '_'
  name.gsub(/[^0-9A-Za-z.\-]/, '_')
end

File.readlines(url_file, chomp: true).each do |line|
  line = line.strip
  next if line.empty?

  # Split "/some/path/file.json?query" into the path and the (optional) query
  path, query = line.split("?", 2)
  filename = File.basename(path, ".json")
  filename += "_" + sanitize_filename(query) if query

  dir = "s3#{File.dirname(path)}"
  FileUtils.mkdir_p(dir) # no-op if the directory already exists

  # Pass curl its arguments as an array so no shell is involved: in a shell
  # string, the '&' in the query would background the command and cut the URL.
  # Each line already starts with "/", so no extra slash after the host.
  system("curl", "-o", "#{dir}/#{filename}.json", "https://www.ncsc.gov.uk#{line}")
end
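The script shells out to curl for every download. A hedged alternative, assuming only Ruby's standard library (`net/http`, `uri`, `fileutils`), is to fetch each line directly. `source_uri_for` and `fetch_to` are hypothetical helper names, not part of the original gist. Like a plain `curl -o`, this sketch does not follow redirects.

```ruby
require "fileutils"
require "net/http"
require "uri"

BASE_URL = "https://www.ncsc.gov.uk"

# Hypothetical helper: build the full URI for one line of the URL file.
# Lines already start with "/", so they are appended to the host as-is.
def source_uri_for(line)
  URI(BASE_URL + line.strip)
end

# Hypothetical helper: download `uri` and write the body to `dest`.
# Returns true on a 2xx response, false (with a warning) otherwise.
def fetch_to(uri, dest)
  response = Net::HTTP.get_response(uri) # uses TLS because the scheme is https
  unless response.is_a?(Net::HTTPSuccess)
    warn "#{uri}: HTTP #{response.code}"
    return false
  end
  FileUtils.mkdir_p(File.dirname(dest))
  File.binwrite(dest, response.body)
  true
end
```

Inside the loop, `fetch_to(source_uri_for(line), "#{dir}/#{filename}.json")` could take the place of the curl call; because no shell is involved, characters such as `&` in the query need no quoting.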
/api/1/services/v1/collection-content.json?url=/collection/board-toolkit&pageContentUrl=/collection/board-toolkit
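To check where the script would save a line like the one above, its naming logic can be mirrored in a small pure function. This is a minimal sketch: `output_path_for` is a hypothetical helper that computes the path without touching the network or the filesystem.

```ruby
# Same filename cleanup as the script's sanitize_filename.
def sanitize_filename(filename)
  name = filename.strip
  name = name.gsub(/^.*(\\|\/)/, '')     # keep only what follows the last / or \
  name.gsub(/[^0-9A-Za-z.\-]/, '_')      # anything else unsafe becomes '_'
end

# Hypothetical helper: the local output path the script derives from one line.
def output_path_for(line)
  path, query = line.strip.split("?", 2)
  filename = File.basename(path, ".json")
  filename += "_" + sanitize_filename(query) if query
  File.join("s3#{File.dirname(path)}", "#{filename}.json")
end

sample = "/api/1/services/v1/collection-content.json?url=/collection/board-toolkit&pageContentUrl=/collection/board-toolkit"
puts output_path_for(sample)
# prints s3/api/1/services/v1/collection-content_board-toolkit.json
```

Because the first `gsub` keeps only the text after the last slash, the whole query collapses to `board-toolkit`; two queries that end in the same segment would be written to the same file.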