Last active
June 6, 2017 06:13
Follow redirect links
require 'net/https'
require 'uri'

=begin
This simple module provides two methods that follow the redirects of a URL and return them.
It goes to a depth of 10 unless specified otherwise (pass :depth in the options hash).

Redirect.redirect_urls(<url>) returns a hash with the following keys:
  :completed : true if the final destination was reached before hitting the limit
  :uris      : a list of URI objects; the first is the final destination
  :urls      : string versions of the above
  :hosts     : the unique hosts of the URIs

Redirect.resolve(<url>) returns the last resolved URL.

  Redirect.redirect_urls("entish.org") => hash as above
  Redirect.resolve("entish.org") => "http://entish.org"

By design, this doesn't do much error checking, though it does try
to add the scheme ("http") and to reuse URL information on relative redirects.
It also uses the HEAD method and sets a User-Agent of "Ruby redirect script".
=end
module Redirect
  # Follows redirects starting at url and returns a hash describing the chain.
  # options[:depth] sets the lookup depth (default 10).
  def self.redirect_urls(url, options = {})
    redirect_lookup_depth = options[:depth].to_i > 0 ? options[:depth].to_i : 10
    current_uri = URI.parse(url)
    # Bare host names such as "entish.org" parse without a scheme; assume http.
    current_uri = URI.parse('http://' + url) if current_uri.scheme.nil?
    redirs = get_redirects(current_uri, [current_uri], redirect_lookup_depth, redirect_lookup_depth)
    redirs[:urls] = redirs[:uris].map(&:to_s)
    redirs[:hosts] = redirs[:uris].map(&:host).uniq
    redirs
  end

  # Returns the last resolved URL as a string.
  def self.resolve(url)
    redirect_urls(url)[:urls][0]
  end

  # Issues a HEAD request for current_uri and recurses on redirects.
  # limit_count is the number of requests still allowed.
  def self.get_redirects(current_uri, uris, limit, limit_count)
    return { completed: false, uris: uris } if limit_count < 1

    http = Net::HTTP.new(current_uri.host, current_uri.port)
    http.use_ssl = true if current_uri.scheme == 'https'
    request = Net::HTTP::Head.new(current_uri.request_uri)
    request.initialize_http_header('User-Agent' => 'Ruby redirect script')
    response = http.request(request)
    case response
    when Net::HTTPSuccess
      { completed: true, uris: uris }
    when Net::HTTPRedirection
      redirect_location = response['location']
      location_uri = URI.parse(redirect_location)
      if location_uri.host.nil?
        # Relative redirect: reuse the scheme and host of the current request.
        location_uri = URI.parse(current_uri.scheme + '://' + current_uri.host + redirect_location)
      end
      # puts("Redirecting from #{current_uri} to #{location_uri}")
      # Each hop uses up one of the remaining requests.
      get_redirects(location_uri, [location_uri] + uris, limit, limit_count - 1)
    else
      raise 'Non-success/redirect response: ' + response.inspect
    end
  end
  # A bare `private` does not apply to methods defined with `def self.`.
  private_class_method :get_redirects
end
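
The module handles a relative `Location` header by concatenating the scheme and host of the current request with the header value. As a possible alternative, the sketch below uses the standard library's `URI.join`, which resolves a reference against a base URI and also keeps a non-default port. The helper name `absolute_location` and the example URLs are hypothetical and not part of the original gist.

```ruby
require 'uri'

# Hypothetical helper (not part of the original module): resolve a Location
# header value against the URI of the request that produced it.
def absolute_location(current_uri, location)
  URI.join(current_uri.to_s, location)
end

base = URI.parse('http://example.com:8080/a/b')
puts absolute_location(base, '/c')                  # absolute path on the same host and port
puts absolute_location(base, 'c')                   # path relative to the base's directory
puts absolute_location(base, 'https://other.org/x') # already absolute, so the base is ignored
```

Unlike the string concatenation in `get_redirects`, this also handles `Location` values that do not start with a slash, at the cost of stepping slightly outside the gist's "not much error checking" design.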