Created December 20, 2023 18:29
A Ruby script to find broken links
#!/usr/bin/env ruby
require 'net/http'
require 'timeout'

# A Ruby script to find broken links.
#
# Not perfect, but it gets the job done.
# The output from STDOUT can be used as a CSV:
#
#   ruby find_broken_links.rb > broken_links.csv

DIR = './_posts/'
SEARCH = 'https?:\/\/[a-zA-Z0-9!#$%&_+=,.?\/\-]+'

# grep prints one "filename:url" line per match.
results = `grep -E -o -r '#{SEARCH}' '#{DIR}'`

results.split("\n").map(&:strip).each do |line|
  filename, url = line.split(':', 2)
  $stderr.puts "Checking: #{url}"
  begin
    # Give up on any single URL after 2 seconds.
    Timeout.timeout(2) do
      res = Net::HTTP.get_response(URI(url))
      # Anything other than a 2xx response (including redirects) is reported.
      unless res.is_a?(Net::HTTPSuccess)
        puts "#{res.code},\"#{filename}\",\"#{url}\""
        $stderr.puts "BROKEN: #{res.code} #{url} #{filename}"
      end
    end
  rescue => e
    $stderr.puts "Skipping due to error!"
    $stderr.puts e
  end
end
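
The script shells out to `grep` and then splits each output line on the first `:`, which assumes `grep` is installed and that no filename contains a colon. As a rough sketch of an alternative, not part of the original script, the same URLs could be collected in pure Ruby with `String#scan`. The names `URL_PATTERN`, `extract_urls`, and `each_post_url` are hypothetical, and the character class only approximates the one in `SEARCH`.

```ruby
# Hypothetical pure-Ruby stand-in for the grep step (illustrative names only).
# `#` and `$` are escaped so Ruby does not treat `#$` as interpolation.
URL_PATTERN = %r{https?://[a-zA-Z0-9!\#\$%&_+=,.?/\-]+}

# Return every URL-looking substring of `text`, in order of appearance.
def extract_urls(text)
  text.scan(URL_PATTERN)
end

# Yield a [filename, url] pair for each URL found in regular files under `dir`.
def each_post_url(dir)
  Dir.glob(File.join(dir, '**', '*')).sort.each do |path|
    next unless File.file?(path)

    # scrub replaces invalid byte sequences so scan does not raise on them.
    content = File.read(path, encoding: 'UTF-8').scrub
    extract_urls(content).each { |url| yield path, url }
  end
end
```

With these helpers, the main loop could iterate with `each_post_url(DIR) { |filename, url| ... }` instead of parsing `grep` output. Like the original pattern, this one still captures trailing punctuation such as a sentence-ending period.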