@cieux1
Created August 7, 2013 06:07
Find HTML files that reference oversized images (JPEGs over 500 KB). Usage: ruby this.rb >> output.csv
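# For every JPEG under path/to/ larger than 500 KB, print "page URL, image path"
# for each HTML file under path/to/ that mentions the image's file name.
# The URL is "http://www." + the file's relative path, so the globbed paths
# are expected to start with the site's domain directory.

MAX_BYTES = 500_000 # 500 KB

def find_html(img_name, html_arr)
  img_basename = File.basename(img_name)
  html_arr.each do |file_name|
    htmlname = "http://www." + file_name
    src = File.read(file_name)
    # Literal substring match: an unescaped regex would treat "." as a wildcard.
    puts "#{htmlname}, #{img_name}" if src.include?(img_basename)
  end
end

html_arr = Dir.glob("path/to/**/*.html")
img_arr = Dir.glob("path/to/**/*.jpg")

img_arr.each do |img|
  find_html(img, html_arr) if File.size(img) > MAX_BYTES
end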
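Interpolating a file name into a regex without escaping, as in `/(#{img_basename})/`, turns each `.` into a wildcard, and even a literal substring test still matches `top.jpg` inside `stop.jpg`. A small sketch comparing three checks; `match_checks` is a hypothetical helper, and the anchored pattern assumes image paths appear right after a `/` or a quote, as in typical `src="..."` attributes.

```ruby
# Hypothetical helper: three ways to ask "does this HTML source mention +name+?"
def match_checks(src, name)
  {
    # Unescaped regex: "." in the name matches any character.
    unescaped: !(src =~ /#{name}/).nil?,
    # Literal substring: no wildcard, but "top.jpg" still matches inside "stop.jpg".
    literal: src.include?(name),
    # Escaped name that must follow "/" or a quote (assumes src="..." style paths).
    anchored: src.match?(%r{["'/]#{Regexp.escape(name)}})
  }
end

p match_checks('<img src="/img/stopxjpg.png">', "top.jpg")
p match_checks('<img src="/img/stop.jpg">', "top.jpg")
p match_checks('<img src="/img/top.jpg">', "top.jpg")
```

For these inputs, the first source is flagged only by the unescaped regex, the second by the unescaped regex and the substring test, and only the third, a real reference to `top.jpg`, by all three.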
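The script reads every HTML file again for each oversized image. A minimal sketch of an alternative under the same directory layout: the hypothetical `large_image_rows` method collects the large JPEGs first, then reads each HTML file once and checks it against all of them. The `example.com` directory in the demo is a made-up stand-in for `path/to`.

```ruby
require "tmpdir"

# Hypothetical helper: returns [page_url, image_path] pairs for every HTML file
# under +root+ that mentions the file name of a JPEG larger than +max_bytes+.
# Each HTML file is read once, rather than once per oversized image.
def large_image_rows(root, max_bytes: 500_000)
  large = Dir.glob(File.join(root, "**", "*.jpg")).select { |img| File.size(img) > max_bytes }
  return [] if large.empty?

  rows = []
  Dir.glob(File.join(root, "**", "*.html")).each do |page|
    src = File.read(page)
    large.each do |img|
      # Same URL scheme as the script: "http://www." + relative path.
      rows << ["http://www." + page, img] if src.include?(File.basename(img))
    end
  end
  rows
end

# Demo on a throwaway tree; "example.com" stands in for the real site directory.
Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    Dir.mkdir("example.com")
    File.binwrite("example.com/big.jpg", "x" * 600_000)
    File.binwrite("example.com/small.jpg", "x" * 10)
    File.write("example.com/index.html", '<img src="big.jpg"><img src="small.jpg">')
    large_image_rows("example.com").each { |row| puts row.join(",") }
  end
end
```

The demo prints `http://www.example.com/index.html,example.com/big.jpg`. Rows come out grouped by page rather than by image, and they are joined with plain commas; the `csv` library's `Array#to_csv` would add quoting for paths that contain commas.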