@jeroeningen
Created July 1, 2012 13:35
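# Get an attribute (text rendered as an image) from the image at the given
# URL by running it through OCR.
# Needs the rmagick gem, plus ImageMagick (`convert`) and gocr on the PATH.
require "open-uri"
require "fileutils"
require "rmagick"

def ocr_attribute(url)
  dir = "/tmp/scrapers"
  FileUtils.mkdir_p(dir)
  file = rand(36**10).to_s(36)
  path = "#{dir}/#{file}"
  pnm_path = "#{dir}/#{file}.pnm"
  begin
    # Download the image and decode it with RMagick
    URI.open(url) do |f|
      image = Magick::Image.from_blob(f.read)[0]
      image.write(path)
    end
    # gocr reads PNM input natively, so convert with ImageMagick first
    system("convert", path, pnm_path) or raise "convert failed for #{url}"
    attribute = IO.popen(["gocr", "-i", pnm_path], &:read)
  ensure
    # Remove the temporary files even if a step above fails
    File.delete(path) if File.exist?(path)
    File.delete(pnm_path) if File.exist?(pnm_path)
  end
  # Fix common misreads ("J" -> "3", "o"/"O" -> "0"), then drop dashes,
  # underscores and spaces
  attribute.gsub("J", "3").gsub(/[oO]/, "0").gsub(/[-_]/, "").delete(" ").strip
end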
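The last line of `ocr_attribute` patches characters that the OCR step misreads. Pulled out into its own helper, it can be checked without ImageMagick or gocr installed. This is a sketch: the name `clean_ocr_text` is hypothetical and not part of the gist, and it assumes the attribute is meant to be numeric (a phone number or price, say), which is what the substitutions suggest.

```ruby
# Hypothetical helper: the post-processing step from ocr_attribute,
# extracted so it can be run on plain strings.
def clean_ocr_text(raw)
  text = raw.gsub("J", "3")     # the gist maps "J" to "3"
  text = text.gsub(/[oO]/, "0") # letter o/O becomes digit zero
  text = text.gsub(/[-_]/, "")  # drop dashes and underscores
  text = text.delete(" ")       # drop spaces
  text.strip                    # trim surrounding whitespace such as a trailing newline
end

puts clean_ocr_text("O2O-1J4 5_6\n") # prints 02013456
```

Because these are blanket replacements, the cleanup only suits attributes expected to be digits; text that legitimately contains "o" or "J" would be corrupted.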