@bchase
Created April 6, 2016 01:37
#!/usr/bin/env ruby
# require 'virtus'
# require 'uri'
# require 'net/http'
# require 'json'
# require 'ostruct'
# # require 'active_support/core_ext/klass'
require 'pathname'
require 'pry'
module Kernel
alias :pn :Pathname
end
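# Expects a bare image filename in the current directory: the same name is
# reused below for the cropped copy and for the "ocr-read-" output file.
orig_img_path = ARGV[0] or abort("usage: #{$0} IMAGE_FILENAME")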
BOTTOM_OUTPUT_DIR = "cropped_imgs/bottom"
LEFT_OUTPUT_DIR = "cropped_imgs/left"
# create the output directories (a bare string literal would never run the command)
%x[ mkdir -p #{BOTTOM_OUTPUT_DIR} #{LEFT_OUTPUT_DIR} ]
output_dir = BOTTOM_OUTPUT_DIR
ocr_img_path = pn(output_dir).join("ocr-read-#{orig_img_path}")
### crop (`mogrify` -> `convert` -> imagemagick)
# # left
# width, height = 600, 350
# dimensions = "#{width}x#{height}"
# right_x, down_y = 10, 335
# # bottom
width, height = 1920, 69
dimensions = "#{width}x#{height}"
right_x, down_y = 0, 1011
crop_arg = %[ #{dimensions}+#{right_x}+#{down_y}\! ]
%x[ mogrify -path #{output_dir} -crop #{crop_arg} #{orig_img_path} ]
### isolate the light text: paint every pixel NOT within 10% of from_color with to_color
from_color = "#CFCFCF"
to_color = "black"
%x[ convert #{output_dir}/#{orig_img_path} -fill "#{to_color}" -fuzz 10% +opaque "#{from_color}" #{ocr_img_path} ]
### OCR (psm 7 = treat the image as a single line of text)
# NOTE: newer Tesseract releases (4.x) spell this flag `--psm`
# TODO don't use psm7 for left
txt = %x[ tesseract #{ocr_img_path} stdout -l jpn+eng -psm 7 ].strip
puts txt
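
The crop step above interpolates the filename straight into a shell string, so a name containing spaces or shell metacharacters would break the command. The following is a minimal sketch of one way around that, assuming the same bottom-strip geometry as the script. The helpers `crop_geometry` and `mogrify_crop_argv` are hypothetical names introduced here for illustration and are not part of the original gist.

```ruby
require 'shellwords'

# Hypothetical helper: build an ImageMagick geometry string,
# e.g. "1920x69+0+1011!" (width x height + x offset + y offset, then "!").
def crop_geometry(width, height, x_offset, y_offset)
  "#{width}x#{height}+#{x_offset}+#{y_offset}!"
end

# Hypothetical helper: build the mogrify call as an argument array
# instead of a single interpolated shell string.
def mogrify_crop_argv(img_path, output_dir, geometry)
  ['mogrify', '-path', output_dir, '-crop', geometry, img_path]
end

geometry = crop_geometry(1920, 69, 0, 1011)
argv = mogrify_crop_argv('my screenshot.png', 'cropped_imgs/bottom', geometry)

# Print a shell-escaped form of the command for inspection only.
puts Shellwords.join(argv)

# To actually run it (requires ImageMagick), pass the array to system,
# which skips the shell when given multiple arguments:
# system(*argv)
```

Passing the arguments as an array to `system` avoids shell quoting altogether, while `Shellwords.join` is only used here to show a copy-pasteable command line.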