
@rinogo
Created March 31, 2021 18:50
Optimally resize an image so that its line height is approximately 32 pixels (Keywords: OpenCV, Tesseract, OCR)
import cv2
import numpy as np

# Resize `img` according to the bounding boxes in `boxes` (which is simply the (pruned) result of `pytesseract.image_to_data()`).
# Tesseract performs best when capital letters are ~32px tall (https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ).
# Smaller text can't be OCR'd as accurately and, weirdly enough, larger text causes problems as well.
# So this function resizes the image so that the median bounding-box height (a proxy for line height) is ~32px.
def optimal_resize(img, boxes):
    median_height = np.median(boxes["height"])
    # Without usable boxes there is nothing to scale against, so leave the image untouched.
    if not np.isfinite(median_height) or median_height <= 0:
        return img

    target_height = 32  # See https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/24JHDYQbBQAJ
    scale_factor = target_height / median_height
    print("Scale factor: " + str(scale_factor))

    # If the scale factor is within `skip_percentage` (here, 7%) of 1.0, just return the original image (it's better to skip resizing if we can).
    skip_percentage = 0.07
    if 1 - skip_percentage < scale_factor < 1 + skip_percentage:
        return img

    # Bicubic for enlarging, "pixel area relation" for reduction. (See https://chadrick-kwag.net/cv2-resize-interpolation-methods/)
    if scale_factor > 1.0:
        interpolation = cv2.INTER_CUBIC
    else:
        interpolation = cv2.INTER_AREA

    return cv2.resize(img, None, fx=scale_factor, fy=scale_factor, interpolation=interpolation)
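The gist does not show how `boxes` is pruned before it reaches `optimal_resize`. A minimal sketch of one way to do it follows. It assumes `data` is the dict returned by `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)`. The helper name `prune_boxes` and the `min_conf` threshold are hypothetical and not part of the original code. The sample dict is synthetic so that the snippet runs without Tesseract installed.

```python
def prune_boxes(data, min_conf=0.0):
    """Keep only rows that look like recognized words (hypothetical helper).

    `data` is assumed to be the dict form of pytesseract.image_to_data() output.
    """
    keep = []
    for i, text in enumerate(data["text"]):
        # `conf` can come back as a string or a number; -1 marks rows that are
        # not words (pages, blocks, paragraphs, lines), so float() covers both.
        conf = float(data["conf"][i])
        if conf > min_conf and text.strip():
            keep.append(i)
    # Apply the same row selection to every column.
    return {key: [values[i] for i in keep] for key, values in data.items()}


# Synthetic stand-in for image_to_data() output (not real OCR results).
sample = {
    "text": ["", "Hello", "world", " "],
    "conf": ["-1", "96.1", 91, "-1"],
    "height": [400, 21, 19, 300],
}
pruned = prune_boxes(sample)
print(pruned["height"])  # [21, 19]
```

With pytesseract available, the intended order would be `boxes = prune_boxes(pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT))` followed by `optimal_resize(img, boxes)`. Dropping the page-, block- and line-level rows matters because their large heights would otherwise skew the median.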
@rinogo
Author

rinogo commented Mar 31, 2021

MIT License. Let me know if you use this! :)

@Bogdan740

Hey, I'm using this for a university project :D What's the best way to cite?

@rinogo
Author

rinogo commented Apr 17, 2024

Cool! I'll leave it up to you! :)

@maxfil333

Hey, could you answer the question of how your program determines capital letters? If there are more regular (non-capital) letters in the image, the median will be calculated for them, and not for capital letters. thx
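As the question points out, the gist takes the median over all box heights and does not single out capital letters. One possible way to bias the estimate toward cap height, offered only as a sketch and not as the author's method, is to take the median over words that contain an uppercase letter or digit and no descending characters. The box of such a word should span roughly the cap height. The helper name `estimate_cap_height` and the `DESCENDERS` set are hypothetical, and `boxes` is assumed to carry parallel `text` and `height` lists.

```python
from statistics import median

# Hypothetical list of characters that usually extend below the baseline.
# Capitals such as "Q" or "J" descend in some fonts; this sketch ignores that.
DESCENDERS = set("gjpqy,;()[]{}|/@$")


def estimate_cap_height(boxes):
    """Median height of word boxes that likely span exactly the cap height."""
    candidates = [
        height
        for text, height in zip(boxes["text"], boxes["height"])
        if any(ch.isupper() or ch.isdigit() for ch in text)
        and not any(ch in DESCENDERS for ch in text)
    ]
    # Fall back to the plain median when no word qualifies.
    return median(candidates) if candidates else median(boxes["height"])


# Synthetic boxes (not real OCR results).
boxes = {
    "text": ["Hello", "ace", "Typing", "World", "gym"],
    "height": [30, 20, 40, 32, 28],
}
cap_height = estimate_cap_height(boxes)
print(cap_height)       # 31.0, from "Hello" and "World" only
print(32 / cap_height)  # scale factor based on the cap-height estimate
```

Here the plain median over all five boxes would be 30, while the filtered estimate is 31.0. How much the two differ on real scans depends on the font and on how Tesseract draws its word boxes.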
