@fiorentinogiuseppe
Created January 17, 2020 18:32
Iterates over the pages of a PDF, converting each page to an image and binarizing it.
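from pdf2image import convert_from_bytes


def get_pages_as_images(file):
    """
    Iterate over the pages of the PDF, converting each page to an image,
    cropping a 5% margin from every side and binarizing the result.

    Parameters
    ----------
    file : bytes
        Document as `bytes` containing the `PDF`.

    Returns
    -------
    list of PIL.Image.Image
        One cropped, binarized image per page.
    """
    # Render every page at 250 DPI, already converted to grayscale.
    images = convert_from_bytes(file, dpi=250, grayscale=True)
    processed = []
    for image in images:
        width, height = image.size
        # Crop away 5% of the width/height on each side (left, top, right, bottom).
        box = (int(width * 0.05), int(height * 0.05),
               int(width * 0.95), int(height * 0.95))
        image = image.crop(box)
        # `binarization` is the author's helper, defined elsewhere (not in this gist).
        processed.append(binarization(image))
    # Return the processed pages, not the original uncropped renders.
    return processed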
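The author's `binarization` helper is not part of this gist. As a hedged sketch only, the block below shows one common way such a step could be implemented: Otsu's global threshold, computed from the page's 256-bin grayscale histogram and applied with Pillow's `Image.point`. The names `otsu_threshold` and `binarize` are hypothetical, and Otsu is an assumed stand-in, not necessarily what the original helper does.

```python
def otsu_threshold(histogram):
    """Return the gray level t that maximizes the between-class variance.

    `histogram` is a list of 256 pixel counts (e.g. from Image.histogram()
    on an "L" mode image). Pixels <= t form one class, pixels > t the other.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0.0     # sum of their gray levels
    best_t, best_var = 0, -1.0
    for t, count in enumerate(histogram):
        weight_bg += count
        if weight_bg == 0:
            continue  # no pixels in the lower class yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break     # upper class is empty from here on
        sum_bg += t * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance (up to a constant factor 1 / total**2).
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Binarize a grayscale ("L" mode) PIL image using Otsu's threshold."""
    t = otsu_threshold(image.histogram())
    # Pixels at or below the threshold become black (0), the rest white (255).
    return image.point(lambda p: 255 if p > t else 0)
```

Under that assumption, setting `binarization = binarize` would let the function above run unchanged. A single global threshold can struggle with unevenly lit scans, where an adaptive (local) threshold is usually the better choice.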