@ytakashina
Last active January 19, 2025 09:15
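import re

import fitz  # PyMuPDF

path_pdf = ...  # placeholder: path to the PDF file
doc = fitz.open(path_pdf)

# Print each page's text with all whitespace (including newlines) removed.
for page in doc:
    text_raw = page.get_text("text")
    line = re.sub(r"\s", "", text_raw)
    print(line)

# Print the first page block by block, prefixed with the block number.
for (x0, y0, x1, y1, text, no, typ) in doc[0].get_text("blocks"):
    print(no, re.sub(r"\s", "", text))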
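
The whitespace-stripping step used in the snippet above can be tried on its own, without a PDF. The following is a minimal sketch; the helper name `strip_whitespace` and the sample string are illustrative assumptions, not part of the original gist.

```python
import re


def strip_whitespace(text: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines) from text.

    Hypothetical helper: it wraps the same re.sub call the gist applies
    to the text extracted from each page and block.
    """
    return re.sub(r"\s", "", text)


# Illustrative sample standing in for extracted page text.
sample = "Hello \n wor ld\t!\n"
print(strip_whitespace(sample))  # prints "Helloworld!"
```

Because `\s` on a `str` pattern matches Unicode whitespace, this also removes full-width spaces, which may be useful for text that should not contain word spacing; for space-delimited text it would join words together.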