@Romern
Created January 26, 2021 10:56
Script that prints the object dictionary of every page in a PDF as plain text. Great for grepping URLs.
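import sys

import pikepdf

# Path to the PDF is the first command-line argument.
pdf_path = sys.argv[1]

with pikepdf.open(pdf_path) as pdf:
    for page in pdf.pages:
        # Dump each page's dictionary as text so it can be piped to grep.
        print(str(dict(page.as_dict())))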
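
If grep is not at hand, the URL search can be done in Python on the printed text. The following is a minimal sketch, not part of the original gist: `extract_urls` and `URL_PATTERN` are hypothetical names, and the regex is a rough assumption about what counts as a URL (it stops at whitespace, quotes, parentheses and angle brackets, so URLs containing those characters would be cut short).

```python
import re

# Hypothetical helper, not part of the original script.
# Rough pattern: "http://" or "https://" followed by anything that is not
# whitespace, a quote, a parenthesis, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s'\"()<>]+")


def extract_urls(dump):
    """Return the URLs found in a text dump, deduplicated, in first-seen order."""
    # dict.fromkeys keeps insertion order, so duplicates drop out
    # without reordering the remaining matches.
    return list(dict.fromkeys(URL_PATTERN.findall(dump)))
```

One way to use it is to collect the script's output into a string (for example by reading a saved dump file) and pass that string to `extract_urls`.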