@guyrt
Created January 26, 2022 16:45
Copy a PDF without any extra data.
# Simple Python script that copies the pages of one PDF into a new PDF.
# Usage: python cleanpdf.py <input.pdf>
# FYI: copied from https://github.com/pmaupin/pdfrw/blob/master/examples/cat.py while I was playing around.
import sys
import os
from pdfrw import PdfReader, PdfWriter
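# Read the input PDF named by the first command-line argument.
reader = PdfReader(sys.argv[1])
pages = reader.pages
# Add only the pages to a fresh writer.
writer = PdfWriter()
writer.addpages(pages)
# Write to the current directory as "trim" + the input file name.
writer.write("trim" + os.path.basename(sys.argv[1]))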
'''
These PDFs differ by one piece of metadata:
$ diff --text 1-s2.0-S0166864121003783-main.pdf 2-s2.0-S0166864121003783-main.pdf
2765c2765
< <tprDsz.uSntuRm9iGoguJnLNNz.qQlt6Jm.iGn9j-mPz-z.eSndb_o9ePndmLmtmNmMmTma/>
---
> <ZtV1wzduMmPuLnweGogqNy8NNnwyOlwf8m.eGm9z8zdyRoweKnMuLo9ePndmLmtmQmMmTma/>
$ python .\cleanpdf.py .\1-s2.0-S0166864121003783-main.pdf
$ python .\cleanpdf.py .\2-s2.0-S0166864121003783-main.pdf
$ diff --text trim1-s2.0-S0166864121003783-main.pdf trim2-s2.0-S0166864121003783-main.pdf
Look mom! No diff!
'''
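
The empty `diff` above can also be checked from Python. This is a minimal sketch using only the standard library, not part of the original script; the helper name `files_identical` and the `chunk_size` parameter are made up for illustration.

```python
def files_identical(path_a, path_b, chunk_size=65536):
    """Return True when both files contain exactly the same bytes.

    Hypothetical helper (not from the original gist): reads both files
    in binary mode, chunk by chunk, and stops at the first difference.
    """
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                # Covers differing content and differing lengths.
                return False
            if not chunk_a:
                # Both files reached end-of-file at the same point.
                return True
```

For the files in the transcript, the call would be `files_identical("trim1-s2.0-S0166864121003783-main.pdf", "trim2-s2.0-S0166864121003783-main.pdf")`, which should agree with whatever `diff --text` reports for the same pair.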