@chinasaur
Last active October 3, 2023 05:00
Remove paperpile hyperlinks from PDF
# Basically seems to work to strip all paperpile links from the text
# and references section, while leaving rest of links intact.
import pdfrw # https://github.com/pmaupin/pdfrw/tree/master/pdfrw
pdf = pdfrw.PdfReader('input_path.pdf')
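for page in pdf.pages:
    if not page.Annots:
        continue
    nopaperpile = pdfrw.objects.pdfarray.PdfArray()
    for annot in page.Annots:
        # Not every annotation is a URI link; keep those with no /A action or no /URI.
        uri = annot.A.URI if annot.A else None
        # pdfrw exposes the raw PDF literal string, hence the leading '('.
        if not (uri and uri.startswith(('(https://paperpile.com/',
                                        '(http://paperpile.com/'))):
            nopaperpile.append(annot)
    page.Annots = nopaperpile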
output = pdfrw.PdfWriter('output_path.pdf')
for page in pdf.pages:
    output.addPage(page)
output.write()
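
The link test above can also be pulled out into a small standalone helper, which makes it easy to check without opening a PDF. This is only a sketch: `is_paperpile_uri` and `keep_annotations` are hypothetical names, not part of pdfrw, and it assumes URIs arrive as raw PDF literal strings such as '(https://paperpile.com/...)', or as None when an annotation has no /URI entry.

```python
def is_paperpile_uri(raw_uri):
    """Return True if a raw PDF URI string points at paperpile.com.

    Assumption: raw_uri is the parenthesised literal-string form,
    e.g. '(https://paperpile.com/c/abc)', or None when there is no URI.
    """
    if not raw_uri:
        return False
    text = str(raw_uri)
    # Drop the surrounding parentheses of a PDF literal string, if present.
    if text.startswith('(') and text.endswith(')'):
        text = text[1:-1]
    return text.startswith(('https://paperpile.com/', 'http://paperpile.com/'))


def keep_annotations(raw_uris):
    """Filter a list of raw URI strings, dropping the Paperpile ones."""
    return [u for u in raw_uris if not is_paperpile_uri(u)]
```

Entries with no URI (None) are kept, matching the intent of leaving non-Paperpile annotations intact.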