@lloydchang
Forked from Te-k/pdf_metadata.md
Created September 18, 2024 02:14
How to remove metadata from PDFs

Many tools do not fully remove metadata; they only remove the reference to it in the metadata table. The data is therefore still present in the PDF file itself.

While a lot of people rely on Exiftool to remove metadata, it does the same thing with PDFs: it writes its changes as an incremental update appended to the file and leaves the original objects in place. If you remove metadata with exiftool -all= some.pdf, you can always restore it with exiftool -pdf-update:all= some.pdf.
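One rough way to see this for yourself is to search the raw bytes of the file for a value you know was in the metadata. The sketch below assumes the value was stored as plain, uncompressed ASCII text, which is common for the Info dictionary but not guaranteed. `pdf_contains_string` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Sketch: check whether a literal string is still present in the raw bytes
# of a PDF. This only catches uncompressed, unencoded occurrences, so a
# negative result does not prove that the metadata is gone.
# pdf_contains_string is a hypothetical helper name.
pdf_contains_string() {
    # $1 = path to the PDF, $2 = literal string to look for
    # -a treats the binary file as text, -F matches a fixed string, -q stays silent
    LC_ALL=C grep -a -q -F -e "$2" -- "$1"
}
```

For example, after running exiftool -all= some.pdf, a call such as `pdf_contains_string some.pdf "Alice Example"` (with a value you know was in the original metadata) may still succeed.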

There are several options to remove PDF metadata safely:

Option 1 : Exiftool with qpdf

  • Remove metadata with exiftool : exiftool -all= some.pdf
  • Then remove unused objects with qpdf : qpdf --linearize some.pdf - > some.cleaned.pdf
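The two steps above can be combined into a small script. This is a minimal sketch, assuming exiftool and qpdf are both on the PATH. `clean_pdf` and the temporary file name are hypothetical choices, and the script works on a copy so the input file is left untouched.

```shell
#!/bin/sh
# Sketch: strip metadata with exiftool, then rewrite the file with qpdf so
# that the now-unreferenced metadata objects are not carried over.
# clean_pdf is a hypothetical helper name, not part of either tool.
clean_pdf() {
    in="$1"
    out="$2"
    tmp="${out}.tmp.pdf"   # hypothetical temporary file name

    # Work on a copy so the original file is never modified
    cp -- "$in" "$tmp" || return 1

    # -all= clears the metadata; -overwrite_original skips the *_original backup
    exiftool -all= -overwrite_original "$tmp" || { rm -f "$tmp"; return 1; }

    # qpdf writes a fresh file, leaving unreferenced objects behind
    qpdf --linearize "$tmp" "$out"
    status=$?

    rm -f "$tmp"
    return "$status"
}
```

Usage would look like `clean_pdf some.pdf some.cleaned.pdf`. The function passes qpdf's exit status through, so a non-zero return may also mean that qpdf only reported warnings.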

Option 2 : MAT2

Use MAT2, a Python library with a command line tool.
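As a rough sketch of the command line usage, assuming the mat2 command is installed: `--show` lists the metadata mat2 detects without changing the file, and running mat2 on a file writes a cleaned copy next to it (some.cleaned.pdf for some.pdf). `mat2_clean` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sketch: inspect, then clean, a PDF with the mat2 command line tool.
# mat2_clean is a hypothetical wrapper name; --show is mat2's own flag.
mat2_clean() {
    # List the metadata mat2 detects, without modifying the file
    mat2 --show "$1" || return 1
    # Write a cleaned copy next to the original (e.g. some.cleaned.pdf)
    mat2 "$1"
}
```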

Option 3 : DangerZone

Use DangerZone, which has a GUI for Windows, macOS and Linux (but is quite heavy).

(DangerZone is based on the former pdf-redact-tools, which can also be an option.)
