@Te-k
Created November 26, 2020 10:31
How to remove metadata from PDFs

Many tools do not fully remove metadata; they only remove the reference to it in the PDF's metadata table. The data therefore remain in the PDF file itself.

While a lot of people rely on Exiftool to remove metadata, it has the same limitation with PDFs. If you remove metadata with exiftool -all= some.pdf, anyone can restore it with exiftool -pdf-update:all= some.pdf.
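One rough way to see this for yourself is to search the raw bytes of the file for a metadata value you know was there before cleaning. The sketch below is only an illustration: `raw_bytes_contain` is a hypothetical helper name, and it assumes the old value was stored as plain, uncompressed text. A value that is compressed, encrypted, or stored in another encoding will not be found this way, so a miss proves nothing.

```shell
# Hypothetical helper: succeeds if the literal string NEEDLE occurs anywhere
# in the raw bytes of FILE.
#   -a  treat binary data as text
#   -q  print nothing, only set the exit status
#   -F  match NEEDLE literally, not as a regular expression
raw_bytes_contain() {
    file=$1
    needle=$2
    grep -a -q -F -- "$needle" "$file"
}

# Example (assumes some.pdf originally had "Alice Example" as its author):
#   exiftool -all= some.pdf
#   if raw_bytes_contain some.pdf 'Alice Example'; then
#       echo "old author string is still in the file"
#   fi
```

If the helper still finds the old value after exiftool -all=, the metadata has only been unlinked, not removed.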

There are several options to remove PDF metadata safely:

Option 1 : Exiftool with qpdf

  • Remove metadata with exiftool: exiftool -all= some.pdf
  • Then remove unused objects with qpdf: qpdf --linearize some.pdf - > some.cleaned.pdf
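The two steps above can be wrapped in a small shell function. This is a minimal sketch, not a hardened tool: `clean_pdf` and the temporary file name are made up for illustration, and it assumes exiftool and qpdf are both on your PATH. It works on a copy so the original file is left untouched.

```shell
# Hypothetical wrapper around the two steps above.
# Usage: clean_pdf INPUT.pdf OUTPUT.pdf
clean_pdf() {
    in=$1
    out=$2
    if [ ! -f "$in" ]; then
        echo "clean_pdf: input not found: $in" >&2
        return 2
    fi
    for tool in exiftool qpdf; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "clean_pdf: $tool is not installed" >&2
            return 3
        fi
    done
    # Work on a copy next to the output file (illustrative temporary name).
    tmp="$out.tmp.pdf"
    cp -- "$in" "$tmp" || return 1
    # Step 1: unlink the metadata. -overwrite_original stops exiftool from
    # leaving a "_original" backup of the copy behind.
    exiftool -all= -overwrite_original "$tmp" >/dev/null || { rm -f "$tmp"; return 1; }
    # Step 2: let qpdf rewrite the file, which drops the unused objects.
    qpdf --linearize "$tmp" "$out"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example:
#   clean_pdf some.pdf some.cleaned.pdf
```

Writing to a separate output file keeps the uncleaned original available until you have checked the result.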

Option 2 : MAT2

Use MAT2, a Python library with a command-line tool.

Option 3 : DangerZone

Use DangerZone, which has a GUI for Windows, macOS and Linux (but is quite heavy).

(DangerZone is based on the earlier pdf-redact-tools, which can also be an option)

@Moon1moon

Hi, do you know how good this tool is for removing metadata?
https://github.com/szTheory/exifcleaner

@qeqteq

qeqteq commented Dec 15, 2024

While a lot of people rely on Exiftool to remove metadata, it actually does the same in PDFs. If you remove metadata with exiftool -all= some.pdf, you can always restore the data with exiftool -pdf-update:all= some.pdf.

Thank you for this important warning about Exiftool.

Hi, do you know how good this tool for removing metadata? https://github.com/szTheory/exifcleaner

ExifCleaner is based on Exiftool (and nothing else!) and has the same limitations that Exiftool has! The GitHub page of ExifCleaner starts by describing it as a "Desktop app to clean metadata from images, videos, PDFs, and other files." However, when you scroll down to the "File writer limitations" section, it says that for PDF, "The original metadata is never actually removed."

I don't know about MAT and DangerZone, but the Exiftool with qpdf option will effectively and permanently get rid of "native" Info tags and XMP data. It will not, however, get rid of other types of "hidden data" that PDFs may contain, such as comments. If you are looking for a GUI solution, there are also commercial options like the "Sanitize document" tool in Adobe Acrobat Pro or the metadata scrubbing app BatchPurifier.
