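Many tools do not fully remove metadata from PDFs. They only remove the reference to it, for example by appending an updated Info dictionary and cross-reference table that no longer point to the old data. The original data are thus still present in the PDF file itself.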
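While a lot of people rely on Exiftool to remove metadata, it actually does the same with PDFs: it writes its changes as an incremental update appended to the end of the file and leaves the original metadata in place. If you remove metadata with exiftool -all= some.pdf, you can restore the original data with exiftool -pdf-update:all= some.pdf.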
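To check whether old metadata is still sitting in a file, a rough approach is to scan its raw bytes. The Python sketch below uses only the standard library, and its function names are hypothetical. It counts %%EOF markers (each incremental update, such as the one exiftool appends, adds one) and looks for common Info dictionary keys and the usual <x:xmpmeta XMP wrapper, both in the raw file and in any stream it can inflate as Flate data. It is a heuristic, not a PDF parser: linearized files (including qpdf --linearize output) normally contain two %%EOF markers even when nothing is left over, a key such as /Title also appears in bookmarks, and encrypted or non-Flate streams are not inspected.

```python
import re
import sys
import zlib

# Common keys of a PDF document information (Info) dictionary.
# These are plain substring matches: /Title, for example, also occurs in bookmarks.
INFO_KEYS = (b"/Author", b"/Creator", b"/Producer", b"/Title", b"/Subject", b"/Keywords")

# Naive match for "stream ... endstream" bodies; not a real PDF tokenizer.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.S)


def iter_payloads(data: bytes):
    """Yield the raw file, then every stream body that inflates as zlib/Flate data."""
    yield data
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the trailing EOL before "endstream"
            # (it ends up in .unused_data instead of raising an error).
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            pass  # uncompressed, another filter, or encrypted: skip it


def scan_pdf_bytes(data: bytes) -> dict:
    """Report traces of metadata left in a PDF's bytes (heuristic only)."""
    found = set()
    has_xmp = False
    for chunk in iter_payloads(data):
        found.update(key.decode() for key in INFO_KEYS if key in chunk)
        has_xmp = has_xmp or b"<x:xmpmeta" in chunk
    return {
        # More than one usually means incremental updates,
        # but linearized files legitimately contain two.
        "eof_markers": data.count(b"%%EOF"),
        "info_keys": sorted(found),
        "xmp_packet": has_xmp,
    }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, scan_pdf_bytes(fh.read()))
```

Run it as python scan_pdf.py some.pdf before and after exiftool -all= some.pdf. After the exiftool step alone, the old keys may still be reported, because the original Info dictionary and XMP stream are still in the file, just no longer referenced.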
There are several options to remove PDF metadata safely:
- Remove metadata with exiftool, then remove the unused objects with qpdf:
  exiftool -all= some.pdf
  qpdf --linearize some.pdf - > some.cleaned.pdf
- Use MAT2, a Python library with a command-line tool.
- Use Dangerzone, which has a GUI for Windows, macOS and Linux (but is quite heavy). Dangerzone is based on the former pdf-redact-tools, which can also be an option.
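The two-step exiftool + qpdf option can be wrapped in a small script. The Python sketch below assumes both tools are installed and on PATH; the names build_commands and clean_pdf are hypothetical. It passes -o to exiftool so the stripped copy goes to a temporary file instead of editing the input in place, because an in-place exiftool edit keeps a backup named some.pdf_original that still holds the metadata. qpdf then rewrites that copy, writing only the objects that are still referenced, so the orphaned metadata is not carried over.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def build_commands(src: str, tmp: str, dst: str) -> list[list[str]]:
    """Return the two command lines: strip metadata, then rewrite the file."""
    return [
        # -o writes the stripped copy to tmp instead of editing src in place
        # (an in-place edit would also leave a "<name>_original" backup).
        ["exiftool", "-all=", "-o", tmp, src],
        # qpdf only writes objects still reachable from the trailer, so the
        # orphaned Info dictionary and XMP stream are dropped.
        ["qpdf", "--linearize", tmp, dst],
    ]


def clean_pdf(src: str, dst: str) -> None:
    """Run exiftool and qpdf in sequence, raising if either one fails."""
    for tool in ("exiftool", "qpdf"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with tempfile.TemporaryDirectory() as workdir:
        tmp = str(Path(workdir) / "stripped.pdf")
        for cmd in build_commands(src, tmp, dst):
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        clean_pdf(sys.argv[1], sys.argv[2])
    else:
        print("usage: clean_pdf.py input.pdf output.pdf", file=sys.stderr)
```

For example: python clean_pdf.py some.pdf some.cleaned.pdf. The temporary copy is deleted automatically when the with block exits.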
Thank you for this important warning about Exiftool.
ExifCleaner is based on Exiftool (and nothing else!) and has the same limitations that Exiftool has! The GitHub page of ExifCleaner starts by describing it as a "Desktop app to clean metadata from images, videos, PDFs, and other files." However, if you scroll down to the "File writer limitations" section, it says that for PDF, "The original metadata is never actually removed."
I don't know about MAT2 and Dangerzone, but the exiftool + qpdf option will effectively and permanently get rid of the "native" Info dictionary tags and XMP data. It will not, however, get rid of other kinds of "hidden data" that PDFs may contain, such as comments. If you are looking for a GUI solution, there are also commercial options such as the "Sanitize document" tool in Adobe Acrobat Pro or the metadata-scrubbing app BatchPurifier.