The typical approach to comparing PDF files in `git diff` output is to convert the PDF files to text with `pdftotext` and show the diff of the converted text.
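
As a rough illustration of the idea, independent of any Git configuration, the same comparison can be done by hand on two PDF files. This is only a sketch: `pdf_text_diff` is a hypothetical helper name introduced here, and it assumes `pdftotext` and `diff` are available on `$PATH`.

```shell
#!/bin/sh
# Sketch: compare the extracted text of two PDF files by hand.
# pdf_text_diff is a hypothetical helper name, not a Git or poppler command.
pdf_text_diff() {
    old_txt=$(mktemp) || return 2
    new_txt=$(mktemp) || { rm -f "$old_txt"; return 2; }

    # "-" as the output file makes pdftotext write to stdout.
    pdftotext -layout -enc UTF-8 "$1" - > "$old_txt"
    pdftotext -layout -enc UTF-8 "$2" - > "$new_txt"

    # diff exits with 0 when the texts match and 1 when they differ.
    status=0
    diff -u "$old_txt" "$new_txt" || status=$?

    rm -f "$old_txt" "$new_txt"
    return "$status"
}
```

Calling `pdf_text_diff old.pdf new.pdf` prints a unified diff of the extracted text. The steps below let Git perform the same conversion automatically.
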
- Install the `pdftotext` utility from the `poppler` project.
- Enable the `pdffiles` handler in `diff` for PDF files and instruct the handler to call the `pdf-astextplain` script:

      mkdir -p ~/.config/git
      echo "*.pdf diff=pdffiles" >> ~/.config/git/attributes
      printf '[diff "pdffiles"]\n\ttextconv = pdf-astextplain\n\tbinary = true\n' >> ~/.gitconfig

  See: https://git-scm.com/docs/gitattributes#_marking_files_as_binary and https://git-scm.com/docs/git-diff#Documentation/git-diff.txt---textconv
- Add a wrapper around `pdftotext` that directs its output to stdout. Create an executable script named `pdf-astextplain` in a directory on `$PATH`:

      #!/bin/sh
      pdftotext -layout -enc UTF-8 "$1" -

  To compare metadata in addition to the content of the PDF, add `pdfinfo "$1"` to `pdf-astextplain`.
- Some Git implementations also include an `astextplain` script that converts PDF and other files to text for `diff`.
  See: https://github.com/git-for-windows/build-extra/blob/c223c7757745c1df552c0dd4628c368aaea11f32/git-extra/astextplain
- In a similar spirit, use `zipinfo -l` to show the contents of a ZIP archive, and use the `racket-wxme-astext` script to show Racket WXME files.
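
The same two pieces of configuration (an attributes line and a `[diff "..."]` section) apply to other file types, so they can be wrapped in a small helper. The following is a minimal sketch, not an established tool: the function name `add_textconv_driver` and the driver name `zipfiles` are assumptions made for illustration, and the converter string is written to the config file as-is, without escaping.

```shell
#!/bin/sh
# Sketch: register a textconv diff driver the same way as for PDF files.
# add_textconv_driver and the driver name "zipfiles" are illustrative
# names chosen here; they are not part of Git.
add_textconv_driver() {
    driver=$1       # driver name, e.g. zipfiles
    pattern=$2      # file pattern, e.g. *.zip
    converter=$3    # command that prints a text view of a file on stdout
    attributes=${4:-$HOME/.config/git/attributes}
    config=${5:-$HOME/.gitconfig}

    mkdir -p "$(dirname "$attributes")"

    # Route files matching the pattern to the named diff driver.
    printf '%s diff=%s\n' "$pattern" "$driver" >> "$attributes"

    # Define the driver; Git passes the file path to the textconv command.
    printf '[diff "%s"]\n\ttextconv = %s\n\tbinary = true\n' \
        "$driver" "$converter" >> "$config"
}
```

For ZIP archives, this could be called as `add_textconv_driver zipfiles '*.zip' 'zipinfo -l'`. The optional fourth and fifth arguments point the helper at other attributes and config files, which is convenient for trying it out without touching the real ones.
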