@ionox0
Created January 6, 2019 18:26
Simple Python script to find and replace text within a PDF
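import re
import sys
import zlib

# Script to find and replace text in PDF files (Python 3)
#
# Usage:
# python pdf_replace.py <input_filename> <text_to_find> <text_to_replace> <output_filename>
#
# @author Ionox0

input_filename = sys.argv[1]
# PDF content is binary, so the search and replacement text are handled as bytes.
# latin-1 is an assumption here; PDFs that use other text encodings will not match.
text_to_find = sys.argv[2].encode('latin-1')
text_to_replace = sys.argv[3].encode('latin-1')
output_filename = sys.argv[4]

with open(input_filename, 'rb') as f:
    pdf = f.read()

# Working copy of the PDF content to apply edits to
# (bytes are immutable, so rebinding the name is enough)
pdf_copy = pdf

# Search for FlateDecode stream objects with text to replace
stream = re.compile(rb'.*?FlateDecode.*?stream(.*?)endstream', re.S)

for s in stream.findall(pdf):
    # Drop the end-of-line bytes that surround the stream data
    s = s.strip(b'\r\n')
    try:
        text = zlib.decompress(s)
    except zlib.error:
        # Not a plain zlib stream (e.g. encrypted or using extra filters); skip it
        continue
    if text_to_find in text:
        print('Found match:')
        print(text)
        text = text.replace(text_to_find, text_to_replace)
        # Note: the recompressed stream usually differs in size, and neither the
        # stream's /Length entry nor the xref byte offsets are updated here
        pdf_copy = pdf_copy.replace(s, zlib.compress(text))

with open(output_filename, 'wb') as out:
    out.write(pdf_copy)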
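
The core step of the script (find FlateDecode streams, decompress, replace, recompress) can be tried without a real PDF. The following is a minimal sketch for Python 3, not the author's code: the function name `replace_in_flate_streams` and the toy `blob` fragment are hypothetical, invented for illustration, and the fragment is not a valid PDF file.

```python
import re
import zlib

# Same pattern idea as the script: capture the bytes between "stream" and
# "endstream" for objects that mention FlateDecode.
STREAM_RE = re.compile(rb'FlateDecode.*?stream(.*?)endstream', re.S)


def replace_in_flate_streams(pdf: bytes, find: bytes, repl: bytes) -> bytes:
    """Return a copy of pdf with find replaced by repl inside Flate streams.

    Hypothetical helper. It does not update /Length entries or xref offsets.
    """
    out = pdf
    for raw in STREAM_RE.findall(pdf):
        raw = raw.strip(b'\r\n')  # drop the EOL bytes around the stream data
        try:
            text = zlib.decompress(raw)
        except zlib.error:
            continue  # not a plain zlib stream; leave it untouched
        if find in text:
            out = out.replace(raw, zlib.compress(text.replace(find, repl)))
    return out


# Toy fragment shaped like a PDF stream object (an assumption for the demo)
content = b'BT /F1 12 Tf (Hello World) Tj ET'
blob = (b'1 0 obj\n<< /Filter /FlateDecode >>\nstream\n'
        + zlib.compress(content)
        + b'\nendstream\nendobj\n')

edited = replace_in_flate_streams(blob, b'World', b'There')
```

Because the text is matched as raw bytes inside the decompressed content stream, this only finds strings that the PDF stores literally, such as `(Hello World) Tj`. Text split across several operators, hex-encoded strings, or fonts with custom encodings will likely not match.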