@chrdek
Last active April 17, 2024 16:33
PDF file content redaction using ImageMagick
<#
Author: Chris Dek.
.Synopsis
This is a PowerShell utility script that redacts content (by drawing black rounded rectangles) based on an image coordinates file.
Usage details are given in the sections below.
.Description
PDF file content redaction using ImageMagick. This script generates the ImageMagick commands that draw the redactions onto a set of page images extracted from a PDF,
using the coordinates that determine where each redaction takes place, and then combines the redacted images into a single PDF.
.Parameter coordfile
The file which holds the coordinate sets per PDF page for redaction.
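The coordinates file format is shown below (NOTE==> 0,0 is the top-left corner of the image canvas;
X1,Y1 and X2,Y2 are the top-left and bottom-right corners of each redaction rectangle):
X1,Y1 X2,Y2 <--redaction 1 -page1
X1,Y1 X2,Y2 <--redaction 2 -page1
+ <--end of page 1 (a "+" line must close every page, including the last one)
X1,Y1 X2,Y2 <--redaction 1 -page2
X1,Y1 X2,Y2 <--redaction 2 -page2
+ <--end of page 2 ....and so on..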
.Parameter imagesloc
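The location of the page images extracted from the PDF (a directory, or a wildcard path such as the *.png pattern used in the call at the end of the script).
These images are redacted and then combined back into a PDF.
NOTE==> the generated commands always read from and write to %USERPROFILE%\Documents\filespdf\images, so the images must be in that folder.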
.Example
Set the parameters in the PdfCoordPlacement call on the last line of the script:
coordfile = { The coordinates file location; the current input file is pdfpoints.xyxy }
imagesloc = { The directory that holds the images extracted from the PDF to be modified }
Then run the script from the PowerShell command line:
.\PDF-Redact.ps1
The script writes the commands to a batch file (pdfout.bat) and runs it; the redacted PDF is then opened for viewing.
#>
Function PdfCoordPlacement {
param(
[Parameter(Mandatory=$true)]
[string]$coordfile,
[Parameter(Mandatory=$true)]
[string]$imagesloc)
##Initial directory cleanup: remove the batch file and the redacted PDF left over from a previous run.
$batfile = "$env:USERPROFILE\Documents\pdfout.bat"
$outpdf = "$env:USERPROFILE\Documents\filespdf\images\out-pdfredacted.pdf"
if (Test-Path -PathType Leaf $batfile) { Remove-Item -LiteralPath $batfile }
if (Test-Path -PathType Leaf $outpdf) { Remove-Item -LiteralPath $outpdf }
Set-Content -LiteralPath $batfile -Value ""
$linecont=@([PSObject]@{redactnum=0; content=""})
$cnt=0; $coordcount = @(Get-Content -Path $coordfile).Count; $objCount=0
$lines= Get-Content -Path $coordfile | Where {$_ -notlike "+*"} | Measure-Object -Line
[PSCustomObject]$filedata= @();$n=0
Get-ChildItem -Path $imagesloc | %{
$fileprops = @{
filename= $_.BaseName
ext= $_.Extension
}
$filedata+= New-Object PSCustomObject -Property $fileprops
}
[string[]]$str=@()
Get-Content -Path $coordfile | ForEach-Object {
if ($cnt -eq $coordcount) {break}
else {
# Collect the lines of the current page; the commands are emitted once the "+" page terminator is reached.
$str+= $_
if ($_ -like "+*") {
$cnt++
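# Seed "<name>redacted<ext>" from the original page image. The literal '0,0 0,0' is a degenerate placeholder rectangle (the same value the "+" line maps to); $strreplace is not assigned yet on the first page.
$initredact = 'convert "%USERPROFILE%\Documents\filespdf\images\{0}{1}" -draw "fill black roundrectangle {2} 10,10" "%USERPROFILE%\Documents\filespdf\images\{3}redacted{4}"' -f $filedata[$n].filename,$filedata[0].ext,'0,0 0,0',$filedata[$n].filename,$filedata[0].ext+"`r`n"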
Add-Content -LiteralPath $env:USERPROFILE'\Documents\pdfout.bat' -Value $initredact
foreach ($line in $str) {
$frmstr= "{0}" -f $line #$linecont[$cnt].content
$strreplace = $frmstr -replace "\+","0,0 0,0"
$redactcmd = 'convert "%USERPROFILE%\Documents\filespdf\images\{0}redacted{1}" -draw "fill black roundrectangle {2} 10,10" "%USERPROFILE%\Documents\filespdf\images\{3}redacted{4}"' -f $filedata[$n].filename,$filedata[0].ext,$strreplace,$filedata[$n].filename,$filedata[0].ext+"`r`n"
Add-Content -LiteralPath $env:USERPROFILE'\Documents\pdfout.bat' -Value $redactcmd
}
$str=@()
$n++
Write-Host "Total redactions in file: $($lines.Lines), pages processed so far: $($cnt)"
}
}
}
$finalcmd = 'convert "%USERPROFILE%\Documents\filespdf\images\*redacted{0}" "%USERPROFILE%\Documents\filespdf\images\out-pdfredacted.pdf"' -f $filedata[0].ext
Add-Content -LiteralPath $env:USERPROFILE'\Documents\pdfout.bat' -Value $finalcmd
Add-Content -LiteralPath $env:USERPROFILE'\Documents\pdfout.bat' -Value ('del /Q "%USERPROFILE%\Documents\filespdf\images\*redacted{0}"' -f $filedata[0].ext)
Add-Content -LiteralPath $env:USERPROFILE'\Documents\pdfout.bat' -Value '"%USERPROFILE%\Documents\filespdf\images\out-pdfredacted.pdf"'
Invoke-Item $env:USERPROFILE'\Documents\pdfout.bat'
}
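# Illustrative sketch, not part of the original script.
# One possible way to check a coordinates file before any commands are generated,
# assuming the format described in the help text: "X1,Y1 X2,Y2" lines, with a "+" line
# closing each page. Get-RedactionPages is a hypothetical helper name introduced here
# for illustration only; PdfCoordPlacement does not call it.
Function Get-RedactionPages {
param(
[Parameter(Mandatory=$true)]
[string]$coordfile)
$pages = New-Object System.Collections.ArrayList   # one entry (an array of rectangles) per page
$current = @()                                     # rectangles collected for the page being read
$pattern = '^\s*(-?\d+(\.\d+)?),(-?\d+(\.\d+)?)\s+(-?\d+(\.\d+)?),(-?\d+(\.\d+)?)\s*$'
foreach ($line in (Get-Content -Path $coordfile)) {
if ($line -like "+*") {
# "+" closes the current page, even when the page holds no redactions
[void]$pages.Add($current)
$current = @()
}
elseif ($line -match $pattern) {
# Capture groups 1,3,5,7 hold X1,Y1,X2,Y2 (groups 2,4,6,8 are the optional fractions)
$current += [PSCustomObject]@{ X1=[double]$Matches[1]; Y1=[double]$Matches[3]; X2=[double]$Matches[5]; Y2=[double]$Matches[7] }
}
elseif ($line.Trim() -ne "") {
Write-Warning "Skipping malformed coordinate line: $line"
}
}
# Rectangles listed after the final "+" never reach a page, so flag them
if ($current.Count -gt 0) { Write-Warning 'The last page is not closed by a "+" line; its redactions would be ignored.' }
return ,$pages
}
# Possible usage (the result's Count is the number of pages found in the file):
# (Get-RedactionPages -coordfile "$env:USERPROFILE\Documents\pdfpoints.xyxy").Count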
PdfCoordPlacement -coordfile $env:USERPROFILE'\Documents\pdfpoints.xyxy' -imagesloc $env:USERPROFILE'\Documents\filespdf\images\*.png'

Sample coordinates file (pdfpoints.xyxy):
511.5,308.0 649.0,352.0
660.0,407.0 709.5,429.0
665.5,478.5 720.5,517.0
+
368.5,792.0 649.0,874.5
357.5,308.0 440.0,357.5
550.0,429.0 638.0,473.0
+
660.0,429.0 1012.0,511.5
671.0,599.5 924.0,660.0
836.0,368.5 962.5,418.0
1188.0,368.5 1292.5,412.5
+