Last active
April 17, 2024 16:33
PDF file content redaction using ImageMagick
<#
Author: Chris Dek.
.Synopsis
This is a PowerShell utility script that redacts content (via black rounded rectangles) based on an image coordinate file.
Usage details are given in the sections below.
.Description
PDF file content redaction using ImageMagick. This script generates the ImageMagick commands needed to redact a set of images extracted from a PDF,
using the coordinates that determine where each redaction takes place, and to convert the redacted images back into a PDF.
.Parameter coordfile
The file that holds the coordinate sets per PDF page for redaction.
The coordinates file format is shown below (NOTE ==> 0,0 is the top-left corner of the image canvas).
Each page's redactions must be terminated by a line starting with "+":
  X1,Y1 X2,Y2   <-- redaction 1 - page 1
  X1,Y1 X2,Y2   <-- redaction 2 - page 1
  +             <-- end of page 1, move on to the next page
  X1,Y1 X2,Y2   <-- redaction 1 - page 2
  X1,Y1 X2,Y2   <-- redaction 2 - page 2
  +             <-- ...and so on
.Parameter imagesloc
The directory (or a wildcard path such as ...\images\*.png) that holds all images extracted from the PDF. These images are redacted and converted back to a PDF.
.Example
Set the parameters of the PdfCoordPlacement function call on the last line of the script:
coordfile = { The coordinates file location; the current input file is pdfpoints.xyxy }
imagesloc = { The directory that holds the images extracted from the PDF to be modified }
Then run the script from the PS command line in the format shown below:
.\PDF-Redact.ps1
After this, the generated batch file runs and the redacted PDF file is opened for viewing.
#>
Function PdfCoordPlacement {
    param(
        [Parameter(Mandatory=$true)]
        [string]$coordfile,
        [Parameter(Mandatory=$true)]
        [string]$imagesloc)

    $batfile = Join-Path $env:USERPROFILE 'Documents\pdfout.bat'

    ## Collect the page images, skipping leftovers (*redacted.* and out-pdfredacted.pdf) from earlier runs.
    $filedata = @(Get-ChildItem -Path $imagesloc -File |
        Where-Object { $_.BaseName -notlike '*redacted' } | Sort-Object Name)
    if ($filedata.Count -eq 0) { throw "No images found at $imagesloc" }
    $imgdir = $filedata[0].DirectoryName
    $ext    = $filedata[0].Extension
    $pdfout = Join-Path $imgdir 'out-pdfredacted.pdf'

    ## Initial cleanup of the batch file and output PDF from a previous run.
    if (Test-Path -PathType Leaf $batfile) { Remove-Item -LiteralPath $batfile }
    if (Test-Path -PathType Leaf $pdfout)  { Remove-Item -LiteralPath $pdfout }
    Set-Content -LiteralPath $batfile -Value ""

    ## Total number of redactions, i.e. all non-empty lines that are not "+" page terminators.
    $total = @(Get-Content -Path $coordfile | Where-Object { $_.Trim() -and $_ -notlike "+*" }).Count

    ## NOTE: 'convert' is the legacy ImageMagick command name; ImageMagick 7 provides it as 'magick'.
    $cnt = 0
    [string[]]$str = @()
    foreach ($line in Get-Content -Path $coordfile) {
        if ($line -notlike "+*") {
            ## Buffer the "X1,Y1 X2,Y2" rectangles until the page terminator is reached.
            if ($line.Trim()) { $str += $line.Trim() }
            continue
        }
        if ($cnt -ge $filedata.Count) {
            Write-Warning "The coordinate file lists more pages than there are images; ignoring the rest."
            break
        }
        $src = $filedata[$cnt].FullName
        $dst = Join-Path $imgdir ($filedata[$cnt].BaseName + 'redacted' + $ext)
        ## First copy the page image, then draw each rectangle onto the copy in place.
        Add-Content -LiteralPath $batfile -Value ('convert "{0}" "{1}"' -f $src,$dst)
        foreach ($rect in $str) {
            $redactcmd = 'convert "{0}" -draw "fill black roundrectangle {1} 10,10" "{0}"' -f $dst,$rect
            Add-Content -LiteralPath $batfile -Value $redactcmd
        }
        $str = @()
        $cnt++
        Write-Host "Num of total redactions $total page:$cnt"
    }

    ## Merge the redacted pages into one PDF, delete the intermediate images, then open the PDF.
    $finalcmd = 'convert "{0}" "{1}"' -f (Join-Path $imgdir "*redacted$ext"),$pdfout
    Add-Content -LiteralPath $batfile -Value $finalcmd
    Add-Content -LiteralPath $batfile -Value ('del /Q "{0}"' -f (Join-Path $imgdir "*redacted$ext"))
    Add-Content -LiteralPath $batfile -Value ('"{0}"' -f $pdfout)
    Invoke-Item $batfile
}
PdfCoordPlacement -coordfile "$env:USERPROFILE\Documents\pdfpoints.xyxy" -imagesloc "$env:USERPROFILE\Documents\filespdf\images\*.png"
511.5,308.0 649.0,352.0
660.0,407.0 709.5,429.0
665.5,478.5 720.5,517.0
+
368.5,792.0 649.0,874.5
357.5,308.0 440.0,357.5
550.0,429.0 638.0,473.0
+
660.0,429.0 1012.0,511.5
671.0,599.5 924.0,660.0
836.0,368.5 962.5,418.0
1188.0,368.5 1292.5,412.5
+
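
The coordinate file above follows the format described in the script help: one `X1,Y1 X2,Y2` rectangle per line, with a `+` line closing each page. To check such a file before running the redaction, the lines can be grouped into pages. The following is only a minimal sketch under that assumption; `Get-RedactionPages` is a hypothetical helper name and is not part of the original script.

```powershell
# Hypothetical helper (not part of the original script): groups the lines of a
# coordinate file into one array of rectangles per page.
function Get-RedactionPages {
    param([string[]]$Lines)

    $pages   = New-Object 'System.Collections.Generic.List[object]'
    $current = @()
    foreach ($line in $Lines) {
        $text = $line.Trim()
        if ($text -like '+*') {
            # A "+" line terminates the current page.
            $pages.Add($current)
            $current = @()
        }
        elseif ($text -match '^([\d.]+),([\d.]+)\s+([\d.]+),([\d.]+)$') {
            # One rectangle: top-left corner (X1,Y1) and bottom-right corner (X2,Y2).
            $current += [PSCustomObject]@{
                X1 = [double]$Matches[1]; Y1 = [double]$Matches[2]
                X2 = [double]$Matches[3]; Y2 = [double]$Matches[4]
            }
        }
    }
    # Rectangles after the last "+" are dropped, because a page only counts once it is terminated.
    # The leading comma keeps PowerShell from unrolling the list on return.
    return ,$pages
}

# Usage with a small in-memory sample (two pages).
$sample = @('511.5,308.0 649.0,352.0', '660.0,407.0 709.5,429.0', '+',
            '368.5,792.0 649.0,874.5', '+')
$pages = Get-RedactionPages -Lines $sample
$pages.Count        # number of pages found
$pages[0].Count     # number of rectangles on the first page
```

For a real file, the lines could be passed as `Get-RedactionPages -Lines (Get-Content -Path $coordfile)`. Comparing the page count against the number of extracted images is a cheap way to spot a missing `+` terminator.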