Skip to content

Instantly share code, notes, and snippets.

@BenniG123
Last active November 14, 2023 14:37
Show Gist options
  • Save BenniG123/42b645e82b9f850d60735bbffbae339f to your computer and use it in GitHub Desktop.
Save BenniG123/42b645e82b9f850d60735bbffbae339f to your computer and use it in GitHub Desktop.
Quick powershell script to grab all images in a folder, run tessaract OCR on all of them, and then pipe all output to out.txt
function Get-DirectoryName($initialDirectory)
{
[System.Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms") | Out-Null
$openFolderDialog = New-Object System.Windows.Forms.FolderBrowserDialog
$openFolderDialog.ShowDialog() | Out-Null
return $openFolderDialog.SelectedPath
}
Write-Output "Select image folder"
$folderPath = Get-DirectoryName -initialDirectory "~"
Write-Output "Selected $folderPath"
$files = Get-ChildItem "$folderPath\*" -Include *.png,*.jpg,*.bmp
$outFilePath = Join-Path $folderPath "out.txt"
foreach ($file in $files)
{
$imagePath = Join-Path $folderPath $file.Name
$txtFileName = $file.BaseName
$txtPath = Join-Path $folderPath $txtFileName
Write-Output "$imagePath $txtPath"
# Actually do OCR - You have to add tesseract to your PATH, or specify its full path here
tesseract $imagePath $txtPath
# tesseract adds .txt to out files
$txtPath = $txtPath + ".txt"
# Write to single out file
$txtPath | Out-File $txtPath -Append -Encoding ascii
Get-Content $txtPath | Out-File $outFilePath -Append
Remove-Item $txtPath
}
@MasterKia
Copy link

Thanks, I appreciate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment