Last active May 25, 2022 15:42
Gist: griloHBG/c1ed4880866adca82c0c19d3e097c31d
Convert all PDF files in a directory (and all of its subdirectories) to text
#!/usr/bin/env bash
# making sure that names with spaces won't be a problem: split the loop list on newlines only
IFS=$'\n'
# ...and don't glob-expand names that contain *, ? or [
set -f
# --noreport drops the summary at the end of tree's output, tail removes its first line (always a dot),
# and --ignore-case makes -P match *.PDF as well as *.pdf
for i in $(tree . -f -i --noreport --ignore-case -P "*.pdf" | tail -n +2); do
    # if it is a file (and not a directory), echo its relative path and perform the conversion
    if [ -f "${i}" ]; then
        echo "${i}: PDF"
        pdftotext "${i}" "${i}.txt"
    fi
done
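As a hedged alternative sketch (not the gist author's method), the same recursive conversion can be driven by `find` instead of `tree`. Using `-iname` matches both `*.pdf` and `*.PDF`, and `-print0` with `read -d ''` keeps names containing spaces, newlines or glob characters intact without touching `IFS`. It assumes bash (for `read -d ''` and process substitution) and a `find` that supports `-iname` and `-print0` (GNU and BSD find both do). The function name `convert_pdfs` and the `CONVERTER` override are hypothetical, added only so the loop can be exercised without `pdftotext` installed.

```shell
#!/usr/bin/env bash
# Sketch only: find-based variant of the tree loop (assumes bash and a find with -iname/-print0).
# convert_pdfs DIR converts every *.pdf or *.PDF file under DIR to "<file>.txt" next to it.
# CONVERTER is a hypothetical override hook; it defaults to pdftotext.
convert_pdfs() {
    local dir="${1:-.}"
    local converter="${CONVERTER:-pdftotext}"
    local f
    # NUL-separated names survive spaces, newlines and glob characters unchanged
    while IFS= read -r -d '' f; do
        # echo the relative file path, then convert it
        echo "${f}: PDF"
        "${converter}" "${f}" "${f}.txt"
    done < <(find "${dir}" -type f -iname '*.pdf' -print0)
}
```

Usage, once the function is defined: `convert_pdfs /path/to/pdfs`, or `CONVERTER=echo convert_pdfs .` for a dry run that only prints the commands it would run. Like the original script, it writes `name.pdf.txt` rather than replacing the extension.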