@xaedes
Created October 2, 2019 13:20
Simple Bash script that converts a folder of PDFs to text using pdftotext
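#!/bin/bash
# Convert every PDF in the current directory to text with pdftotext,
# once with default settings and once with -layout.
CONVERT=pdftotext
OUTPUT_FOLDER="text"
CONVERT_FLAGS1=
CONVERT_FLAGS2=-layout
VARIANT_FOLDER1="${OUTPUT_FOLDER}/plain"
VARIANT_FOLDER2="${OUTPUT_FOLDER}/layout"
# Quote the paths so the script still works if OUTPUT_FOLDER contains spaces.
mkdir -p "${VARIANT_FOLDER1}"
mkdir -p "${VARIANT_FOLDER2}"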
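for fn in *.pdf
do
    # If no PDF matches, the glob stays literal ("*.pdf"); skip it.
    [[ -e "$fn" ]] || continue
    # Strip the .pdf suffix with parameter expansion.
    basename="${fn%.pdf}"
    echo "processing ${fn}..."
    # Variant 1: default pdftotext output.
    FN_OUT1="${VARIANT_FOLDER1}/${basename}.txt"
    if [[ -f "${FN_OUT1}" ]]; then
        echo "exists"
    else
        echo $CONVERT $CONVERT_FLAGS1 "$fn" "${FN_OUT1}"
        # The flags stay unquoted on purpose: an empty value must expand to no argument.
        $CONVERT $CONVERT_FLAGS1 "${fn}" "${FN_OUT1}"
    fi
    # Variant 2: pdftotext -layout output.
    FN_OUT2="${VARIANT_FOLDER2}/${basename}.txt"
    if [[ -f "${FN_OUT2}" ]]; then
        echo "exists"
    else
        echo $CONVERT $CONVERT_FLAGS2 "$fn" "${FN_OUT2}"
        $CONVERT $CONVERT_FLAGS2 "${fn}" "${FN_OUT2}"
    fi
    echo "done."
done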
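
The script builds each output path by stripping the .pdf suffix and prefixing the variant folder. That mapping can be sketched on its own, without calling pdftotext. The helper name `out_path` is hypothetical and not part of the original script, and dropping a leading directory is an extra assumption in case paths other than plain filenames are passed in.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): map a PDF filename
# to its .txt path inside a given variant folder.
out_path() {
    local variant_folder=$1
    local fn=$2
    local base=${fn##*/}    # drop any leading directory (assumption)
    # Strip the .pdf suffix and append .txt under the variant folder.
    printf '%s/%s.txt\n' "$variant_folder" "${base%.pdf}"
}

out_path "text/plain" "report.pdf"          # prints text/plain/report.txt
out_path "text/layout" "docs/my notes.pdf"  # prints text/layout/my notes.txt
```

Because the result is printed with quoted expansions, filenames containing spaces survive intact, provided the caller also quotes the result, for example `"$(out_path "text/plain" "$fn")"`.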