@XayOn
Created July 12, 2013 08:10
Convert PDF files to text in parallel, using a specified number of processes.
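# Usage: bash SCRIPT NPROCS [LISTFILE]
# With LISTFILE: convert every PDF named in it. Without: split all PDFs under . into NPROCS chunks.
: "${1:?usage: bash $0 NPROCS [LISTFILE]}"
if [[ $2 ]]; then
    # Worker mode: convert each listed PDF to text, one after another.
    while IFS= read -r filename; do pdf2txt -o "${filename}.txt" "${filename}"; done < "$2"
else
    tmp=$(mktemp)
    find . -name "*pdf" -print > "$tmp"
    # Lines per chunk, rounded up so that $1 chunks cover the whole list.
    nlineas=$(( ($(wc -l < "$tmp") + $1 - 1) / $1 ))
    for i in $(seq 0 $(( $1 - 1 ))); do
        # Chunk i holds lines i*nlineas+1 .. (i+1)*nlineas; the last chunk may be shorter.
        tail -n +$(( i * nlineas + 1 )) "$tmp" | head -n "$nlineas" > "${tmp}_${i}"
        # Re-run this script in worker mode on the chunk, in the background.
        echo "${tmp}_${i}" | parallel --gnu "bash $0 $1 {}" &
    done
    # Wait for every worker before removing the temporary list files.
    wait
    rm -f "$tmp" "${tmp}_"*
fi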
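
The script aims to split the list of PDF paths into one chunk per process. The chunk arithmetic is easy to get wrong, so here is a minimal sketch of one way to do it with `tail` and `head`, assuming a rounded-up chunk size. The helper name `chunk_of` and the sample file names are hypothetical and not part of the gist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original gist): print chunk number
# $3 (0-based) of the list file $1 when it is split into $2 chunks.
chunk_of() {
    local list=$1 nchunks=$2 index=$3
    local total size
    total=$(wc -l < "$list")
    # Round up so that nchunks chunks always cover every line.
    size=$(( (total + nchunks - 1) / nchunks ))
    # tail -n +K is 1-based: skip the lines owned by earlier chunks,
    # then keep at most one chunk's worth.
    tail -n +$(( index * size + 1 )) "$list" | head -n "$size"
}

# Demo on a throwaway list of seven made-up file names.
list=$(mktemp)
printf '%s\n' a.pdf b.pdf c.pdf d.pdf e.pdf f.pdf g.pdf > "$list"
for i in 0 1 2; do
    echo "chunk $i: $(chunk_of "$list" 3 "$i" | tr '\n' ' ')"
done
rm -f "$list"
```

With seven names and three chunks, the chunk size rounds up to three, so the chunks hold three, three, and one name. Rounding up keeps every line in exactly one chunk, and the last chunk simply comes out shorter.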