Created January 16, 2020 19:30
bendavis78/ed22a974c2b4534305eabb2522956359
Extracts images from PDF while preserving PNG transparency
#!/bin/bash
# Print a one-line usage message.
usage() {
    echo "Usage: $(basename "$0") in.pdf dest";
}
# Both the input PDF and the destination directory are required.
[[ -z "$1" ]] && usage && exit 1;
[[ -z "$2" ]] && usage && exit 1;
TMPDIR="$(mktemp -d)";
DIR="$2";
mkdir "$TMPDIR/extracted";
# Extract the images into tmpdir
pdfimages -all "$1" "$TMPDIR/extracted/image" || exit 1;
# Rename images based on object id and whether or not they are a mask
# (process substitution keeps the loop in the main shell, so `exit 1` stops the script)
while read -r row; do
    num=$(echo "$row" | awk '{print $2}');
    imgtype=$(echo "$row" | awk '{print $3}');
    imgenc=$(echo "$row" | awk '{print $9}');
    objectid=$(echo "$row" | awk '{print $11}');
    if [[ "$imgenc" == "jpeg" ]]; then
        ext="jpg";
    else
        ext="png";
    fi
    src=$(printf '%s/extracted/image-%03d.%s' "$TMPDIR" "$num" "$ext");
    if [[ "$imgtype" == "smask" ]]; then
        dest=$(printf '%s/image-%03d-mask.%s' "$TMPDIR" "$objectid" "$ext");
    else
        dest=$(printf '%s/image-%03d.%s' "$TMPDIR" "$objectid" "$ext");
    fi
    echo "$src -> $dest";
    mv "$src" "$dest" || exit 1;
done < <(pdfimages -list "$1" | tail -n +3);
# Merge the images that have a mask
while read -r row; do
    imgtype=$(echo "$row" | awk '{print $3}');
    objectid=$(echo "$row" | awk '{print $11}');
    if [[ "$imgtype" == "smask" ]]; then
        img=$(printf '%s/image-%03d.png' "$TMPDIR" "$objectid");
        mask=$(printf '%s/image-%03d-mask.png' "$TMPDIR" "$objectid");
        echo "convert $img $mask";
        # Use the mask as the image's alpha channel, overwriting the image in place
        convert "$img" "$mask" -alpha off -compose copy-opacity -composite "$img" || exit 1;
    fi
done < <(pdfimages -list "$1" | tail -n +3);
# Remove the merged mask files, then move the results to the destination
rm -f "$TMPDIR"/image-*-mask.png*;
mv "$TMPDIR"/* "$DIR/";
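Both loops above pull fields out of each `pdfimages -list` row by running `awk` once per field. As a minimal sketch, the same fields could be split with a single `read`, assuming the column layout the script already relies on (num in column 2, type in column 3, enc in column 9, object in column 11). The helper name `parse_row` is hypothetical and the sample row is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print "num type enc object" for one `pdfimages -list` row.
# Assumes whitespace-separated columns in the order the script above uses.
parse_row() {
    local page num imgtype width height color comp bpc enc interp object rest
    # `read` splits on whitespace and ignores the leading padding in each row.
    read -r page num imgtype width height color comp bpc enc interp object rest <<< "$1"
    printf '%s %s %s %s\n' "$num" "$imgtype" "$enc" "$object"
}

# Made-up sample row, not taken from a real PDF.
parse_row "   1     0 image     640   480  rgb     3   8  jpeg   no        12  0    72    72 45.1K  15%"
# prints: 0 image jpeg 12
```

Inside the loops, the four values could then be captured with `read -r num imgtype imgenc objectid <<< "$(parse_row "$row")"`.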
Thanks!
I found a PDF where the image and its mask are different file types. This script assumes they are the same.
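One way to drop the same-extension assumption, sketched here rather than tested against real PDFs: look up the image and its mask by name stem and accept whatever extension is found. The helper `find_by_stem` is hypothetical and not part of the gist; whether `convert` can then compose the two files still depends on the formats involved.

```shell
#!/bin/bash
# Hypothetical helper: print the first existing file named "<stem>.<anything>",
# or return 1 if there is none.
find_by_stem() {
    local stem=$1 candidate
    for candidate in "$stem".*; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [[ -e "$candidate" ]]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Demo in a scratch directory: the image is a JPEG, its mask is a PNG.
demo=$(mktemp -d)
touch "$demo/image-007.jpg" "$demo/image-007-mask.png"
find_by_stem "$demo/image-007"        # prints the .jpg path
find_by_stem "$demo/image-007-mask"   # prints the .png path
rm -r "$demo"
```

In the merge loop, `img` and `mask` could then be set with `img=$(find_by_stem "$TMPDIR/image-012")` and `mask=$(find_by_stem "$TMPDIR/image-012-mask")`, with the object id substituted as in the original `printf` calls.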
Thanks for this script. It made a great starting point for a larger project I am tackling. I took your gist and rewrote it in Python to better handle image types and different composition modes.
https://gist.github.com/XBigTK13X/4796a0ca7f16e83438914384a57dc46b
Useful, worked fine for me on a PDF where all images have a transparent background. Thanks for sharing this script.