@bendavis78
Created January 16, 2020 19:30
Extracts images from PDF while preserving PNG transparency
#!/bin/bash
usage() {
    echo "Usage: $(basename "$0") in.pdf dest" >&2;
}
[[ -z "$1" ]] && usage && exit 1;
[[ -z "$2" ]] && usage && exit 1;
TMPDIR="$(mktemp -d)";
DIR=$2;
mkdir "$TMPDIR/extracted";
# Extract the images into tmpdir
pdfimages -all "$1" "$TMPDIR/extracted/image" || exit 1;
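# Rename images based on object id and whether or not they are a mask.
# Columns of `pdfimages -list`: 2 = num, 3 = type, 9 = enc, 11 = object.
# The loop reads from process substitution so that `exit 1` stops the whole
# script instead of only a pipeline subshell.
while read -r row; do
    num=$(echo "$row" | awk '{print $2}');
    imgtype=$(echo "$row" | awk '{print $3}');
    imgenc=$(echo "$row" | awk '{print $9}');
    objectid=$(echo "$row" | awk '{print $11}');
    # Assumes JPEG streams were written as .jpg and everything else as .png
    if [[ "$imgenc" == "jpeg" ]]; then
        ext="jpg";
    else
        ext="png";
    fi
    src=$(printf '%s/extracted/image-%03d.%s' "$TMPDIR" "$num" "$ext");
    if [[ "$imgtype" == "smask" ]]; then
        dest=$(printf '%s/image-%03d-mask.%s' "$TMPDIR" "$objectid" "$ext");
    else
        dest=$(printf '%s/image-%03d.%s' "$TMPDIR" "$objectid" "$ext");
    fi
    echo "$src -> $dest";
    mv "$src" "$dest" || exit 1;
done < <(pdfimages -list "$1" | tail -n +3)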
# Merge the images that have a mask (assumes both image and mask are PNG)
while read -r row; do
    imgtype=$(echo "$row" | awk '{print $3}');
    objectid=$(echo "$row" | awk '{print $11}');
    if [[ "$imgtype" == "smask" ]]; then
        img=$(printf '%s/image-%03d.png' "$TMPDIR" "$objectid");
        mask=$(printf '%s/image-%03d-mask.png' "$TMPDIR" "$objectid");
        echo "convert $img $mask";
        # Use the mask as the image's alpha channel and overwrite the image
        convert "$img" "$mask" -alpha off -compose copy-opacity -composite "$img" || exit 1;
    fi
done < <(pdfimages -list "$1" | tail -n +3)
rm -f "$TMPDIR"/image-*-mask.png*;
mkdir -p "$DIR" && mv "$TMPDIR"/* "$DIR/";
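Both loops pull the same columns out of `pdfimages -list` with one `awk` call per field. As a minimal sketch, assuming the usual column order (page, num, type, width, height, color, comp, bpc, enc, interp, object, and so on), a single `read` could split a row instead. `parse_list_row` is a hypothetical helper name, and the sample row is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not part of the gist: split one data row of
# `pdfimages -list` and print "num type enc object".
parse_list_row() {
    local _page num imgtype _w _h _color _comp _bpc imgenc _interp objectid _rest
    # Default IFS splits on runs of whitespace; unused columns go to _ names.
    read -r _page num imgtype _w _h _color _comp _bpc imgenc _interp objectid _rest <<< "$1"
    printf '%s %s %s %s\n' "$num" "$imgtype" "$imgenc" "$objectid"
}

# Made-up sample row in the column layout assumed above.
parse_list_row '   1     0 image     800   600  rgb     3   8  jpeg   no        10  0    72    72  120K  8.5%'
# prints: 0 image jpeg 10
```

This avoids spawning four `awk` processes per row, at the cost of depending on the exact column positions.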
@SR-G

SR-G commented Jan 10, 2023

Useful. It worked fine for me on a PDF in which all images have a transparent background. Thanks for sharing this script.

@dragoon

dragoon commented Aug 12, 2023

Thanks!

ghost commented Oct 14, 2023

I found a PDF where the image and its mask are different file types. This script assumes they are the same.
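One way to cope with that case, as a rough sketch rather than a tested fix: look each file up by its stem instead of assuming `.png`, and always write the merged result as PNG, since JPEG cannot carry an alpha channel. `find_image` and `merge_mask` are hypothetical helpers, the extension list is an assumption about what `pdfimages -all` may have written, and `convert` is the same ImageMagick call the script already uses.

```shell
#!/bin/bash
# Hypothetical helper: print the first existing file "$dir/$stem.<ext>".
# The extension list is an assumption; extend it as needed.
find_image() {
    local dir=$1 stem=$2 ext
    for ext in png jpg tif jp2; do
        if [[ -f "$dir/$stem.$ext" ]]; then
            printf '%s\n' "$dir/$stem.$ext"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: merge an image with its soft mask, whatever their
# file types, and write the result to "$dir/image-NNN.png".
merge_mask() {
    local dir=$1 objectid=$2 stem img mask
    stem=$(printf 'image-%03d' "$objectid")
    img=$(find_image "$dir" "$stem") || return 1
    mask=$(find_image "$dir" "$stem-mask") || return 1
    convert "$img" "$mask" -alpha off -compose copy-opacity -composite "$dir/$stem.png" || return 1
    # Drop the original when it was not a PNG, so only the merged file remains.
    [[ "$img" == *.png ]] || rm -- "$img"
}
```

In the merge loop, `merge_mask "$TMPDIR" "$objectid" || exit 1` would stand in for the fixed `.png` paths. Masks that are not PNG would also need their own cleanup, because the final `rm` only matches `image-*-mask.png*`.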

@XBigTK13X

Thanks for this script. It made a great starting point for a larger project I am tackling. I took your gist and rewrote it in Python to better handle image types and different composition modes.

https://gist.github.com/XBigTK13X/4796a0ca7f16e83438914384a57dc46b