#!/bin/sh
#
# Generates a PDF from Publitas images (online folder service).
# Stores the generated PDF and the JSON (which may contain links).
#
# Requirements:
# - wget https://www.gnu.org/software/wget/
# - jq https://stedolan.github.io/jq/
# - imagemagick https://www.imagemagick.org/
#
# You may need to remove the PDF-related security policy for ImageMagick for this to work.
#
if [ ! "$2" ]; then
    echo "Usage: $0 <publitas_folder_url> <output_name>"
    exit 1
fi
URL="$1"
OUT="$2"
DIR=$(mktemp -d --suffix=.getpublitas)
# Extract the JSON object assigned to "var data" in the page source.
wget -q -O - "$URL" | sed 's/^\s*var\s\+data\s\+=\s\+\(.*\);\s*$/\1/p;d' > "$DIR/data.json"
# Pick the highest-resolution image URL available for each page.
jq -r '.spreads[].pages[].images | .at2400 // .at2000 // .at1600 // .at1200 // .at1000' "$DIR/data.json" > "$DIR/img_urls"
i=1
for u in $(cat "$DIR/img_urls"); do
    echo "$u" > "$DIR/cur_url" # use file to be able to use base
    wget -q --base="$URL" -O "$(printf '%s/image-page-%04d.jpg' "$DIR" "$i")" -i "$DIR/cur_url"
    i=$(( i + 1 ))
done
# The glob is quoted on purpose: ImageMagick expands it itself.
convert "$DIR/image-page-*.jpg" "$OUT.pdf"
cp "$DIR/data.json" "$OUT.json"
rm -Rf "$DIR"
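The header comment mentions the PDF-related security policy of ImageMagick. As a rough sketch, the helper below lists the policy lines that mention PDF so you can see whether writing PDFs is restricted. The function name `pdf_policy_lines` is made up for this example, and the two `policy.xml` paths are assumptions based on common Linux packaging; your installation may keep the file elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the <policy> lines of a policy.xml file
# whose pattern attribute mentions PDF, with their line numbers.
pdf_policy_lines() {
    grep -n '<policy[^>]*pattern="[^"]*PDF[^"]*"' "$1"
}

# Assumed locations; adjust to wherever your distribution installs policy.xml.
for f in /etc/ImageMagick-6/policy.xml /etc/ImageMagick-7/policy.xml; do
    if [ -f "$f" ]; then
        echo "$f:"
        pdf_policy_lines "$f" || echo "  (no PDF-specific policy line found)"
    fi
done
```

A line with `rights="none"` for the PDF pattern would explain `convert` refusing to write the PDF.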
Thanks for the script. Still working in 2023.
I would suggest showing that the program is downloading the images since it doesn't output anything and you might think it is not doing anything.
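A minimal sketch of that suggestion, assuming the same file layout as the script: the download loop is wrapped in a function (the name `download_pages` is invented here) that prints one progress line per page to stderr. It reuses the script's own `wget` invocation and is not a tested replacement for it.

```shell
#!/bin/sh
# Sketch: the script's download loop with progress output.
# Usage: download_pages <base_url> <url_list_file> <target_dir>
download_pages() {
    base="$1"; list="$2"; dir="$3"
    total=$(grep -c . "$list")   # number of non-empty lines, i.e. pages
    i=1
    while IFS= read -r u; do
        [ -n "$u" ] || continue  # skip blank lines
        echo "Downloading page $i of $total ..." >&2
        echo "$u" > "$dir/cur_url"
        wget -q --base="$base" \
            -O "$(printf '%s/image-page-%04d.jpg' "$dir" "$i")" \
            -i "$dir/cur_url"
        i=$((i + 1))
    done < "$list"
}

# In the script this would replace the for loop:
#   download_pages "$URL" "$DIR/img_urls" "$DIR"
```

Progress goes to stderr so that it does not mix with anything the script might later write to stdout.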
Hello, I'm trying to download a Publitas page and make a PDF out of it using your script. I'm new to .sh files but have managed to install all the dependencies. However, I'm getting an error when running:
bash get-publitas.sh https://view.publitas.com/malmberg/589122_bvj_4vwo_lob_a_bladerboek BVJ-4VA
I've tried modifying the ImageMagick policy.xml file. Also, in your script I changed line 34 to:
convert -limit memory 8GiB -limit disk 8GiB -limit area 8GiB "$DIR/image-page-*.jpg" "$OUT.pdf"
Any idea how to solve this?
Did you eventually manage to solve this? I am also trying to download the biology books :).
Hi @Zerovelocity275 & @luduma,
I would like to point out that this script is not necessary anymore, since you can just add /unsupported to the URL and download the PDF from Publitas themselves.
Oh, thank you so much, that's great.
Hi @GlowingBulb, I'm doing that and it just says: "Whoops! Something went wrong... We're sorry, but this part is no longer available." So they patched it, right? I don't know if I'm doing it right; I'm adding it at the end of the URL.
Hi @Cristark02, as far as I know it still works. Make sure that you add /unsupported to the end of the "root" URL, like this:
https://view.publitas.com/four-hands/fourhands_fall23/page/1
↓
https://view.publitas.com/four-hands/fourhands_fall23/unsupported
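The rewrite shown above can be sketched as a small shell helper. The function name `to_unsupported_url` is made up for this example, and the assumption that viewer URLs end in an optional `/page/<n>` part comes only from the URL in this thread; other URL shapes may need different handling.

```shell
#!/bin/sh
# Hypothetical helper: turn a Publitas viewer URL into its "/unsupported" form.
to_unsupported_url() {
    # 1. drop any query string or fragment
    # 2. drop a trailing "/unsupported" so the result is not doubled
    # 3. drop a trailing "/page/<n>" part (assumed URL shape)
    # 4. drop trailing slashes
    # 5. append "/unsupported"
    printf '%s\n' "$1" | sed \
        -e 's/[?#].*$//' \
        -e 's#/unsupported/*$##' \
        -e 's#/page/[0-9][0-9-]*/*$##' \
        -e 's#/*$##' \
        -e 's#$#/unsupported#'
}

to_unsupported_url "https://view.publitas.com/four-hands/fourhands_fall23/page/1"
# prints https://view.publitas.com/four-hands/fourhands_fall23/unsupported
```

Whether the resulting page still offers a PDF download depends on Publitas and on the publication.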
Great, still works! Thank you!
I logged in just to say I'd kiss you if I had you in front of me.
Thanks a lot