@rjw57
Last active October 23, 2023 02:00
#!/usr/bin/env python3
#
# THIS SCRIPT REQUIRES PYTHON 3
#
# Install requirements via:
#   pip3 install docopt pillow reportlab
#
# Dedicated to the public domain where possible.
# See: https://creativecommons.org/publicdomain/zero/1.0/
"""
Download a pocketmags magazine in PDF format from the HTML5 reader.

Usage:
    pmdown.py (-h | --help)
    pmdown.py [options] <pdf> <url>

Options:
    -h, --help      Print brief usage summary.
    --dpi=DPI       Set image resolution in dots per inch.
                    [default: 150]

    <pdf>           Save output to this file.
    <url>           A URL to one image from the magazine.

Notes:
    PLEASE USE THIS SCRIPT RESPONSIBLY. THE MAGAZINE PUBLISHING INDUSTRY RELIES
    HEAVILY ON INCOME FROM SALES WITH VERY SLIM PROFIT MARGINS.

    URLs for pocketmag images can be found by using the HTML 5 reader and
    right-clicking on a page and selecting "inspect element". Look for URLs of
    the form:

        http://magazines.magazineclonercdn.com/<uuid1>/<uuid2>/high/<num>.jpg

    where <uuid{1,2}> are strings of letters and numbers with dashes separating
    them and <num> is some 4-digit number.
"""
import itertools
import re
from contextlib import contextmanager
from urllib.error import HTTPError
from urllib.parse import urlparse, urlunparse
from urllib.request import urlopen

import docopt
from PIL import Image
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch

# The pattern of the URL path for a magazine. The dot before "jpg" is escaped
# so that it only matches a literal ".".
URL_PATH_PATTERN = re.compile(r'(?P<prefix>^[a-f0-9\-/]*/high/)[0-9]{4}\.jpg')


@contextmanager
def saving(thing):
    """Context manager which ensures save() is called on thing."""
    try:
        yield thing
    finally:
        thing.save()


def main():
    opts = docopt.docopt(__doc__)
    pdf_fn, url = (opts[k] for k in ('<pdf>', '<url>'))
    url = urlparse(url)
    dpi = float(opts['--dpi'])

    m = URL_PATH_PATTERN.match(url.path)
    if not m:
        raise RuntimeError('URL path does not match expected pattern')
    prefix = m.group('prefix')

    c = canvas.Canvas(pdf_fn)
    with saving(c):
        # Pages are numbered from 0000 upwards; stop at the first 404.
        for page_num in itertools.count(0):
            page_url = list(url)
            page_url[2] = '{}{:04d}.jpg'.format(prefix, page_num)
            page_url = urlunparse(page_url)
            print('Downloading page {} from {}...'.format(page_num, page_url))

            try:
                with urlopen(page_url) as f:
                    im = Image.open(f)
            except HTTPError as e:
                if e.code == 404:
                    print('No image found => stopping')
                    break
                raise

            # Convert the pixel dimensions to inches at the requested DPI.
            w, h = tuple(dim / dpi for dim in im.size)
            print('Image is {:.2f}in x {:.2f}in at {} DPI'.format(w, h, dpi))

            c.setPageSize((w*inch, h*inch))
            c.drawInlineImage(im, 0, 0, w*inch, h*inch)
            c.showPage()


if __name__ == '__main__':
    main()
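
The page-size arithmetic in main() can be checked on its own. The following is a minimal sketch, not part of the script: page_size_points is a hypothetical helper name, and it assumes one inch equals 72 PDF points, which is the value reportlab.lib.units.inch carries.

```python
POINTS_PER_INCH = 72.0  # assumed equal to reportlab.lib.units.inch


def page_size_points(size_px, dpi=150.0):
    """Return (width, height) in PDF points for an image of size_px pixels.

    Hypothetical helper mirroring the w/h computation in main():
    pixels / dpi gives inches, inches * 72 gives points.
    """
    width_in, height_in = (dim / dpi for dim in size_px)
    return width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH
```

For example, a 1500 x 2100 pixel image at the default 150 DPI is 10in x 14in, so page_size_points((1500, 2100)) gives (720.0, 1008.0). A higher --dpi value therefore produces a physically smaller page from the same image.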
@chongjasmine

No module named docopt.

How to solve this problem?

@chongjasmine

I downloaded docopt and the rest.
Not sure what to do next. I typed pmdown.py in cmd, but nothing happened.
All it does is show:
Usage:
pmdown.py (-h | --help)
pmdown.py [options] <pdf> <url>

@rjw57
Author

rjw57 commented Apr 4, 2019

Follow the usage guide.

@chongjasmine

Where is the usage guide?

@Numbr6

Numbr6 commented Aug 4, 2019

Has the URL format for the images changed? "http://magazines.magazineclonercdn.com/<uuid1>/<uuid2>/high/<num>.jpg" When I inspect the element for an image, the URL is similar, but there is no 4-digit number before .jpg, for example. Using the URL I found generates an error because the path does not match the expected pattern.

@Zuescho

Zuescho commented Aug 22, 2019

I'm trying to find a URL right now. @Numbr6, could you point me in the right direction? I have been searching for a while but can only find the thumbnails.

@Eddy300

Eddy300 commented May 13, 2020

Is this still working? I tried, but I wasn't able to find that URL. Can you explain a bit more about exactly where to look for the JPG image?

Thanks

@ear9mrn

ear9mrn commented May 17, 2020

Great little script...!

I had to edit the regex for the URL. Not sure if my magazine is different or if the URLs are now constructed differently.

URL_PATH_PATTERN = re.compile(r'(?P<prefix>/mcmags/[a-f0-9\-/]*/mid/)[0-9]{4}\.jpg')

Note the addition of /mcmags/ (and /mid/ in place of /high/).
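
The two patterns seen so far differ only in the optional /mcmags container and the quality directory, so a single more tolerant pattern may cover both. This is a sketch under that assumption, not the author's pattern: FLEXIBLE_PATH_PATTERN and split_page_url are hypothetical names, and the set of quality directories the server actually accepts is not known here.

```python
import re
from urllib.parse import urlparse

# Assumed generalisation: an optional "/mcmags" container, two UUID-like
# path segments, then any lowercase quality directory ("high", "mid", ...).
FLEXIBLE_PATH_PATTERN = re.compile(
    r'^(?P<prefix>(?:/mcmags)?/[a-f0-9\-]+/[a-f0-9\-]+/(?P<quality>[a-z]+)/)'
    r'[0-9]{4}\.jpg$'
)


def split_page_url(url):
    """Return (prefix, quality) for a page image URL, or None if no match."""
    m = FLEXIBLE_PATH_PATTERN.match(urlparse(url).path)
    if m is None:
        return None
    return m.group('prefix'), m.group('quality')
```

The prefix group plays the same role as in the script, so '{}{:04d}.jpg'.format(prefix, page_num) still builds each page path. A URL that does not end in a 4-digit number followed by .jpg returns None rather than raising.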

@Stumpytrain

Stumpytrain commented Aug 23, 2020

I've added /mcmags/ as @ear9mrn mentioned above, but it's still not working for me.

I keep getting the following error:

/Applications/Python\ 3.8/pmdown.py testing.pdf https://mcdatastore.blob.core.windows.net/mcmags/3db0b440-0324-44c8-8200-027ab05a34cd/a40ae4de-81a4-46b5-a0c9-8f2205421129/extralow/0003.jpg
Traceback (most recent call last):
  File "/Applications/Python 3.8/pmdown.py", line 101, in <module>
    main()
  File "/Applications/Python 3.8/pmdown.py", line 73, in main
    raise RuntimeError('URL path does not match expected pattern')
RuntimeError: URL path does not match expected pattern

Any ideas? Thanks!

@Stumpytrain

Got it, I needed to manually change "extralow" to "mid" in the image URL. Superb, thanks!
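
The manual edit described above can also be done programmatically. A minimal sketch, assuming the quality directory is always the second-to-last path segment as in the URLs posted in this thread; with_quality is a hypothetical helper, not part of the script.

```python
from urllib.parse import urlparse, urlunparse


def with_quality(url, quality='mid'):
    """Return url with its quality directory replaced by quality.

    Assumes the path ends in ".../<quality>/<num>.jpg", so the quality
    directory is the second-to-last path segment.
    """
    parts = urlparse(url)
    segments = parts.path.split('/')
    if len(segments) < 3:
        raise ValueError('URL path has no quality directory')
    segments[-2] = quality
    return urlunparse(parts._replace(path='/'.join(segments)))
```

Note that the script's pattern must still accept the chosen directory name; rewriting the URL alone is not enough if the regex is hard-coded to a different one.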

@scar009

scar009 commented Aug 25, 2020

I want a script for downloading Magzter magazines.

@Inversil

Inversil commented Jan 1, 2021

> Got it, I needed to manually change "extralow" to "mid" in the image URL. Superb, thanks!

could you post your edited script? can't get it to accept my url even after following your steps. I get "expected string or bytes-like object" at line 66: opts = docopt.docopt(__doc__)

@Stumpytrain

> could you post your edited script? can't get it to accept my url even after following your steps. I get "expected string or bytes-like object" at line 66: opts = docopt.docopt(__doc__)

I'm not sure if you're struggling with the same problem I had. I didn't need to edit the script, I just had to edit the URL. If that didn't work for you then there could be something else amiss.

Can someone do a Zinio downloader, please?! :)

@grd787

grd787 commented Feb 28, 2021

So I'm running the following command in the terminal and it doesn't seem to do anything:
python3 pmdown.py -h test.pdf https://mcdatastore.blob.core.windows.net/mcmags/a8123f62-3fab-4a47-9702-a2e521a8c829/4f8f60e2-c901-4ce6-af4d-21dc14e0e5d8/mid/0000.jpg
Is there anything I could have missed? After pressing enter in the terminal, I just get the instructions contained between the """ markers. This is my first time with Python.

@AliasFakename

so the default is currently "extralow" and we can change it to "mid" but does anyone know how to get the higher quality jpg?

I know there is a higher quality available but I tried "high", and "extrahigh" but it just gives an error page, anyone know the right directory name for the high quality images?

@bani6809

bani6809 commented Jun 14, 2021

Looks like the format has changed: the extension is no longer .jpg but .bin, and the files do not appear to be JPEGs.

https://mcdatastore.blob.core.windows.net/mcmags/{ ... }/{ ... }/high/0018.bin

@grd787

grd787 commented Jun 19, 2021

I'm getting this error:
Traceback (most recent call last):
  File "/Users/greg/Desktop/pmdown.py", line 60, in <module>
    main()
  File "/Users/greg/Desktop/pmdown.py", line 25, in main
    opts = docopt.docopt(__doc__)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/docopt.py", line 558, in docopt
    DocoptExit.usage = printable_usage(doc)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/docopt.py", line 466, in printable_usage
    usage_split = re.split(r'([Uu][Ss][Aa][Gg][Ee]:)', doc)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/re.py", line 231, in split
    return _compile(pattern, flags).split(string, maxsplit)
TypeError: expected string or bytes-like object

Any ideas?
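
One plausible cause, offered as an assumption rather than a confirmed diagnosis: re.split raises this TypeError when its second argument is not a string, which is what docopt receives if __doc__ is None. That happens when the usage text between the triple quotes is missing or is no longer the first statement in the file. The line numbers in the traceback (main() called at line 60, against line 101 in the earlier report) are consistent with a copy of the script that lost its header and docstring. The sketch below reproduces the error with the standard library only; require_usage_doc is a hypothetical guard, not part of docopt.

```python
import re


def split_usage(doc):
    """The same re.split call that appears in the traceback above."""
    return re.split(r'([Uu][Ss][Aa][Gg][Ee]:)', doc)


def require_usage_doc(doc):
    """Return doc unchanged, or exit with a clearer message if it is missing.

    Hypothetical guard: checks that doc is a string with a "Usage:" section.
    """
    if not isinstance(doc, str) or 'usage:' not in doc.lower():
        raise SystemExit(
            'The module docstring with its "Usage:" section is missing; '
            'copy the script again, including the text between the '
            'triple quotes.'
        )
    return doc
```

With the guard in place the call in main() would read docopt.docopt(require_usage_doc(__doc__)). Calling split_usage(None) raises the same TypeError as in the report.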

@omarmuthana112

How can I get the uuid1 and uuid2 for the magazine, please?

@Stumpytrain

> so the default is currently "extralow" and we can change it to "mid" but does anyone know how to get the higher quality jpg?
>
> I know there is a higher quality available but I tried "high", and "extrahigh" but it just gives an error page, anyone know the right directory name for the high quality images?

I've tried everything I can think of and I can't get a better quality than "mid." It's a shame, because when you download the allowed 2 pages via Pocketmags, the quality is far superior.

@Dizzy-gr

> > so the default is currently "extralow" and we can change it to "mid" but does anyone know how to get the higher quality jpg?
> > I know there is a higher quality available but I tried "high", and "extrahigh" but it just gives an error page, anyone know the right directory name for the high quality images?
>
> I've tried everything I can think of and I can't get a better quality than "mid." It's a shame, because when you download the allowed 2 pages via Pocketmags, the quality is far superior.

Perhaps the 2-page print is the solution 🤔. My coding days were when BASIC was a new thing and have progressed little since then, but isn't it possible to write code that iteratively prints two pages at a time until all are done? Then we could combine those into one PDF pretty easily, I'd have thought.

@sebfischer83

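// Presumably meant to be run from the browser console with the reader open.
// Set numberOfPages to the page count of the magazine.
let numberOfPages = 71;

for (let index = 0; index < numberOfPages; index += 2) {
    // Open the print menu, then select two pages after a short delay.
    document.getElementById('print_menu').click();
    setTimeout(() => {
        let pages = document.querySelectorAll('[pagenum="' + (index + 1) + '"]');
        pages[0].click();
        // Only select the second page if it exists.
        if (index + 2 <= numberOfPages) {
            pages = document.querySelectorAll('[pagenum="' + (index + 2) + '"]');
            pages[0].click();
        }
        document.getElementById('printPages').click();
    }, 500);
}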

@RichardJRL

I've modified this script to enable downloading of magazines in "high" quality and have created an option to add a magazine title to the generated PDF's metadata. I've published my new version in a separate GitHub repo as Gists don't seem to support pull requests. You can find it here: https://github.com/RichardJRL/pocketmagstopdf

The original author, rjw57, is welcome to include my changes in his Gist here if he wishes

@RichardJRL

I've now further modified the script to download the whole magazine at the same quality that the restricted 2-page print option on the website offers.

As before, I've published my modified version on my GitHub page: https://github.com/RichardJRL/pocketmagstopdf

@hovlakas

hovlakas commented Apr 6, 2023

Python neophyte here. I was able to find the various IDs and to get the latest script running, but after finding the last good page of the mag, the script terminates with ERROR - Unable to download magazine: HTTP error code 405. Any guidance would be appreciated.

@hovlakas

hovlakas commented Apr 6, 2023

Sorry, new to Github, too. This is in reference to pocketmagstopdf. If I need to post elsewhere, please let me know.

@hovlakas

hovlakas commented Apr 6, 2023

Never mind. I'll post to the Issues of that repository.
