#!/usr/bin/env python
# svglinkify.py - Add hyperlinks to PDFs generated by Inkscape
# Copyright (C) 2015 Mansour Behabadi <[email protected]>
#
# This script comes with no warranty whatsoever. Use at your own risk.
# If you decide to distribute verbatim or modified versions of this
# code, you must retain this copyright notice.
#
# Usage: svglinkify.py <svg-file> <inkscape-gen-pdf> <linkified-pdf>
# Requires:
#   qpdf
#   inkscape
#   python 2/3
#
# WARNING Since this script is one heck of a hack, you should follow the
# instructions below to the letter, or you will fail miserably.
#
# 1. Start by making an SVG that looks nice and everything and add a
#    piece of text somewhere.
#
# 2. Select the rectangle tool and draw a box on top of the text.
#    This box will be the clickable area of our link. Set its fill color
#    to #ff00ff (magenta) and remove any strokes.
#
# 3. Right click the box and select "Create Link". In the "Object
#    attributes" window that opens up, type the destination link in
#    "Href".
#
# 4. Send the box to the back (using End key on the keyboard) so you can
#    see your text. DO NOT move your box at any time after you've
#    created the link. More details below.
#
# 5. Export your SVG as PDF and run svglinkify.py:
#
#      $ svglinkify.py my_doc.svg my_doc.pdf my_doc_with_links.pdf
#
#    So you pass your SVG file as the first arg, the exported PDF as the
#    2nd arg and the name of the final PDF as the 3rd arg.
#
# 6. If you did everything right, open my_doc_with_links.pdf and you
#    should be able to click your text and open the link in a browser.
#    You'll also notice that the magenta box is gone. That's it. Now read
#    the sections below if you hate being frustrated when things break.
#
# HOW IT WORKS
#
# The script looks for magenta boxes (surprise!) that have a link. It
# then extracts their x,y position and hyperlinks. It does the same
# search for magenta boxes in the generated PDF and tries to match them
# up by their relative locations. Therefore it's crucial to get the
# locations right. Since SVG is pretty damn flexible, locations aren't
# always simple x,y attributes. When you create a link for an object,
# you wrap it in a group. Groups don't have x,y; instead they are
# transformed using 2D matrices, which means maths calculations must be
# done in order to find out where the enclosed box really is. This
# script is too dumb to do that. That's why you should not move a box
# after you create a link for it.
#
# You could either delete it and draw a new one, or if you like it
# dangerous, you can enter the group (i.e. double-clicking the box) and
# then move the box. This way, you're not moving the group so no
# transformations will be applied. You're bound to make a mistake sooner
# or later this way, so don't do it.
#
# If you can't get this to work after at least several attempts, email
# me your SVG and the PDF Inkscape generated for you and I should be
# able to help.

from __future__ import unicode_literals
from __future__ import print_function
from itertools import count
from subprocess import call, PIPE, Popen
import os
import re
import sys
import tempfile

# Magic to support both Python 2 and 3
try:
    range = xrange
except NameError:
    pass
try:
    import HTMLParser as html_parser
except ImportError:
    import html.parser as html_parser
_html_parser = html_parser.HTMLParser()
try:
    # Older Pythons expose unescape as a parser method
    html_unescape = _html_parser.unescape
except AttributeError:
    import html
    html_unescape = html.unescape

# Command line parsing
if len(sys.argv) < 4:
    print('Usage: %s <svg-file> <inkscape-gen-pdf> <linkified-pdf>'
          % sys.argv[0], file=sys.stderr)
    sys.exit(1)
svg_path = sys.argv[1]
pdf_in_path = sys.argv[2]
pdf_out_path = sys.argv[3]

# Load the link rects from SVG file as (href, x, y) tuples
SVG_X_PAT = re.compile(r'\bx="([^"]+)"')
SVG_Y_PAT = re.compile(r'\by="([^"]+)"')
with open(svg_path, 'r') as svg_file:
    svg_rects = [(
        html_unescape(i[0]),
        float(SVG_X_PAT.search(i[1]).group(1)),
        float(SVG_Y_PAT.search(i[1]).group(1))
    ) for i in re.findall(r'''
        <a[^>]*?\bxlink:href="([^"]+)"[^>]*>\s*<rect
        ([^>]*?\bstyle="[^"]*?\bfill:[#]ff00ff\b[^>]*)
    ''', svg_file.read(), re.X)]

# QDFy the input PDF & load the resulting PDF to memory
fd, qdf_tmppath = tempfile.mkstemp()
os.close(fd)
try:
    if call(['qpdf', '--qdf', pdf_in_path, qdf_tmppath]) != 0:
        print('error: qpdf failed', file=sys.stderr)
        sys.exit(1)
    with open(qdf_tmppath, 'rb') as ps_file:
        pdf_data = ps_file.read()
finally:
    try:
        os.unlink(qdf_tmppath)
    except OSError:
        pass

# Load the rects and last object ID from PDF file
PDF_RECT_PAT = re.compile(br'''
    \b1\s+0\s+1\s+rg(?:\s+/a0\s+gs)?
    ((?:\s+[\d.-]+\s+[\d.-]+\s+[\d.-]+\s+[\d.-]+\s+re\s+f)+)\b
''', re.X)
m = PDF_RECT_PAT.search(pdf_data)
pdf_rects = re.split(br'\s+', m.group(1).strip()) if m else []
# Each rect is six tokens: x y width height "re" "f"; keep the four numbers
pdf_rects = [
    list(map(float, pdf_rects[i:i + 4]))
    for i in range(0, len(pdf_rects), 6)
]
last_obj = re.search(br'\bxref\s+(\d+)\s+(\d+)\b', pdf_data)
if not last_obj:
    print('error: could not find last obj id', file=sys.stderr)
    sys.exit(1)
last_obj = tuple(map(int, last_obj.groups()))

# Sanity check to ensure our matches are good
if len(svg_rects) != len(pdf_rects):
    print('''
error: found a different number of rects in the svg & pdf
This can be due to a number of reasons:
- you've moved the box after creating a link for it - bad move!
  fix: delete it and draw a new box and DON'T MOVE it this time
- you've grouped the boxes and done some fancy things
  fix: see above
- you forgot to remove the strokes from the boxes
- you have removed a box but Inkscape is still keeping it in the file
  fix: do a document cleanup or close/re-open your file
'''.strip(), file=sys.stderr)
    sys.exit(1)

# Match up the rects based on their relative X,Y position
# FIXME there is a possibility that due to rounding errors, links get
#       matched up incorrectly. Always check the final PDF before sharing.
svg_rects.sort(key=(lambda x: int(x[2] * 100)), reverse=True)
svg_rects.sort(key=lambda x: int(x[1] * 100))
pdf_rects.sort(key=lambda x: (int(x[0] * 100), int(x[1] * 100)))

# Generate the PDF hyperlink objects
pdf_link_tpl = '''
%%QDF: ignore_newline
%d %d obj
<<
  /A << /S /URI /URI (%s) >>
  /Border [ 0 0 0 ]
  /Rect [ %f %f %f %f ]
  /Subtype /Link
  /Type /Annot
>>
endobj
'''.strip()
pdf_links = '\n'.join(pdf_link_tpl % (
    c, last_obj[0], s[0], p[0], p[1], p[0] + p[2], p[1] + p[3]
) for p, s, c in zip(pdf_rects, svg_rects, count(last_obj[1])))

# Remove the visual rects from PDF, write out the new hyperlink objs
pdf_data = PDF_RECT_PAT.sub(b'', pdf_data)
pdf_data = re.sub(
    (r'\bxref\s+%d\s+%d\b' % last_obj).encode('ascii'),
    (pdf_links + '\nxref\n%d %d' % (
        last_obj[0], last_obj[1] + len(svg_rects)
    )).encode('ascii'),
    pdf_data
)
# Attach the new annotation objects to the first page
pdf_data = re.sub(
    br'([%][%]\s+Page\s+1\s+[%][%][^\n]+\s+\d+\s+\d+\s+obj\s+<<)',
    (r'\1/Annots [%s] ' % ' '.join(
        '%d %d R' % (i + last_obj[1], last_obj[0])
        for i in range(len(svg_rects))
    )).encode('ascii'), pdf_data)

# Optimize and save the new file
fd, out_tmppath = tempfile.mkstemp()
os.close(fd)
try:
    with open(out_tmppath, 'wb') as out_tmpfile:
        fix_qdf_proc = Popen(['fix-qdf'], stdin=PIPE, stdout=out_tmpfile)
        fix_qdf_proc.communicate(pdf_data)
        if fix_qdf_proc.wait() != 0:
            print('error: failed writing the mod pdf', file=sys.stderr)
            sys.exit(1)
    if call([
        'qpdf', '--object-streams=generate', '--stream-data=compress',
        out_tmppath, pdf_out_path
    ]) != 0:
        print('error: failed writing the mod pdf', file=sys.stderr)
        sys.exit(1)
finally:
    try:
        os.unlink(out_tmppath)
    except OSError:
        pass

It works, thanks!
Hi there!
First of all, thanks for your labor.
I can't run the code; maybe you can help me. I followed the steps mentioned and downloaded the qpdf tool.
Maybe my problem is in the qpdf part. I am working on Windows, and I placed the qpdf files in C:.
Well, when I try to run the script I have the following error:
Hi! For me, it says that svglinkify.py could not be opened because it's damaged or not supported. How do I fix that?
I wish this was still supported, the Go version is archived and Inkscape to this day can't do internal links like this script could :(
Just tested it on Inkscape 1.2 and it works fine.
(Attached: Screencast from 2022-11-24 00-13-55.webm)
Wow, did not think I would see a response to something this old this fast, if ever! Sadly, you misunderstood my comment: it's internal links that don't work (such as #page=2), external links are fine. There is an open issue for that in Inkscape, but with no activity and honestly not a lot of interest, from what I can tell.
Being totally incapable of any C development, I can't help there, so I'm currently attempting to re-invent what this script must have done using a Java PDF library, as I was unable to understand the Go code, and unable to find enough information on the raw PDF syntax.
The Inkscape issue: https://gitlab.com/inkscape/inbox/-/issues/7486
Addon: the Go code throws this error when attempting to use it to convert my SVG to PDF:
inkscape didn't tell us the bounding box for link '%s' - ignoring link#page=2
inkscape errored while generating PDF
@Gaibhne I assume you've attempted to run the svglinkify script (from the history) and that didn't work?
No, until you linked it, I was unable to find the old script; the gist is now a 404 and the bit in here was changed to just the links. The Go project doesn't really have any instructions other than how to call it. Now that you've linked a version of the script, I read up on the instructions, but was unable to get it to work either; it returns with "error: could not find last obj id".
I am exporting with the 'save copy as' functionality in Inkscape, as I don't know how else I would go about producing a multi-page PDF. Is that correct?
I can't actually see how the old script would cater for this internal linking scenario — there's no provision for it in the script. What's an example of an internally linked SVG you've used in the past that correctly jumped around the document when exported to PDF and viewed in a PDF viewer? Once we have that, we can try to make it work.
I have never managed to get that working; that is how I found my way here. I did not realize the Go version had that capability as a 'new' functionality over the old script. I guess all my searching was in vain, then :( I've tried the Go version with the instructions from the old script (no moving, #FF00FF color, etc.) to no avail either. Same error as above, both for internal and regular links.
I'm confused.
> I wish this was still supported, the Go version is archived and Inkscape to this day can't do internal links like this script could :(
When you mention "like this script could", I read that as "you tried it with svglinkify python script in the past (this gist) and you managed to create a PDF file that had working internal links". If so, then can you send me an SVG file with internal links and I'll see if I can make it work.
OK, I finally understand what you're referring to: the Go version of svglinkify which did explicitly support internal links:
https://github.com/oxplot/svglinkify/blob/470287cdb1b0bf2dd5632f31405d8c49b507edf6/main.go#L91
Well the Go version outputted corrupt PDFs half the time, so not really useful. It's not that hard to do a python version that uses QPDF without corrupting stuff and also supports multi page SVGs (as supported by newer Inkscape).
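For anyone looking for the raw PDF syntax involved: the script above writes link annotations whose action is a URI. A link that jumps to another page of the same document can instead carry a /Dest entry pointing at the target page object. The following is only a rough sketch of that difference, not the code of either svglinkify version; the helper names `pdf_string` and `link_annotation`, and the object IDs in the usage lines, are made up for illustration.

```python
def pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def link_annotation(obj_id, rect, uri=None, page_obj_id=None):
    """Build a QDF-style link annotation object as text (hypothetical helper).

    Exactly one of `uri` (external link) or `page_obj_id` (object ID of the
    target page, for an internal link) must be given.
    """
    if (uri is None) == (page_obj_id is None):
        raise ValueError("pass exactly one of uri or page_obj_id")
    if uri is not None:
        # External link: a URI action, as in the script above.
        target = "/A << /S /URI /URI (%s) >>" % pdf_string(uri)
    else:
        # Internal link: explicit destination that fits the target page
        # in the viewer window.
        target = "/Dest [ %d 0 R /Fit ]" % page_obj_id
    x1, y1, x2, y2 = rect
    return "\n".join([
        "%d 0 obj" % obj_id,
        "<<",
        "  " + target,
        "  /Border [ 0 0 0 ]",
        "  /Rect [ %f %f %f %f ]" % (x1, y1, x2, y2),
        "  /Subtype /Link",
        "  /Type /Annot",
        ">>",
        "endobj",
    ])


# Object IDs 20, 21 and 7 are invented for this example.
print(link_annotation(20, (10, 10, 110, 30), page_obj_id=7))
print(link_annotation(21, (10, 40, 110, 60), uri="https://example.com/"))
```

The object ID of the target page has to be looked up in the actual PDF (for example in the QDF output of qpdf), which this sketch leaves out.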
No, I'm sorry, I'm the one that was confused - I saw the Go program's description, suggesting that it could handle internal links, and I thought the Go program was just a rewrite of the Python script, so I assumed that was a feature of both. But since I understand at least a little bit of Python, I wished for this script back because then I could have had a chance to reverse engineer how it was supposed to work and maybe figure out why it didn't work. In the end it seems like the Go program might have worked for me with an older Inkscape version, going purely from its readme, but I have never managed to get either the Go or the Python version working, and only came across this whole thing today, so I don't know if it would have covered my use case: a multi-page PDF, with links in the PDF taking me to different pages.
What inspired me to want that was that my e-ink pad supports PDF with internal links, and you can make amazing things like what you see in the video in slide 2 here https://www.etsy.com/listing/1054473758/remarkable-2-daily-planner-standard - I wanted to make something like that for myself.
Gotcha — I'll give it a shot. Didn't realize that Inkscape had this feature missing.
That would be super awesome, but don't go to too much trouble on my account; I'm just a hobbyist trying to make a fancy personal journal, and trying to do it the most ideal way possible :D
For what it's worth, from what I've googled today it seems that the problem is just that Inkscape prepends internal links with the file:///... stuff, and the solution might be as simple as changing the link targets in the resulting PDF, which is why I was thinking of grabbing a PDF library in a language I am fluent in and looking at opening PDFs, changing the link targets and writing them back.
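As a rough illustration of that idea (and only the string-handling half of it), a link target that Inkscape has prefixed with a file:// path could be reduced back to its fragment like this. The function name `internal_target` and the sample path are made up for the example; rewriting the targets inside the PDF itself is not shown.

```python
from urllib.parse import urlsplit


def internal_target(uri):
    """Return the '#fragment' of a file:// or bare-fragment URI.

    Returns None when the URI does not look like an internal link
    (hypothetical helper, not part of svglinkify).
    """
    parts = urlsplit(uri)
    if parts.scheme in ("", "file") and not parts.netloc and parts.fragment:
        return "#" + parts.fragment
    return None


# The path below is an invented example.
print(internal_target("file:///home/user/journal.pdf#page=2"))
print(internal_target("https://example.com/#top"))
```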
@Gaibhne Re-wrote the script in my day off and tested it with a bunch of samples. Should be OK. Find it at https://github.com/oxplot/svglinkify
That is insane, thank you so much! For me, it fails at the most basic part though - the resulting PDF has only a single page, no matter what I do. Even "New File, New Page, Save, svglinkify.py" results in only a single resulting page. I tried to exactly duplicate what you do, but nothing I did made it produce more than a single page somehow.
Addendum: links are added correctly, but internal links to other pages don't work, presumably because the objects get optimized away due to not being on any page. Internal links to same-page elements produce clickable links which work correctly (zooming in and clicking scrolls you to the target element). External links also work.
Debugging shows that after "# Load object IDs of all pages in the PDF document.", only a single ID is found (earlier, under "# Get pages and convert all measurements to pixels.", all pages are correctly detected). The QDF output also shows only a single /Page, which I do not understand, especially not why it would work for you and not me. Inkscape's "Save Copy As" correctly produces a multi-page PDF. I'd be happy to supply files generated by me to test with but I'm not sure how to attach them here.
> I'd be happy to supply files generated by me to test with but I'm not sure how to attach them here.
Yep, please do. You can upload your file here for e.g.: https://www.file.io/
Meanwhile, make sure your version of qpdf is not too far off 11.1.1. Similarly, my Inkscape is at version 1.2.1 (there seem to be a bunch of fixes for PDF export in the 1.2.1 release).
Oh God, I hope you didn't waste too much time on this yet. I just tested, and I did in fact have a broken Inkscape version because my upstream hadn't added the newest one yet. I got it working (although I have a file that I can't get working even after deleting every single thing in it, which I'll attach in case it's something obvious). But following your demo works perfectly for me now, so thanks!
Broken file: https://file.io/TF3IUwiLb3H1 - it's just three blank pages now, I've removed everything, but something makes the script error out with:
Traceback (most recent call last):
  File "/home/stoever/svglinkify-new/svglinkify.py", line 340, in <module>
    main()
  File "/home/stoever/svglinkify-new/svglinkify.py", line 70, in main
    svg_width_pixels = unit_2_px[svg_width[-2:]] * float(svg_width[:-2])
KeyError: '04'
Thanks for that. Was handling the px unit for the document size incorrectly. Just pushed a commit which works with your example file.
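For readers hitting the same KeyError: the traceback shows the last two characters of the width being treated as a unit, which breaks when the value has no unit suffix. One way to parse such lengths more defensively is sketched below. This is not the actual commit; `UNIT_TO_PX`, `LENGTH_PAT` and `length_to_px` are illustrative names, and the factors assume the CSS convention of 96 px per inch.

```python
import re

# Conversion factors to pixels, assuming 96 px per inch (CSS convention).
UNIT_TO_PX = {
    "": 1.0,  # a bare number is treated as pixels
    "px": 1.0,
    "in": 96.0,
    "pt": 96.0 / 72.0,
    "pc": 16.0,
    "mm": 96.0 / 25.4,
    "cm": 96.0 / 2.54,
}

# A number followed by an optional alphabetic unit suffix.
LENGTH_PAT = re.compile(
    r"^\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\s*([a-zA-Z]*)\s*$"
)


def length_to_px(value):
    """Convert an SVG length such as '210mm' or '793.70404' to pixels."""
    m = LENGTH_PAT.match(value)
    if not m:
        raise ValueError("not a length: %r" % value)
    number, unit = m.groups()
    if unit.lower() not in UNIT_TO_PX:
        raise ValueError("unsupported unit: %r" % unit)
    return float(number) * UNIT_TO_PX[unit.lower()]


print(length_to_px("210mm"))
print(length_to_px("793.70404"))
```

Splitting the number from the unit with a pattern, rather than slicing a fixed two characters, means a unitless value like the one ending in "04" is read as a plain pixel count.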
It did, thank you again! Do you have one of those 'buy a developer a beer' kind of things?
> It did, thank you again! Do you have one of those 'buy a developer a beer' kind of things?
Great. You're welcome - it was fun. I added some sponsor links to the svglinkify project page. :)
I tried them all, and they are all too unconfigured to do anything :D Ko-Fi for example says "Oops, that didn't work! This creator cannot receive PayPal payments at this time.", and the others are all similar messages.
@Gaibhne whoops I should probably set them up 😄
EDIT: OMG, the payment part is a PITA — I guess I'm content with gratitude alone. Thanks anyway.
Wonderful! Thanks!
Just a suggestion on the instructions: after adding the box with the link to the SVG file, it is not enough to export it as PDF. One also has to save the SVG. Could you please add this instruction explicitly to the top of the script?