Last active
October 14, 2022 06:18
A module for preparing TEI Simple XML files stored in eXist-db for later use in InDesign
xquery version "3.0";

module namespace dtp-utils = 'http://46.28.111.241:8081/exist/db/apps/karolinum-x/modules/dtp-utils';

import module namespace cust-utils = 'http://46.28.111.241:8081/exist/db/apps/karolinum-x/modules/cust-utils' at 'cust-utils.xqm';

declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(:~ This module converts TEI Simple XML, in memory, into XML suitable for
 : importing into InDesign. It is not a silver bullet; it was developed and
 : tested for a very specific scenario. However, it should be useful for anyone
 : who uses TEI Simple XML (possibly with minor modifications, such as the
 : replacement strings) and wants to typeset the data in InDesign. It deals
 : with a couple of well-known obstacles that prevent many people from
 : importing XML into InDesign.
 :
 : The basic idea is:
 :   1) Grab the file/book.
 :   2) Make some minor textual changes that are not useful for files stored
 :      in the DB (here specifically for Czech prepositions).
 :   3) For graphics, replace the @url attribute with @href and add
 :      file:///img/ for relative paths. The prefix depends on where you want
 :      to store images in the archive/folder.
 :   4) For every title (head element), append the number of its level in the
 :      document hierarchy to its name; this makes it much easier to style
 :      headings automatically later.
 :   5) It is not easy to import notes as footnotes in InDesign. The workaround
 :      is to wrap every note in a chosen pattern, which is later easily
 :      recognized by a special script that I also provide as a Gist (URL below).
 :      Because it is better to wrap everything in an element, notes are wrapped
 :      in <footnoteForInDesign/>. And because every note includes one or more
 :      paragraphs, which must be distinguishable from the paragraphs outside
 :      the note (for automatic styling), these are renamed to pnote.
 :   6) InDesign does not support tables as they are in TEI XML. However, it
 :      supports the CALS table standard, so every table is converted to it.
 :   7) All of the functionality above is split into separate functions, which
 :      keeps the module modular.
 :   8) At the end, the file is packed together with all its images. If an
 :      image is missing in the DB or its path is wrong, a text file logging
 :      the problem is added instead of the image.
 :   9) In InDesign (tested on version CS6):
 :      i)    create a new file;
 :      ii)   import the XML;
 :      iii)  in the import dialog, check mainly the last option about
 :            importing CALS tables (if you need that); the rest is up to you;
 :      iv)   do not panic: the imported file will certainly include a lot
 :            of annoying whitespace. It is handy to remove it with GREP;
 :            the regex that works for me is (\r\s{2,})|(\s{2,}\r), replaced
 :            with \r. Other ways are much more complicated;
 :      v)    if you did not prepare the styling and the tag-to-style
 :            associations earlier, do it now: map TAGS to STYLES (I do not
 :            have much experience with the opposite action of mapping styles
 :            to tags);
 :      vi)   you will now see that your notes are still present in the text.
 :            Feel free to use the script provided at the link below: open
 :            the Scripts panel (Window > Utilities > Scripts), right-click the
 :            User folder and open it in Explorer or Finder, put script.js
 :            into it, close the Explorer/Finder window, and run the script
 :            from the panel, where it should now be listed. It converts all
 :            notes wrapped in @foot_beg@ and @foot_end@ into real footnotes;
 :      vii)  if there is a line break between the index numbers and the
 :            bodies of the footnotes, remove it with GREP (replace \t\r with
 :            \t or \s, as desired);
 :      viii) if some pictures cause problems (they are "hidden" and overflow
 :            at the end of the document or of some of its parts), it is
 :            because their DPI metadata is missing. Repair them in Photoshop,
 :            GIMP or similar (simply add the DPI value). It is a good idea to
 :            repair them before the import; I have no experience with doing
 :            it afterwards.
 :)

(: Prepare Footnotes: takes every instance of tei:note and wraps it into
 : <footnoteForInDesign>@foot_beg@ … @foot_end@</footnoteForInDesign>.
 : This is needed for a special InDesign script that moves notes, originally
 : placed in the text, into the footnote area at the bottom of the page.
 : The script: https://gist.github.com/welblaud/c21a96f2f23db58b4011726cf21addb8
 : It is also very handy to rename the note paragraphs to pnote (or another
 : custom name) to differentiate them from the paragraphs outside the note;
 : styling and style assignment are much easier this way. :)
declare function dtp-utils:prepare-footnotes($document as node()*) as item()* {
    for $node in $document
    return
        typeswitch ($node)
            (: for a document node, process its children :)
            case document-node() return
                dtp-utils:prepare-footnotes($node/node())
            case element() return
                (: if the element is a note, wrap it into a footnoteForInDesign
                   element and the @foot_beg@/@foot_end@ strings :)
                if (xs:string(name($node)) eq 'note') then
                    element { 'footnoteForInDesign' } {
                        '@foot_beg@',
                        element { name($node) } {
                            $node/@*,
                            dtp-utils:prepare-footnotes($node/node())
                        },
                        '@foot_end@'
                    }
                (: if the element is a paragraph inside the note [simple:footnote:text],
                   rename it to pnote :)
                else if ($node/@rendition eq 'simple:footnote:text') then
                    element { 'pnote' } {
                        $node/@*,
                        dtp-utils:prepare-footnotes($node/node())
                    }
                (: all other elements are passed through :)
                else
                    element { node-name($node) } {
                        $node/@*,
                        dtp-utils:prepare-footnotes($node/node())
                    }
            (: text nodes are passed through too :)
            case text() return
                $node
            (: everything else is omitted: processing instructions, comments :)
            default return
                ()
};

(: Prepare Heads takes every head (except those in figures and tables) and
 : renames it according to its level, e.g. head2 for a head nested in two divs.
 : This is useful for applying/mapping styles in InDesign. :)
declare function dtp-utils:prepare-heads($document as node()*) as item()* {
    for $node in $document
    return
        typeswitch ($node)
            (: for a document node, process its children :)
            case document-node() return
                dtp-utils:prepare-heads($node/node())
            (: return the node, but rename it according to its level;
               heads in figures and tables keep their name :)
            case element() return
                if (xs:string(name($node)) eq 'head' and not($node/parent::tei:figure) and not($node/parent::tei:table)) then
                    (: the level is the number of ancestor divs; attributes of the head are not copied :)
                    element { name($node) || count($node/ancestor::tei:div) } {
                        dtp-utils:prepare-heads($node/node())
                    }
                else
                    element { node-name($node) } {
                        $node/@*,
                        dtp-utils:prepare-heads($node/node())
                    }
            (: text nodes are passed through :)
            case text() return
                $node
            (: everything else is dropped :)
            default return
                ()
};

(: Prepare Images takes every graphic and replaces its @url attribute with
 : @href, which is preferred in InDesign, prefixing the value with
 : file:///img/. Only the @href attribute is written to the new graphic.
 : This solution is less error-prone: the typesetter sees the path and
 : places the image manually. :)
declare function dtp-utils:prepare-images($document as node()*) as item()* {
    for $node in $document
    return
        typeswitch ($node)
            (: for a document node, process its children :)
            case document-node() return
                dtp-utils:prepare-images($node/node())
            case element() return
                (: if the element is a graphic, replace the url attribute with href :)
                if (xs:string(name($node)) eq 'graphic') then
                    element { name($node) } {
                        attribute { 'href' } { 'file:///img/' || $node/@url },
                        dtp-utils:prepare-images($node/node())
                    }
                (: all other elements are passed through :)
                else
                    element { node-name($node) } {
                        $node/@*,
                        dtp-utils:prepare-images($node/node())
                    }
            (: text nodes are passed through :)
            case text() return
                $node
            (: everything else is dropped :)
            default return
                ()
};

(: Prepare Tables transforms any table into the CALS standard, which is
 : supported by InDesign natively. Tables are then imported automatically as
 : tables. Because the function could be used at a point where the elements
 : have lost their association with the default namespace, table and cell
 : elements are matched by name(), both IN and OUT of the TEI namespace. :)
declare function dtp-utils:prepare-tables($document as node()*) as item()* {
    for $node in $document
    return
        typeswitch ($node)
            (: for a document node, process its children :)
            case document-node() return
                dtp-utils:prepare-tables($node/node())
            case element() return
                (: if the node is a table, rebuild it as a CALS table :)
                if (xs:string(name($node)) eq 'table') then
                    element { name($node) } {
                        (: turns head into title, placed before the tgroup :)
                        element { 'title' } {
                            data($node/tei:head)
                        },
                        (: wraps thead and tbody into a tgroup with the appropriate
                           cols number (estimated from the first row) :)
                        element { 'tgroup' } {
                            attribute { 'cols' } { count($node/tei:row[1]/tei:cell) },
                            for $cell at $count in ($node/tei:row[1]/tei:cell)
                            return
                                (: returns an empty, named colspec element for every column :)
                                element { 'colspec' } { attribute { 'colname' } { 'coll_' || $count }, () },
                            (: makes thead from the label row :)
                            element { 'thead' } {
                                element { 'row' } {
                                    for $cell in ($node/tei:row[@role='label']/tei:cell)
                                    return
                                        dtp-utils:prepare-tables($cell)
                                }
                            },
                            (: makes tbody from the rows without a role :)
                            element { 'tbody' } {
                                for $row in ($node/tei:row[not(@role)])
                                return
                                    element { 'row' } {
                                        for $cell in ($row/tei:cell)
                                        return
                                            dtp-utils:prepare-tables($cell)
                                    }
                            }
                        }
                    }
                (: for every non-empty cell, returns an entry element; @cols is turned
                   into namest and nameend attributes, @rows into a morerows attribute.
                   Note: $node is a single item here, so $node/position() yields 1 and
                   spans are numbered from the first column. :)
                else if (xs:string(name($node)) eq 'cell' and $node//node()) then
                    element { 'entry' } {
                        if ($node/@cols) then
                            attribute { 'nameend' } { 'coll_' || ($node/position() + $node/@cols) }
                        else (),
                        if ($node/@cols) then
                            attribute { 'namest' } { 'coll_' || $node/position() }
                        else (),
                        if ($node/@rows) then
                            attribute { 'morerows' } { $node/@rows }
                        else (),
                        dtp-utils:prepare-tables($node/node())
                    }
                (: if the cell is empty, returns nothing; CALS does not allow empty cells :)
                else if (xs:string(name($node)) eq 'cell' and not($node//node())) then
                    ()
                (: any other element in the document is returned as is :)
                else
                    element { node-name($node) } {
                        $node/@*,
                        dtp-utils:prepare-tables($node/node())
                    }
            (: text nodes are returned as is :)
            case text() return
                $node
            (: everything else is dropped :)
            default return
                ()
};

(: Prepare for InDesign takes a document, sanitizes all Czech prepositions,
 : dashes and § characters (puts non-breaking spaces after or before each
 : of them), renames every head according to its level (e.g. head4) so that
 : it can be told apart from other heads (styling!), replaces @url with
 : @href in all graphics, prepares footnotes for later use with a special
 : script in InDesign, and converts tables into the CALS standard, which is
 : supported by InDesign natively. :)
declare function dtp-utils:prepare-for-indesign($document as node()) as node() {
    (: the original referenced an undeclared variable $docu here; the parameter is $document :)
    let $pass1 := cust-utils:sanitize-spaces($document)
    let $pass2 := dtp-utils:prepare-heads($pass1)
    let $pass3 := dtp-utils:prepare-images($pass2)
    let $pass4 := dtp-utils:prepare-footnotes($pass3)
    let $pass5 := dtp-utils:prepare-tables($pass4)
    return $pass5
};

(: Pack for DTP: packs the necessary files in the same way as the function
 : for packing entries for ePub. If an image is missing in the DB, a text
 : file named after the missing image is added instead, and the path to the
 : missing file is written into the body of that text file. :)
declare function dtp-utils:pack-for-dtp($document as node(), $doc-uri as xs:string, $name as xs:string) as xs:base64Binary {
    let $archiveName as xs:string := $name
    (: collection of the document: the URI with the file name stripped :)
    let $root as xs:string := replace($doc-uri, '[^/]*?$', '')
    let $doc-prepared := dtp-utils:prepare-for-indesign($document)
    (: Main Document :)
    let $doc := <entry name="files/{$name}.xml" type="xml">{$doc-prepared}</entry>
    (: Pics :)
    let $pics as item()* :=
        (
            let $images := $document//tei:graphic
            for $fileName in distinct-values($images/@url)
            let $res := $root || 'img/hires/' || $fileName
            return
                if (util:binary-doc-available($res)) then
                    <entry name='files/img/{$fileName}' type='binary'>{util:binary-doc($res)}</entry>
                else
                    <entry name='files/img/{$fileName}-url-error.txt' type='text'>Error in the file name or link: {$res}</entry>
        )
    let $entries as node()* := ($doc, $pics)
    let $zip-file as item() := compression:zip($entries, true())
    return
        response:stream-binary($zip-file, 'application/zip', lower-case(replace($archiveName, ' ', '-')) || '.zip')
};

<figure xml:id="fig1">
    <graphic url="tealover.jpg"/>
    <head>Illustrandum tealover</head>
</figure>

<figure xml:id="fig1">
    <graphic href="file:///img/tealover.jpg"/>
    <head>Illustrandum tealover</head>
</figure>

<note place="bottom" xml:id="ftn1">
    <p rendition="simple:footnote:text">
        <hi rendition="simple:italic">Veškerý</hi> žoust je v jídle!</p>
</note>

<footnoteForInDesign>
    @foot_beg@
    <note place="bottom" xml:id="ftn1">
        <pnote rendition="simple:footnote:text">
            <hi rendition="simple:italic">Veškerý</hi> žoust je v jídle!</pnote>
    </note>
    @foot_end@
</footnoteForInDesign>

<table rendition="simple:rules">
    <head>Tabule k rozčicům</head>
    <row role="label">
        <cell>půlpik</cell>
        <cell>dolot</cell>
        <cell>tujta</cell>
        <cell>xorosol</cell>
    </row>
    <row>
        <cell cols="2">nikdy</cell>
        <cell>bodok
            <note place="bottom" xml:id="ftn3">
                <p rendition="simple:footnote:text">Ahoj 2</p>
            </note>
        </cell>
        <cell/>
    </row>
    <row>
        <cell rows="2">jednou</cell>
        <cell>jutoj</cell>
        <cell>bodok
            <note place="bottom" xml:id="ftn4">
                <p rendition="simple:footnote:text">Ahoj 3</p>
            </note>
        </cell>
        <cell/>
    </row>
    <row>
        <cell>jednou</cell>
        <cell>nikdy</cell>
        <cell>bodok
            <note place="bottom" xml:id="ftn5">
                <p rendition="simple:footnote:text">Ahoj 2</p>
            </note>
        </cell>
    </row>
</table>

<table>
    <title>Tabule k rozčicům</title>
    <tgroup cols="4">
        <colspec colname="coll_1"/>
        <colspec colname="coll_2"/>
        <colspec colname="coll_3"/>
        <colspec colname="coll_4"/>
        <thead>
            <row>
                <entry>půlpik</entry>
                <entry>dolot</entry>
                <entry>tujta</entry>
                <entry>xorosol</entry>
            </row>
        </thead>
        <tbody>
            <row>
                <entry nameend="coll_3" namest="coll_1">nikdy</entry>
                <entry>bodok
                    <footnoteForInDesign>@foot_beg@
                        <note place="bottom" xml:id="ftn3">
                            <pnote rendition="simple:footnote:text">Ahoj 2</pnote>
                        </note>
                    @foot_end@</footnoteForInDesign>
                </entry>
            </row>
            <row>
                <entry morerows="2">jednou</entry>
                <entry>jutoj</entry>
                <entry>bodok
                    <footnoteForInDesign>@foot_beg@
                        <note place="bottom" xml:id="ftn4">
                            <pnote rendition="simple:footnote:text">Ahoj 3</pnote>
                        </note>
                    @foot_end@</footnoteForInDesign>
                </entry>
            </row>
            <row>
                <entry>jednou</entry>
                <entry>nikdy</entry>
                <entry>bodok
                    <footnoteForInDesign>@foot_beg@
                        <note place="bottom" xml:id="ftn5">
                            <pnote rendition="simple:footnote:text">Ahoj 2</pnote>
                        </note>
                    @foot_end@</footnoteForInDesign>
                </entry>
            </row>
        </tbody>
    </tgroup>
</table>
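
The module itself does not show how it is called. The following is only a minimal sketch of a main query that might drive it from within eXist-db, not part of the original Gist. It assumes the module is stored as dtp-utils.xqm next to the query (by analogy with cust-utils.xqm); the document path and the archive name are hypothetical and must be adapted to your own collection layout.

```xquery
xquery version "3.0";

(: Assumption: the module file is named dtp-utils.xqm and sits next to this query. :)
import module namespace dtp-utils = 'http://46.28.111.241:8081/exist/db/apps/karolinum-x/modules/dtp-utils' at 'dtp-utils.xqm';

(: Hypothetical location of the TEI document; images are looked up
   under img/hires/ in the same collection by pack-for-dtp. :)
let $doc-uri := '/db/apps/karolinum-x/data/sample-book/sample-book.xml'
(: Hypothetical archive name; spaces become hyphens and the name is lower-cased. :)
let $name := 'Sample Book'
return
    if (doc-available($doc-uri)) then
        (: runs all preparation passes, zips the result and streams it to the client :)
        dtp-utils:pack-for-dtp(doc($doc-uri), $doc-uri, $name)
    else
        <error>Document not found: {$doc-uri}</error>
```

To inspect the converted XML without creating an archive, the same query could call dtp-utils:prepare-for-indesign(doc($doc-uri)) instead and return its result directly.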