pandoc-class2style.pl - filter to translate single pandoc classes into attribute lists or LaTeX commands
Version 1.000
pandoc -F pandoc-class2style.pl ...
pandoc-class2style.pl is a Pandoc filter which lets you use spans (or divs) with a single class in your source document and have the necessary LaTeX markup, DOCX custom styles, or HTML attributes of your choice injected during conversion. You still have to wrap the 'special' text in a span or div, but since each span only needs a class, which can be as short as you like, the source becomes much less cluttered. It also becomes much easier to produce multiple formats from the same Markdown source.
You declare a mapping from short classes to LaTeX commands or environments, DOCX custom styles or HTML attributes in your YAML metadata as follows:
---
class2style:
  latex:
    u: uline
    uu: uuline
    grc: textgreek[variant=ancient]
    he: texthebrew
    la: textlatin
    sc: textsc
    blue: textcolor{blue}
  docx:
    - u: Underlined
      uu: DoubleUnderlined
      grc: Greek
      he: Hebrew
      la: Latin
      sc: SmallCaps
    - blue
  html:
    u:
      class: uline
    uu:
      class: uuline
    grc:
      lang: grc
    he:
      lang: he
      dir: rtl
    la:
      lang: la
    sc:
      class: small-caps
lang: en
otherlangs:
  - grc
  - he
  - la
mainfont: FreeSerif # or any other font you prefer
xcolor: hyperref, svgnames
...
[Underlined]{.u} [Double underlined]{.uu}
[Ἑλληνιστής]{.grc}
[עִבְרִית]{.he}
[Lingua Romanica]{.la .sc}
[I'm *blue*!]{.blue}
Running pandoc with this filter gives the following outputs for the above:
`pandoc -F pandoc-class2style.pl c2stest.md -t latex`:
\uline{Underlined} \uuline{Double underlined}
\textgreek[variant=ancient]{Ἑλληνιστής}
\texthebrew{עִבְרִית}
\textlatin{\textsc{Lingua Romanica}}
\textcolor{blue}{I'm \emph{blue}!}
`pandoc -F pandoc-class2style.pl c2stest.md -t html5`:
<p><span class="uline">Underlined</span>
<span class="uuline">Double underlined</span></p>
<p><span lang="grc">Ἑλληνιστής</span></p>
<p><span lang="he" dir="rtl">עִבְרִית</span></p>
<p><span class="small-caps" lang="la">Lingua Romanica</span></p>
<p><span class="blue">I'm <em>blue</em>!</span></p>
Finally I can't show the DOCX output here, but it is as if the Markdown had been like this:
[Underlined]{custom-style="Underlined"}
[Double underlined]{custom-style="DoubleUnderlined"}
[Ἑλληνιστής]{custom-style="Greek"}
[עִבְרִית]{custom-style="Hebrew"}
[Lingua Romanica]{custom-style="LatinSmallCaps"}
[I'm *blue*!]{custom-style="Blue"}
I originally had three separate filters, one each for LaTeX, HTML and DOCX, with essentially the same interface. When I combined them to make maintenance and configuration easier, it was a bit of a problem what to call the combined filter. In the end I decided to use 'style' as the most general term, qualified as follows:
The word 'style' in scare quotes means any of the AST modifications performed by this filter in order to affect how elements with certain classes are rendered in any of the supported output formats. It thus does not necessarily refer to a DOCX style as applied through Pandoc's `custom-style` attribute. In particular it does not refer to the HTML `style` attribute. It is best practice to avoid that attribute and apply CSS styles through tag, class, id and attribute selectors in a separate style sheet. When talking about CSS the phrase CSS style is used.
Similarly the word `custom-style`, hyphenated but sometimes without code formatting, is used when talking about the `custom-style` attribute which tells Pandoc's docx writer to apply a particular named DOCX style to the contents of a span or div. Finally the phrase DOCX style is used for the named styles which you can define, modify and apply to text elements in a word processor.
In LaTeX mode 'styles' applied to spans become commands and 'styles' applied to divs become environments. This is not configurable. I have experimented with configuring this in the past and my experience wasn't good. If you really want to try to use a command as an environment you can try the environ package.
Similarly DOCX `custom-style`s become character styles for spans and paragraph styles for divs. This is part of Pandoc's built-in `custom-style` feature.
Also note what is said on namespaces below!
If you apply several classes with associated styles to the same span or div they are combined.
In LaTeX mode the commands and environments are nested. The left-to-right order of the classes in the source is preserved, so that `[foo]{.bar .baz}` becomes `\bar{\baz{foo}}` but `[foo]{.baz .bar}` becomes `\baz{\bar{foo}}`. Similarly environments are nested with the one corresponding to the leftmost class becoming outermost and the one corresponding to the rightmost class becoming innermost.
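The nesting order for commands can be sketched in a few lines of Python. This only illustrates the rule described above and is not the filter's own Perl code; the function name `wrap_latex` and the `commands` mapping are made up for the example.

```python
def wrap_latex(content, classes, commands):
    """Nest LaTeX commands around content; the leftmost class ends up outermost."""
    # Walk the classes right to left so the rightmost mapped class wraps first.
    for cls in reversed(classes):
        command = commands.get(cls)
        if command is not None:
            content = "\\" + command + "{" + content + "}"
    return content


# Hypothetical class -> command mapping, mirroring the metadata format.
commands = {"bar": "bar", "baz": "baz"}
print(wrap_latex("foo", ["bar", "baz"], commands))  # \bar{\baz{foo}}
print(wrap_latex("foo", ["baz", "bar"], commands))  # \baz{\bar{foo}}
```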
Because DOCX named styles aren't additive, things become a little more complicated. Multiple class 'styles' are concatenated with the first letter of each component style capitalized, as seen in the `LatinSmallCaps` example. You will need to define each such combined style in your reference-docx. At least you can let your `SmallCaps` style inherit from the built-in `Small Caps` style and your `LatinSmallCaps` style inherit from your `SmallCaps` style so that changes in the ancestor styles get reflected in the descendant styles.
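As a rough illustration of the naming rule for combined DOCX styles, here is a Python sketch (not the filter's own code). The helper name `combined_docx_style` is hypothetical, and the sketch assumes that classes without a declared style are simply skipped.

```python
def combined_docx_style(classes, styles):
    """Concatenate the styles of all mapped classes, capitalizing each first letter."""
    parts = [styles[cls] for cls in classes if cls in styles]
    # Only the first letter is upper-cased; the rest of each name is kept as-is.
    return "".join(part[:1].upper() + part[1:] for part in parts)


styles = {"la": "Latin", "sc": "SmallCaps", "blue": "blue"}
print(combined_docx_style(["la", "sc"], styles))  # LatinSmallCaps
print(combined_docx_style(["blue"], styles))      # Blue
```

Each name produced this way still has to exist as a style in your reference-docx.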
Note that since there can only be one 'style' per class and output format you need to use a separate class for each LaTeX command or environment or for each DOCX character or paragraph style.
The one-style-per-class behavior is consistent with how things work in LaTeX where commands and environments share a namespace, and DOCX where character and paragraph styles also share the same namespace. If this bothers you when producing HTML remember that nothing stops you from defining HTML 'styles' with the same attributes, including classes, corresponding to different input classes. You can even use the YAML anchor--reference syntax to reduce typing, file size and errors:
class2style:
  latex:
    he: texthebrew
    he-block: hebrew
  docx:
    he: Hebrew
    he-block: HebrewPara
  html:
    he: &hebrew
      lang: he
      dir: rtl
    he-block: *hebrew
Here `*hebrew` is a reference which causes the value of the key `html-->he-block` to be the same as the value of the key `html-->he`, which is marked with the anchor `&hebrew`.
I don't know which of Pandoc and/or LibreOffice and/or Word imposes the limitation that DOCX paragraph and character styles can't have the same name, which is a little strange given the separation between those two kinds of styles.
The LaTeX namespace limitation is due to the fact that the LaTeX implementation of an environment `foo` involves defining the commands `\foo` and `\endfoo`. Why it was called `\foo` and not `\beginfoo` is anybody's guess...
By default 'styles' are only applied when the output format is one of `latex`, `docx`, `html`, `html5` or `epub`. You can override this by setting one of the metadata variables `class2style_html` or `class2style_docx` to a true value on the command line.
In fact you can run with any output format and make this filter behave as if the output format had been `html` or `docx`. Just say:
$ pandoc -F pandoc-class2style.pl -t markdown -M class2style_html ...
$ pandoc -F pandoc-class2style.pl -t markdown -M class2style_docx ...
There is no similar variable for LaTeX because Markdown markup inside the wrapped spans and divs will be broken if latex-mode output is converted to Markdown.
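The mode selection described above might be pictured roughly as in the following Python sketch. It is not the filter's actual logic. The function name `style_mode` is hypothetical, and two points are assumptions not stated in this document: that `epub` output uses the `html` 'styles', and that `class2style_html` wins if both override variables are set.

```python
HTML_FORMATS = {"html", "html5", "epub"}  # assumption: epub uses the html 'styles'


def style_mode(output_format, meta):
    """Return 'latex', 'docx' or 'html', or None if no 'styles' should be applied."""
    # The metadata overrides take effect regardless of the real output format.
    if meta.get("class2style_html"):  # assumed to take precedence over _docx
        return "html"
    if meta.get("class2style_docx"):
        return "docx"
    if output_format in ("latex", "docx"):
        return output_format
    if output_format in HTML_FORMATS:
        return "html"
    return None


print(style_mode("markdown", {"class2style_html": True}))  # html
print(style_mode("markdown", {}))                          # None
```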
When applying 'styles' to markdown output you may wish to assign both HTML attributes and DOCX `custom-style` attributes at the same time. There is an easy workaround for this: just include a `custom-style` attribute in your `class2style-->html-->CLASS` metadata mapping and run with the `-M class2style_html` switch on the command line.
class2style:
  html:
    sc:
      class: 'small-caps'
      'custom-style': 'Small Caps'
By default the existing classes of a span or div element which gets new attributes associated with it are deleted. This is so that you don't get any duplicated attributes if you first run the filter when producing Markdown output and then at a later time run the filter on the same document again, e.g. to also apply 'styles' to elements added later. This behavior can be overridden by passing the switch `-M class2style_keep` on the command line.
Sometimes you need to pass extra arguments to a LaTeX command or environment. If those arguments come before the main argument (the one containing the span content) you can generally include them in your command string as in the `blue: textcolor{blue}` example; anything you put as `COMMAND` in your `CLASS: COMMAND` metadata field will be put into the frame `\...{` and prepended to the span content as a raw LaTeX string. In the rare cases where you need to put arguments after the span content argument you can replace `COMMAND` with a mapping with the two keys `before` and `after`:
CLASS:
  before: BEFORE
  after: AFTER
In this case `BEFORE` will be put into the same `\...{` frame before the content and `AFTER` will be put into a `}...` frame after the content, giving you `\BEFORE{CONTENT}AFTER`.
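The two frames can be illustrated with a small Python sketch. It only mirrors the string assembly described above, ignores quoting and any other special cases, and is not the filter's own code; `wrap_command` is a hypothetical name.

```python
def wrap_command(content, spec):
    """Build \\BEFORE{CONTENT}AFTER from a plain command string or a before/after mapping."""
    if isinstance(spec, str):
        # A plain CLASS: COMMAND value has nothing after the content.
        before, after = spec, ""
    else:
        before, after = spec.get("before", ""), spec.get("after", "")
    return "\\" + before + "{" + content + "}" + after


print(wrap_command("I'm blue", "textcolor{blue}"))
# \textcolor{blue}{I'm blue}
print(wrap_command("CONTENT", {"before": "BEFORE", "after": "AFTER"}))
# \BEFORE{CONTENT}AFTER
```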
With environments (i.e. divs) you always need a mapping with the two keys `name` and `args` to pass arguments, with the value of `name` being the environment name and the value of `args` being the argument string:
---
class2style:
  latex:
    grc-block:
      name: greek
      args: '[variant=ancient]'
...
<div class="grc-block">
| Ἄφοβον ὁ θεός,
| ἀνύποπτον ὁ θάνατος
| καὶ τἀγαθὸν μὲν εὔκτητον,
| τὸ δὲ δεινὸν εὐεκκαρτέρητον
</div>
which thus becomes
\begin{greek}[variant=ancient]
Ἄφοβον ὁ θεός,\\
ἀνύποπτον ὁ θάνατος\\
καὶ τἀγαθὸν μὲν εὔκτητον,\\
τὸ δὲ δεινὸν εὐεκκαρτέρητον
\end{greek}
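How the `name` and `args` values end up around the div content can be sketched as follows. This is an illustrative Python sketch, not the filter's own code. The function name `wrap_environment` is hypothetical, and the body is passed in as already-rendered LaTeX.

```python
def wrap_environment(body, spec):
    """Wrap body in \\begin{NAME}ARGS ... \\end{NAME}."""
    if isinstance(spec, str):
        # A plain string is just the environment name, without arguments.
        name, args = spec, ""
    else:
        name, args = spec["name"], spec.get("args", "")
    return "\\begin{" + name + "}" + args + "\n" + body + "\n\\end{" + name + "}"


spec = {"name": "greek", "args": "[variant=ancient]"}
print(wrap_environment("...", spec))
```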
In all these cases you may need to quote your values so that they don't confuse the YAML parser or Pandoc's Markdown parser which both will have a go at the values before the filter sees them. You may even have to wrap values containing LaTeX code both in outer single quotes for YAML and in inner backticks for Pandoc to ensure that they come intact to the filter:
class2style:
  latex:
    foo: '`framebox[1.1\width]`'
In fact you can write e.g. '`\uline{`'. No extra backslash or opening brace will be added if you do, but then the twofold quoting is absolutely necessary.
Note that you will have to declare a separate class for each combination of command or environment and extra arguments. I have experimented with specifying custom arguments as attributes to a span or div in the past, and in general it leads to cluttered Markdown source and complicated filter code with a concomitant risk of errors. Even though the one class--one combination of command and arguments approach might mean more declarations in your metadata, it keeps the body of your document cleaner. If the volume of the metadata declarations bothers you, remember that you can put metadata blocks anywhere, and that they are less in the way at the end of the file.
This filter also works on inline code and code blocks.
As you may have noticed, the value of the `docx` key in our initial example is a list of strings and mappings. This can be done with any output format. String list items will be expanded into a single-element mapping `STRING: STRING`, and then the list of mappings will be flattened into a single mapping, with later elements overriding earlier elements with the same key.
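The expansion and flattening can be pictured with this Python sketch, which works on plain lists and dicts as they would come out of a YAML parser. It is an illustration only, not the filter's own code; `flatten_styles` is a hypothetical name.

```python
def flatten_styles(value):
    """Flatten a list of strings and mappings into one mapping; later items win."""
    if isinstance(value, dict):
        return dict(value)  # already a single mapping
    flat = {}
    for item in value:
        if isinstance(item, str):
            flat[item] = item  # STRING expands to STRING: STRING
        else:
            flat.update(item)  # later keys override earlier ones
    return flat


docx = [{"u": "Underlined", "sc": "SmallCaps"}, "blue"]
print(flatten_styles(docx))
# {'u': 'Underlined', 'sc': 'SmallCaps', 'blue': 'blue'}
```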
Finally, in some cases you can forgo the metadata declaration and instead append a period to the end of a class name. This will result in a command, environment, HTML class or DOCX style whose name is equal to the class name without the trailing period.
[Framed]{.fbox.}
\fbox{Framed}
<p><span class="fbox">Framed</span></p>
[Framed]{custom-style="Fbox"}
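The trailing-period shorthand could be modelled like this in Python. Again this is a sketch of the behavior shown in the examples above rather than the filter's own code, and both function names are hypothetical.

```python
def implicit_style(cls):
    """Return the implied style name for a class ending in a period, else None."""
    if len(cls) > 1 and cls.endswith("."):
        return cls[:-1]
    return None


def docx_style_name(name):
    """Capitalize the first letter, as in the DOCX example above."""
    return name[:1].upper() + name[1:]


name = implicit_style("fbox.")
print(name)                    # fbox
print(docx_style_name(name))   # Fbox
print(implicit_style("fbox"))  # None
```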
In addition to Pandoc this filter requires perl and the following Perl modules (minimum versions as given):
Carp
Pandoc::Elements 0.33
Pandoc::Walker 0.27
autodie 2.29
perl 5.010001
strict
warnings
This filter requires perl (minimum version as given above) and the Perl modules listed above to function. If you haven't used Perl before, information on how to get and install perl and Perl modules can be found at the URLs below, which lead to the official information on these topics.
Don't worry! If your operating system is Linux or Mac you probably already have a new enough version of perl installed. If you don't or if your operating system is Windows it is easy to install a recent version, and once you have perl installed installing modules is very easy. Just follow the instructions linked to below.
Getting perl https://www.perl.org/get.html
(For Windows I recommend Strawberry Perl as module installation is easier there.)
Installing Perl modules http://www.cpan.org/modules/INSTALL.html
Benct Philip Jonsson (https://github.com/bpj)
Copyright 2017- Benct Philip Jonsson
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself. See http://dev.perl.org/licenses/.