Created
March 11, 2017 17:47
Pandoc filter which 'converts' span and div classes with a trailing period into LaTeX commands/environments or DOCX styles
#!/usr/bin/env perl

=encoding UTF-8

=head1 DOCUMENTATION

# DESCRIPTION

Pandoc filter which 'converts' span and div classes with a
trailing period into LaTeX commands (for spans) or environments
(for divs) or to DOCX character and paragraph styles respectively.
When the output format is not "latex" or "docx" it instead
strips the trailing periods from the class names.

Note that you must make sure that any non-standard LaTeX commands or
environments are defined and/or necessary packages loaded in your
source when running latex, and that any custom styles are
defined in your reference docx.

The filter looks for Span and Div elements which have one or
more classes consisting only of letters and ending with a period.
When the output format is "latex" it wraps the elements in raw
LaTeX code so that the command or environment name is the class
name minus the trailing dot. You can specify multiple classes
with a trailing period; the span or div will be wrapped in as
many commands or environments as there are matching classes, with
the first class becoming the outermost command or environment and
the last becoming the innermost, with any other matching classes
coming in between in order.
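
The nesting order described above can be sketched without Pandoc::Elements at all. The following is only an illustration, not the filter's own code: `wrap_in_commands` is a hypothetical helper that works on plain strings, whereas the filter itself wraps AST elements in raw LaTeX nodes. Only the regular expression is taken from the filter.

```perl
use strict;
use warnings;

# Same pattern as the filter: a run of letters followed by a period,
# delimited by whitespace or by the ends of the class string.
my $class_re = qr/(?<!\S)(\pL+)\.(?!\S)/;

# Hypothetical helper (an assumption for illustration only):
# wrap a piece of LaTeX text in one command per matching class.
sub wrap_in_commands {
    my ($class_string, $text) = @_;
    my @commands = $class_string =~ /$class_re/g;
    # Start with the last class so that the first class ends up outermost.
    for my $command (reverse @commands) {
        $text = sprintf '\\%s{%s}', $command, $text;
    }
    return $text;
}

# The class "plain" has no trailing period and is ignored.
print wrap_in_commands('uline. textsf. plain', '{underlined sans}'), "\n";
# prints: \uline{\textsf{{underlined sans}}}
```

The inner pair of braces in the printed result stands in for the braces which pandoc itself puts around a span in LaTeX output.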

When the output format is "docx" the original element gets a
*custom-style* attribute whose value is the concatenated names of
the matching classes, minus the trailing periods and with the first
letter of each class uppercased. This causes pandoc to assign
a character or paragraph style with that name to the
enclosed text or paragraphs. Concatenating the names is a crude
workaround, but it is the best we can do since named styles aren't
additive. Look for "Custom Styles in Docx Output" in the Pandoc
manual for an explanation.
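
As a rough sketch of how the style name is derived, the snippet below applies the same regular expression and the same join/ucfirst step to plain class strings. `style_name` is a hypothetical helper introduced only for this illustration; the filter does the equivalent inline on the element's class attribute.

```perl
use strict;
use warnings;

# Same pattern as the filter: letters only, ending with a period.
my $class_re = qr/(?<!\S)(\pL+)\.(?!\S)/;

# Hypothetical helper (an assumption for illustration only):
# build the custom-style value from a space-separated class string.
sub style_name {
    my ($class_string) = @_;
    my @styles = $class_string =~ /$class_re/g;
    # Uppercase the first letter of each name and concatenate them.
    return join '', map { ucfirst } @styles;
}

print style_name($_), "\n" for 'textsc.', 'uline. textsf.', 'framed. center.';
# prints: Textsc, UlineTextsf and FramedCenter, one per line
```

Because the names are simply concatenated, `uline. textsf.` and `textsf. uline.` give two different styles, each of which has to be defined separately in the reference docx.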

# EXAMPLES

Markdown:

    [inscriptio]{.textsc.}

LaTeX:

    \textsc{{inscriptio}}

HTML:

    <p><span class="textsc">inscriptio</span></p>

----

Markdown:

    ---
    header-includes:
      - \usepackage{framed}
      - \usepackage{color}
      - \usepackage[normalem]{ulem}
    ...

    [underlined sans]{.uline. .textsf.}

    <div class="center.">
    *I'm centered!*
    </div>

    <div class="framed. center.">
    *Centered **and** enclosed!*
    </div>

    \newcommand{\BlueText}[1]{\textcolor{blue}{#1}}

    [I'm blue!]{.BlueText.}

LaTeX:

    \uline{\textsf{{underlined sans}}}

    \begin{center}
    \emph{I'm centered!}
    \end{center}

    \begin{framed}
    \begin{center}
    \emph{Centered \textbf{and} enclosed!}
    \end{center}
    \end{framed}

    \newcommand{\BlueText}[1]{\textcolor{blue}{#1}}

    \BlueText{{I'm blue!}}

I can't give an example of docx output here, but the character styles
Textsc, UlineTextsf and BlueText, and the paragraph styles Center and
FramedCenter will exist in the produced docx file, ready for you to
modify; the look of the corresponding elements will change accordingly.
Once you have done that you can pass the modified file as
`--reference-docx=modified.docx` on subsequent runs (pandoc 2.0 and
later spell this option `--reference-doc`).

=cut
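
use utf8;
use autodie 2.29;
use 5.010001;
use strict;
use warnings;
use warnings qw(FATAL utf8);

use Carp qw[ carp croak ];

use Pandoc::Elements 0.33;
use Pandoc::Walker 0.27 qw[ action transform ];

# Pandoc passes the output format as the first argument to a filter;
# default to the empty string so the comparisons below never see undef.
my $out_format = shift(@ARGV) // '';

# Slurp all of the input in case the JSON spans more than one line.
my $json = do { local $/; <> };
my $doc = pandoc_json($json);

# A class consisting only of letters and ending with a period,
# delimited by whitespace or by the ends of the class string.
my $class_re = qr/(?<!\S)(\pL+)\.(?!\S)/;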

my %actions = 'latex' eq $out_format
  ? (
    # Wrap spans in one LaTeX command per matching class.
    'Span' => sub {    # { for poor editor
        state $end_cmd = RawInline latex => '}';
        my($elem, $action) = @_;
        my @commands = $elem->class =~ /$class_re/g;
        return unless @commands;
        transform( $elem->content, $action, $action );
        my @ret = $elem;
        # Innermost first, so that the first class ends up outermost.
        for my $com ( reverse @commands ) {
            unshift @ret, RawInline latex => "\\$com\{";
            push @ret, $end_cmd;
        }
        return \@ret;
    },
    # Wrap divs in one LaTeX environment per matching class.
    'Div' => sub {
        my($elem, $action) = @_;
        my @envs = $elem->class =~ /$class_re/g;
        return unless @envs;
        transform( $elem->content, $action, $action );
        my @ret = $elem;
        for my $env ( reverse @envs ) {
            unshift @ret, RawBlock latex => "\\begin\{$env\}";
            push @ret, RawBlock latex => "\\end\{$env\}";
        }
        return \@ret;
    },
  )
  : 'docx' eq $out_format
  ? (
    # Set a custom-style attribute built from the matching class names.
    'Span|Div' => sub {
        my($elem, $action) = @_;
        my @styles = $elem->class =~ /$class_re/g;
        return unless @styles;
        transform( $elem->content, $action, $action );
        my $style = join "", map {; ucfirst $_ } @styles;
        $elem->attr( attributes +{'custom-style' => $style} );
        return $elem;
    },
  )
  # some other $out_format: strip the trailing periods from the classes
  : (
    'Span|Div' => sub {
        my($elem, $action) = @_;
        my $classes = $elem->class;
        return unless $classes =~ s/$class_re/$1/g;
        $elem->class($classes);
        transform( $elem->content, $action, $action );
        return $elem;
    },
  );

my $action = action \%actions;

# Allow applying the action recursively
$doc->transform($action, $action);

print $doc->to_json;

__END__