Last active
April 8, 2016 11:12
bpj/f591a9e29fe974fa791f
Pandoc filters (pl and py) to collect all figures and tables at a specified place in a document
#!/usr/bin/env perl

=pod

Pandoc filter which emulates the LaTeX endfloat package by extracting all
elements which would be LaTeX floats (figures and tables) from a
document and putting them in a div with the id "collected-figures" or
"collected-tables" respectively. You must mark the points in the document
where you want the floats to go with a paragraph containing *only* the text
"FiguresHere" or "TablesHere" -- exactly as written here in CamelCase --
or you will lose the floats! If there are several paragraphs with the
same sentinel text, only the first one found will be replaced with a div
containing the figures/tables.

Additionally a paragraph with the text "[Figure %d about here.]" or
"[Table %d about here.]" is inserted into the document where the
figure/table used to be, with "%d" being the number of figures/tables
found so far; thus it is not and cannot be guaranteed to be the same
number as LaTeX would have assigned!

Reference: <https://groups.google.com/d/topic/pandoc-discuss/jLUuYFcRDtk/discussion>

This filter requires a perl interpreter and the
JSON::MaybeXS and Data::Rmap modules to run.
Most operating systems other than Windows come with perl already installed.
If you are on Windows I recommend downloading and installing
Strawberry Perl: <http://strawberryperl.com>.

If/once you have perl installed run the following commands:

    cpan App::cpanminus
    cpanm JSON::MaybeXS Data::Rmap

Then run pandoc with the filter:

    pandoc -F ./pandoc-collect-floats.pl [OPTIONS] INPUTFILE

=cut
use utf8;        # so literals and identifiers can be in UTF-8
use strict;      # quote strings, declare variables
use warnings;    # enable warnings

use JSON::MaybeXS qw[ decode_json encode_json ];
use Data::Rmap qw[ rmap_hash ];

my $format = shift @ARGV;    # output format passed by pandoc (unused)
my $json   = do { local $/; <>; };
my $doc    = decode_json( $json );

my %floats = (
    figures     => [],
    saw_figures => 0,
    tables      => [],
    saw_tables  => 0,
);

rmap_hash {
    return unless exists $_->{t} and exists $_->{c};
    my $elem = $_;
    if ( 'Para' eq $elem->{t} ) {
        return unless 1 == @{ $elem->{c} };
        if ( 'Image' eq $elem->{c}[0]{t} ) {
            # The last item of an Image's content is [url, title].
            return unless $elem->{c}[0]{c}[-1][1] =~ /^fig:/;
            push @{ $floats{figures} }, $elem;
            my $count = @{ $floats{figures} };
            $_ = +{
                t => 'Para',
                c => [ +{ t => 'Str', c => "[Figure $count about here.]" } ],
            };
        }
        elsif ( 'Str' eq $elem->{c}[0]{t} ) {
            return unless $elem->{c}[0]{c} =~ /^(Figures|Tables)Here$/;
            my $key = lc $1;    # 'figures' or 'tables'
            # Only the first sentinel of each kind is replaced.
            return if $floats{"saw_$key"}++;
            $_ = +{ t => 'Div', c => [ [ "collected-$key", [], [] ], $floats{$key} ], };
        }
    }
    elsif ( 'Table' eq $elem->{t} ) {
        push @{ $floats{tables} }, $elem;
        my $count = @{ $floats{tables} };
        $_ = +{
            t => 'Para',
            c => [ +{ t => 'Str', c => "[Table $count about here.]" } ],
        };
    }
    return;
}
$doc;

print encode_json( $doc );
#!/usr/bin/env python
"""
Pandoc filter which emulates the LaTeX endfloat package by extracting all
elements which would be LaTeX floats (figures and tables) from a
document and putting them in a div with the id "collected-figures" or
"collected-tables" respectively. You must mark the points in the document
where you want the floats to go with a paragraph containing *only* the text
"FiguresHere" or "TablesHere" -- exactly as written here in CamelCase --
or you will lose the floats! If there are several paragraphs with the
same sentinel text, only the first one found will be replaced with a div
containing the figures/tables.

Additionally a paragraph with the text "[Figure %d about here.]" or
"[Table %d about here.]" is inserted into the document where the
figure/table used to be, with "%d" being the number of figures/tables
found so far; thus it is not and cannot be guaranteed to be the same
number as LaTeX would have assigned!

Reference: <https://groups.google.com/d/topic/pandoc-discuss/jLUuYFcRDtk/discussion>

This filter requires the pandocfilters module to be installed. You can
clone or download it from GitHub (with instructions for installing and
how to use filters): https://github.com/jgm/pandocfilters or install
from PyPI::

    pip install pandocfilters

If you have an earlier version installed you may need to do::

    pip install -U pandocfilters
"""
from pandocfilters import toJSONFilter, Div, Para, Str, Table

# Floats collected so far, and whether each sentinel has been replaced yet.
floats = {
    'figures': [],
    'saw_figures': False,
    'tables': [],
    'saw_tables': False,
}


def collect_floats(eltype, eldata, fmt, meta):
    if eltype == 'Para':
        if len(eldata) != 1:
            return None
        elem = eldata[0]
        if elem['t'] == 'Image':
            # pandocfilters walks the returned Div as well, so stop
            # collecting once the sentinel has been replaced; otherwise
            # the collected figures would be collected again.
            if floats['saw_figures']:
                return None
            # The last item of an Image's content is [url, title].
            if elem['c'][-1][1].startswith('fig:'):
                floats['figures'].append(Para(eldata))
                filler = "[Figure %d about here.]" % len(floats['figures'])
                return Para([Str(filler)])
        elif elem['t'] == 'Str':
            if elem['c'] == 'FiguresHere':
                key = 'figures'
            elif elem['c'] == 'TablesHere':
                key = 'tables'
            else:
                return None
            # Only the first sentinel of each kind is replaced.
            if floats['saw_' + key]:
                return None
            floats['saw_' + key] = True
            return [Div(['collected-' + key, [], []], floats[key])]
    elif eltype == 'Table':
        # Same guard as for figures: leave tables alone after the sentinel.
        if floats['saw_tables']:
            return None
        floats['tables'].append(Table(*eldata))
        filler = "[Table %d about here.]" % len(floats['tables'])
        return Para([Str(filler)])
    return None


if __name__ == "__main__":
    toJSONFilter(collect_floats)
I upgraded to the newest pandoc 1.16.0.2 and this is not working anymore.
pandoc --verbose --bibliography=ASMOptim.bib --filter pandoc-citeproc --filter pandoc-collect-floats.py ASMOptim.pandoc.tex -o ASMOptim_0.2.0...docx #
Traceback (most recent call last):
  File "/Users/rainerkrug/bin/pandoc-collect-floats.py", line 76, in <module>
    toJSONFilter(collect_floats)
  File "/usr/local/lib/python2.7/site-packages/pandocfilters.py", line 63, in toJSONFilter
    altered = walk(doc, action, format, doc[0]['unMeta'])
  File "/usr/local/lib/python2.7/site-packages/pandocfilters.py", line 31, in walk
    array.append(walk(item, action, format, meta))
  File "/usr/local/lib/python2.7/site-packages/pandocfilters.py", line 22, in walk
    res = action(item['t'], item['c'], format, meta)
  File "/Users/rainerkrug/bin/pandoc-collect-floats.py", line 50, in collect_floats
    if elem['c'][1][1].startswith('fig:'): # title
AttributeError: 'dict' object has no attribute 'startswith'
pandoc: Error running filter pandoc-collect-floats.py
Filter returned error status 1
I have no idea where to look. Could you take a look and possibly make
it compatible with pandoc 1.16?
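The error is consistent with a change in pandoc 1.16, which added an attribute triple as the first item of an Image's content, so index 1 now holds the alt-text inlines rather than the [url, title] pair. A minimal sketch of the difference, using hand-written sample data (the file name, alt text and the `image_title` helper are made up for illustration); indexing from the end works with both layouts:

```python
# Image content before pandoc 1.16: [alt_inlines, [url, title]]
alt = [{'t': 'Str', 'c': 'A'}, {'t': 'Space', 'c': []}, {'t': 'Str', 'c': 'plot'}]
old_image = [alt, ['plot.png', 'fig:']]

# Image content from pandoc 1.16 on: [attr, alt_inlines, [url, title]]
new_image = [['', [], []], alt, ['plot.png', 'fig:']]


def image_title(content):
    """Return the title of an Image, whichever layout is used."""
    # The [url, title] pair is always the last item.
    return content[-1][1]


print(image_title(old_image))   # title read from the old layout
print(image_title(new_image))   # title read from the new layout
print(type(new_image[1][1]))    # the second alt-text inline, not a string
```

With the new layout, `content[1][1]` is an inline element (a dict), which is why calling `startswith` on it fails.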
I have updated the filters to
Unfortunately it is not possible to make the Python version automatically put the figures and tables at the end of the document, since the pandocfilters toJSONFilter() function calls the filter one element at a time and doesn't give you access to the whole document data structure. It would have been possible to make the Perl version behave like that, but I want to keep the versions analogous.
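For illustration only, and not part of either filter: a script that reads the JSON itself, instead of going through toJSONFilter(), does see the whole block list and could move the floats to the end. A minimal sketch, assuming the pandoc 1.x JSON layout of a two-item list [meta, blocks] and looking only at top-level blocks; the function names are hypothetical:

```python
def is_figure(block):
    """True for a Para holding a single Image whose title starts with 'fig:'."""
    if block.get('t') != 'Para' or len(block['c']) != 1:
        return False
    inline = block['c'][0]
    # The last item of an Image's content is the [url, title] pair.
    return inline.get('t') == 'Image' and inline['c'][-1][1].startswith('fig:')


def move_floats_to_end(blocks):
    """Return the top-level blocks with figures, then tables, moved to the end."""
    kept, figures, tables = [], [], []
    for block in blocks:
        if is_figure(block):
            figures.append(block)
        elif block.get('t') == 'Table':
            tables.append(block)
        else:
            kept.append(block)
    return kept + figures + tables
```

A script built on this would load the document with `meta, blocks = json.load(sys.stdin)` and write back `json.dump([meta, move_floats_to_end(blocks)], sys.stdout)`. It leaves no "about here" placeholders and ignores floats nested inside other blocks, which the filters above do handle.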
So now, with these filters, a pandoc markdown document like this:
will become like this: