Last active
November 13, 2016 20:15
-
-
Save bpj/cf324c9c6bf9e59ad0e810a23b2bca8f to your computer and use it in GitHub Desktop.
steal-attrs.pl - Let HTML elements steal their ancestor's attributes
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env perl | |
=encoding UTF-8 | |
=head1 NAME | |
steal-attrs.pl - Let HTML elements steal their ancestor's attributes. | |
=head1 VERSION | |
0.001 | |
=head1 SYNOPSIS | |
perl steal-attrs.pl -r 'SELECTOR=:SELECTOR' ... HTML-FILE ... | |
=cut | |
# More documentation after the code! | |
use utf8; | |
use 5.010001; | |
use strict; | |
use warnings; | |
use warnings qw(FATAL utf8); | |
use open qw(:std :utf8); | |
use Getopt::Long qw[ GetOptionsFromArray :config no_ignore_case ]; | |
my @select; | |
my $opt_cb = sub { | |
push @select, [ $_[0], split /\s*=:\s*/, $_[1] ]; | |
}; | |
GetOptionsFromArray( | |
\@ARGV, | |
'keep|k=s' => $opt_cb, | |
'replace|r=s' => $opt_cb, | |
'tag|t=s' => $opt_cb, | |
); | |
use Mojo::DOM; | |
my $dom = Mojo::DOM->new(join "", <>); | |
for my $item ( @select ) { | |
my($type, $lhs, $rhs) = @$item; | |
my $parents = $dom->find($lhs); | |
PARENT: | |
for my $parent ( @$parents ) { | |
if ( 'tag' eq $type ) { | |
$parent->tag($rhs); | |
next PARENT; | |
} | |
my $children = $parent->find($rhs) || next PARENT; | |
for my $child ( @$children ) { | |
%$child = %$parent; # attributes | |
} | |
if ( 'replace' eq $type ) { | |
$parent->replace( $parent->content ); | |
} | |
} | |
} | |
print $dom; | |
__END__ | |
# # DOCUMENTATION # # | |
=head1 DESCRIPTION | |
C<< steal-attrs.pl >> is a perl script which lets an HTML element steal | |
its ancestor's attributes. I use it to move attributes from wrapping div | |
and span elements to some descendant element in HTML emitted by the | |
markup converter L<< pandoc|http://pandoc.org >> which for all its | |
excellence supports attributes only on a few element types in its | |
L<< Markdown|https://en.wikipedia.org/wiki/Markdown >> dialect. Usually | |
it is not needed -- you can just select the child(ren) of an element | |
with attributes in your | |
L<< CSS|https://en.wikipedia.org/wiki/Cascading_Style_Sheets >>, but | |
sometimes another tool expects to find attributes on some other element, | |
and this is where this script comes in handy. | |
B<< Note >>: this script is I<< not >> a pandoc filter. It filters HTML | |
output by pandoc or created in some other way. | |
Input is taken from STDIN and output is given to STDOUT. All input must | |
be UTF-8 encoded, as is all output. | |
It works by letting you specify two CSS selectors, one which specifies | |
the element(s) which originally have the attributes, and one which | |
specifies the descendant element(s) which the attributes should move to, | |
relative to the first element. See | |
L<< Mojo::DOM::CSS|Mojo::DOM::CSS/"SELECTORS" >> for the supported | |
selector types. Selectors are specified on the command line as arguments | |
to the quasi-options C<< --replace >> (or C<< -r >>) and C<< --keep >> | |
(or C<< -k >>), which mean that the original host of the attributes | |
should either be replaced by its content or kept. The selectors | |
themselves are specified I<< as a single shell string >> with the | |
ancestor selector and the descendant selector separated by the special | |
character pair C<< =: >>, which was choosen because it isn't special to | |
CSS or the shell. Thus to take the attributes of a div with the id | |
C<< foo >>, give the same attributes to any descendant tables and then | |
replace the div with its now altered content you might say: | |
perl steal-attrs.pl -r 'div#foo =: table' mydoc.html | |
When filtering the output from pandoc or some other tool there is of | |
course no need to create an intermediate file: | |
pandoc mydoc.md | perl steal-attrs.pl -r 'div#foo =: table' >mydoc.html | |
If you instead want to replace the I<< tag >> of some elements you can | |
use the pseudo-option C<< --tag >> (or C<< -t >>), in which case the | |
substring after C<< =: >> must not be a CSS selector but the tag which | |
should replace the original one. Thus to convert all span elements with | |
a class C<< small-caps >> into em elements you say: | |
perl steal-attrs.pl -t 'span.small-caps =: em' mydoc.html | |
=head1 PREREQUISITES | |
=over | |
=item * | |
Getopt::Long | |
=item * | |
Mojo::DOM | |
=item * | |
perl 5.10.1 | |
=back | |
=head2 Get perl | |
L<< https:E<0x2f>E<0x2f>www.perl.orgE<0x2f>get.html|https://www.perl.org/get.html >> | |
(For Windows I recommend Strawberry Perl.) | |
=head2 Installing perl modules | |
L<< http:E<0x2f>E<0x2f>www.cpan.orgE<0x2f>modulesE<0x2f>INSTALL.html|http://www.cpan.org/modules/INSTALL.html >> | |
=head1 AUTHOR | |
Benct Philip Jonsson ([email protected], | |
L<< https:E<0x2f>E<0x2f>github.comE<0x2f>bpj|https://github.com/bpj >>) | |
=head1 COPYRIGHT | |
Copyright 2016- Benct Philip Jonsson | |
=head1 LICENSE | |
This is free software; you can redistribute it andE<0x2f>or modify it | |
under the same terms as the Perl 5 programming language system itself. | |
See | |
L<< http:E<0x2f>E<0x2f>dev.perl.orgE<0x2f>licensesE<0x2f>|http://dev.perl.org/licenses/ >>. | |
=cut | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment