Created
March 14, 2009 20:20
-
-
Save masak/79173 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
In October 2006, Damian Conway released an alpha draft of <a | |
href="http://perlcabal.org/syn/S26.html">S26</a>, the Perl 6 synopsis about | |
Pod documentation. It begins "Before Christmas, as promised!". It's large, but | |
even for an alpha, it's a good read. | |
The release of the draft spurred a whole lot of discussion on the | |
perl6-language mailing list. People seem to care a lot about these | |
things. Late November the same year, Damian posted a version of S26 without | |
the 'alpha' in its version number. He also promised to release a Pod-to-XHTML | |
module. (He eventually <a | |
href="http://search.cpan.org/~dconway/Perl6-Perldoc-v0.0.5/">did</a>, in | |
December 2007.) | |
The discussion following the release of these documents was intense and at | |
times harsh. Damian eventually declared "I'm not going to argue about the | |
design of Pod 6 any more." This was in June 2007. What was the debate about? | |
Browsing through the many emails on the matter, I find it was mainly about the | |
exact relation between Perl 6 code and its Pod comments. The vocal opposition, | |
headed (in some sense) by Mark Overmeer, argued for greater connectivity | |
between Perl 6 code and Pod comments. It's the old configurability-vs-defaults | |
discussion again, with Damian representing the configurability camp and Mark | |
the defaults camp. Damian challenged Mark to write his own S26, which Mark | |
did. Damian wrote back that he was "thinking very carefully about what Mark | |
suggested", and that he would "reply properly as soon as I am able." | |
That was in July 2007. Since then Damian has made a few cameos on the | |
perl6-language mailing list, but on matters other than Perl 6 Pod. S26 remains | |
in the state it was released in 2006, but nowadays with the message "The | |
information that is in this file is a draft specification that is known to be | |
out of date and likely to undergo some substantial revision. Until the | |
document is updated, look at STD.pm | |
(http://svn.pugscode.org/pugs/src/perl6/STD.pm) for the valid POD syntax." (<a | |
href="http://perlcabal.org/syn/S26.html">here</a>). Beware of the leopard. | |
Good thing about that reference to <a | |
href="http://svn.pugscode.org/pugs/src/perl6/STD.pm">STD.pm</a> as an | |
up-to-date source of information. Unfortunately, when you read STD.pm, what | |
you find is this, and <em>only</em> this: | |
<ecode> | |
# XXX We need to parse the pod eventually to support $= variables. | |
</ecode> | |
Essentially, S26 and Perl 6 Pod are in limbo since 2007, waiting for the | |
return of Damian (or some other heroic soul) to the scene. | |
Against this backdrop, this happened in January 2008: | |
<ecode> | |
mberends: finally pushed my first draft of perldoc pod2text pod2man Pod::Parser | |
Pod::to::text Pod::to::man to <a href="http://github.com/eric256/perl6-examples/tree/master">[github]</a> | |
mberends: please try to run it, there's more to follow (xhtml and a test suite) | |
masak: mberends: whoa! did you write that? | |
mberends: yes! over the last 3! months | |
</ecode> | |
You have to admire the courage of someone who implements a spec that's about | |
to change, in Perl 6 whose spec is also in flux, on top of Rakudo, which is | |
still being written. | |
I just asked mberends about a status report on his Pod suite. He said this: | |
<ecode> | |
mberends: masak: the low hanging fruit of common Pod is complete, including Pod5 | |
upconverter. Format codes about 50% in text, man, xhtml, pod5 and pod6 | |
emitters. =table and =use not even started. | |
</ecode> | |
What follows is my review of his Pod suite. | |
❦ | |
The Pod suite can be found <a | |
href="http://github.com/eric256/perl6-examples/tree/master/lib/Pod">here</a>. It | |
consists, for my purposes, of four parts: a test suite, the module | |
Pod::Parser, emitters to various serialization formats, and build scripts. I | |
will review them in that order. | |
<b>The tests</b> | |
The tests are somewhat awkwardly situated inside a <code>t</code> directory in | |
<code>lib/Pod</code> in the repository <a | |
href="http://github.com/eric256/perl6-examples/">perl6-examples</a>. This is | |
somewhat of a deviation from common Perl practice, but the reason is most | |
likely that the perl6-example repository hosts many different things, and a | |
<code>t</code> directory outside of <code>lib</code> might simply get lost in | |
the general commotion. | |
The test files make use of <code>Test::Differences</code>, also found in | |
<code>perl6-examples</code>, which provides a method that extends the | |
diagnostics from the standard <code>Test</code> module to include "got" and | |
"expected" output. | |
All the test files begin with a class that mocks the class being tested in | |
that file by deriving it. That particular code is repetitive, with the | |
exception of the base class, and the name of the class itself, which | |
alternately gets the names <code>Test::Parser</code> and | |
<code>TestParser</code>. This would be an excellent place for a role, putting | |
all the mocking code in one module, and just mixing it in with a one-line | |
declaration. | |
We also find the same comment in all these derived classes: | |
<ecode> | |
# Possible Rakudo bug: calling a base class method ignores other | |
# overrides in derived class, such as the above emit() redefine. | |
# workaround: redundantly copy base class method here, fails too! | |
</ecode> | |
This, after trying various one-liners, I find not to be the case. Rakudo works | |
perfectly in this case. The comment is probably stale and should be | |
removed. (This, by the way, is why I've taken to write <code>RAKUDO</code> | |
comments which reference RT ticket numbers whenever possible. This makes it | |
easy to see whether the comment has grown stale, and whether the workaround | |
can be removed.) | |
The initialization of these classes is also a bit mysterious. | |
<ecode> | |
my Test::Parser $p .= new; $p.parse_file('/dev/null'); # warming up | |
</ecode> | |
Why do we parse <code>'/dev/null'</code> here? The comment appears to want to | |
explain, but only serves to heighten the mystery. Whence the need to warm up a | |
mock object? And must we really introduce a Unix shibboleth into an otherwise | |
platform-independent test file to do it? | |
Beyond that, everything else looks fairly good and normal. The tests basically | |
throw different inputs at the modules, and check that they get the right | |
output. Perhaps a module such as <code><a | |
href="http://use.perl.org/~masak/journal/37976">Test::InputOutput</a></code> | |
could alleviate some of the repetition here, but not by much, and only at the | |
cost of another dependency. | |
I have the impression that the test coverage is actually pretty good; the 37 | |
tests do get to exercise much of the code written so far. Keep up the good | |
TDD, mberends++. | |
<b>Pod::Parser</b> | |
We now get to the centerpiece of the module suite, at this writing weighing in | |
at 33121 characters. The general structure is this: it starts with an | |
awe-inspiring grammar <code>Pod6</code>, followed by a few smaller grammars, | |
enums, and classes. After that, the class <code>Pod::Parser</code> takes up | |
the bulk of the file, followed (very appropriately) by excellent Pod | |
documentation. | |
<ecode> | |
mberends: I wish the rest of you would bother to write more comments and POD | |
</ecode> | |
The <code>Pod6</code> grammar is exquisite. Care has been taken to align | |
regexes and their parts vertically so as to bring out structure. There are no | |
empty lines, but there are one-line comments grouping the regexes into logical | |
units. | |
Two comments about the class <code>PodBlock</code>, which seems to effectively | |
be a simple C-like struct (not that there's anything wrong with that): first, | |
since all of the attributes are marked <code>is rw</code>, it would probably | |
make sense to put the <code>is rw</code> declaration on the class itself, as | |
in <a href="http://perlcabal.org/syn/S12.html#line_98">this example</a> from | |
S12. However, Rakudo <a | |
href="http://rt.perl.org/rt3/Ticket/Display.html?id=60636">doesn't yet | |
implement this</a>, which might be considered a reasonable excuse. | |
Second, this: | |
<ecode> | |
my Str $typename = defined $.typename ?? $.typename !! 'undef'; | |
my Str $style = defined $.style ?? $.style !! 'undef'; | |
</ecode> | |
...should be written thus... | |
<ecode> | |
my Str $typename = $.typename // 'undef'; | |
my Str $style = $.style // 'undef'; | |
</ecode> | |
Now for the main class <code>Pod::Parser</code>. We are greeted by a longish | |
list of attributes, alphabetically sorted, each with a half-line comment after | |
it. Such details warms the heart of a reviewer, I can tell you. The one | |
comment I couldn't get my head around is this: | |
<ecode> | |
has $!context is Context; # not (yet) in Rakudo r35960 | |
</ecode> | |
After thinking for a minute, I realize that <code>Context</code> is a | |
previously-declared enum, and that the comment probably refers to enums and | |
attributes not working properly, or some such. (Once again, a reference to an | |
RT ticket would have helped here.) Trying out enums in attributes in bleeding | |
Rakudo (commit 1dea76), I find that it works well, so the comment should go | |
away. Also, there's no need to put the type after the attribute, when all the | |
other types are written before. The line should look like this: | |
<ecode> | |
has Context $!context; | |
</ecode> | |
On the bright side, here's a comment before another attribute: | |
<ecode> | |
# $*OUT broke in r35311, reported in RT#62540 | |
</ecode> | |
Yes! That's what I'm talking about. | |
Here comes my favorite method: | |
<ecode> | |
method parse_line { # from parse_file | |
given $!line { | |
when /<Pod6::directive>/ { self.parse_directive; } # '=xx :cc' / | |
when /<Pod6::extra>/ { self.parse_extra; } # '= :cc' / | |
when /<Pod6::blank>/ { self.parse_blank; } # '' or ' ' / | |
default { if @!podblocks { self.parse_content; } # 'xx' or ' xx' | |
else { self.ambient($!line); } # outside pod | |
} | |
} | |
} | |
</ecode> | |
See what happens here? Yes, all the logic, all the grunt work, is made by the | |
<code>Pod6</code> grammar; the method basically lists the cases, and that's | |
it. It's obvious in retrospect, but I wouldn't really have thought of using | |
given/when in this case. There's even some room left to document what the | |
cases look like. Cute! | |
The method <code>parse_directive</code> seems to be a close relative of | |
<code>parse_line</code>. It would be interesting to investigate whether these | |
methods, and the methods they called, could be fit into an action class, and | |
called automatically through <code>{*}</code> directives in the grammar. Only | |
direct experimentation would answer that question. | |
An nice practice: all the methods called from <code>parse_line</code> and | |
<code>parse_directive</code> have a comment saying so. This gives some piece | |
of context, almost like declaring pre- and postconditions of a method. Since | |
these methods <em>are</em> only called from <code>parse_line</code> and | |
<code>parse_directive</code>, however, I wonder if they could not be made into | |
submethods instead. | |
In a cozy little method called <code>buf_flush</code>, we find this: | |
<ecode> | |
self.emit( ~ ( $!buf_out_line eq "\n" ?? "" !! $!buf_out_line ) ); | |
# why is the ~ necessary? | |
</ecode> | |
Now, the attribute <code>$!buf_out_line</code>, in case you are wondering, is | |
declared as a <code>Str</code>, and is set to the empty string before every | |
parse. So the question about a stringifying <code>~</code> being necessary is | |
reasonable: why isn't the expression from the trinary always a | |
<code>Str</code>? Someone should remove that <code>~</code>, find a case where | |
the trinary doesn't give a <code>Str</code>, put that case in quarantine and | |
have it sent to the good people over at RT for further classification. | |
The Pod documentation at the end of the file, as I wrote earlier, is | |
excellent. It has everything one would expect from the POD of a mature CPAN | |
module. I'm not just saying this because the Pod ends by praising me (and the | |
other November devs); it really does go an extra mile. It even contains a | |
section about how to build your own Pod serializer. It also quotes <a | |
href="http://en.wikipedia.org/wiki/Helmuth_von_Moltke_the_Elder">General | |
Helmuth von Moltke</a>, and has expansive TODO and BUGS sections. The only | |
thing I find missing is that the <code>METHODS</code> section only lists the | |
most important methods, it doesn't describe them. | |
Among the <code>BUGS</code>, I find this gem: | |
<ecode> | |
Long or complex documents randomly suffer segmentation faults. | |
</ecode> | |
As a Rakudo-using Perl 6 developer, I can empathize with that. However, I | |
believe that a few of those might actually have disappeared with the recent | |
fixes from jonathan++. | |
<b>The emitters</b> | |
The emitters, or "Podlators", all subclass <code>Pod::Parser</code>. I get | |
really good vibes from <code>Pod::to::text</code>, <code>Pod::to::pod5</code> | |
and <code>Pod::to::pod6</code>. The <code>pod6</code> emitter, however, | |
contains this: | |
<ecode> | |
$!needspace = $!needspace and (substr($content,0,1) ne " "); | |
# $!needspace &&= (substr($content,0,1) ne " "); # crashes | |
</ecode> | |
Those parentheses are not necessary, and as far as I can tell, this statement | |
no longer crashes. | |
Now, out of the four existing emitters, <code>Pod::to::xhtml</code> is the | |
most complicated one. That's perhaps not too surprising. I have two comments | |
on this code, one minor and one major: first, a few emit cases are commented | |
out with no reason given. The <code>BUGS</code> section seems to give | |
indications as to why, but the connection is not overly clear. This seems to | |
be a case where unit tests for the particular problems encountered would help. | |
Second, and more alarming, is the complete lack of XML escaping in the | |
code. When serializing to <em>any</em> XML, this is important, if one wants to | |
guarantee the validity of the XML. For a quick introduction to the pitfalls of | |
XML generation, see <a href="http://hsivonen.iki.fi/producing-xml/">this | |
article</a>. Many of the tips therein could probably be directly applied to | |
the XHTML emitter, making it more robust in the process. | |
<b>The build scripts</b> | |
mberends wrote his own <code>Configure.pm</code> in Perl 6. There are deep | |
coolness points in that; the rest of us are stuck with lame | |
<code>Makefile.PL</code> files written in Perl 5. The | |
<code>Configure.pm</code> solutions are a bit Unix-specific (for example a | |
<code>qx</code> workaround that presupposes <code>/tmp</code>), but there's | |
hardly any room to do better in Rakudo right now. Windows users haven't shown | |
up in great amounts to work on our Perl 6 projects anyway. (Funny, that. Are | |
people still using Windows these days?) | |
Here's a cutie: | |
<ecode> | |
# The opposite of slurp | |
sub squirt( Str $filename, Str $text ) { | |
my $handle = open( $filename, :w ); # should check for success | |
$handle.print: $text; | |
$handle.close; | |
} | |
</ecode> | |
Let me provide that check for success, free of charge: | |
<ecode> | |
my $handle = open( $filename, :w ) | |
or die $!; | |
</ecode> | |
(A more interesting question might be what the sub should do if the file | |
already exits. At present it simply overwrites files, which might or might not | |
be the desired behaviour.) | |
The <code>Makefile.in</code>, finally, shouldn't be remarkable in any way, but | |
it is. It contains friendly instructions how to get started with downloading, | |
reading the documentation (using, of course, the Pod tools themselves), how to | |
contact the author, and how to contribute. | |
<b>In conclusion</b> | |
All in all, this is a set of modules that mberends has put a lot of effort | |
into, and it shows. When we wrote November, we hoped that our code could serve | |
as inspiration for others -- mberends has in many cases taken our ideas and | |
run with them. His work on this suite of modules puts it on the second place | |
in complexity (after November) among the Perl 6 projects out there. As someone | |
who always delights in new Perl 6 code, I'm very happy about these modules. | |
The future of Perl 6 Pod is uncertain. By that I don't mean that I don't | |
believe it <em>has</em> a future; it's just that it's been left in the limbo | |
described in the beginning of the post. One day, Damian might return and | |
completely Change Everything. We can only hope. If and when that happens, | |
mberends++ has already given us a head start in parsing the Pod of tomorrow, | |
by providing modules that do it today. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment