Skip to content

Instantly share code, notes, and snippets.

@run4flat
Created January 24, 2012 14:21
Show Gist options
  • Save run4flat/1670400 to your computer and use it in GitHub Desktop.
Save run4flat/1670400 to your computer and use it in GitHub Desktop.
Testing POD
use strict;
use warnings;
our @all_listings;
my %actions;
# Import the names of the main listings and run the command or print help
sub PDL_Checker::import {
shift;
@all_listings = @_;
# If no args, print out the usage message:
if (@ARGV == 0 or not exists $actions{$ARGV[0]}) {
print <<USAGE_MESSAGE;
$0, a chapter for the PDL::Book
To... ...type
check the pod perl $0 check
view in perldoc perldoc $0
perl $0 view
generate html perl $0 html
generate pdf perl $0 pdf
show main listings perl $0 list-main
show full listing perl $0 show-listing <name>
run listing perl $0 run-listing <name>
run all main listings perl $0 run-main
USAGE_MESSAGE
}
else {
$actions{$ARGV[0]}->();
}
}
# Check that pod is valid
$actions{check} = sub {
my $status = system('podchecker', $0);
exit unless $status == 0;
};
# View in perldoc
$actions{view} = sub {
system('perldoc', $0);
};
# Generate HTML
$actions{html} = sub {
(my $outfile = $0) =~ s/.pod/.html/;
system('pod2html', "--infile=$0", "--outfile=$outfile");
};
# Generate PDF
$actions{pdf} = sub {
(my $outfile = $0) =~ s/.pod/.pdf/;
system('pod2pdf', $0, qw(--icon-scale 0.25 --icon logo2.png --title)
, 'PDL Book', '--output-file', $outfile);
};
# Print the names of the main listings
$actions{'list-main'} = sub {
print "Main listings:\n";
foreach (@all_listings) {
print "\t$_\n";
}
};
# Print a listing to the screen
$actions{'show-listing'} = sub {
my $filename = $0;
my $name = $ARGV[1];
die "You must give a listing name\n" unless defined $name;
# Gather all the listings:
my ($listings, $depends) = collect_listings($filename);
my $listing = build_listing($depends, $listings, $name);
print $listing;
exit;
};
# Generate the file(s) needed for a listing
$actions{'gen-listing'} = sub {
my $filename = $0;
my $name = $ARGV[1];
die "You must give a listing name\n" unless defined $name;
# Build the listing:
my ($listings, $depends) = collect_listings($filename);
my $listing = build_listing($depends, $listings, $name);
# Save it to a file and run it:
open my $script_fh, '>', "$name.pl";
print $script_fh $listing;
close $script_fh;
};
# Run listings
$actions{'run-listing'} = sub {
$actions{'gen-listing'}->();
my $name = $ARGV[1];
# Run it and exit
system('perl', "$name.pl");
};
# Run all the main listings
$actions{'run-main'} = sub {
foreach (@all_listings) {
$ARGV[1] = $_;
print '=' x 20, $_, '=' x 20, "\n";
$actions{'run-listing'}->();
}
};
sub collect_listings {
my $filename = shift;
my %listings;
my %depends;
# Open the filename and collect all the listings:
open my $fh, '<', $filename;
while(<$fh>) {
if (/^=for listing (\S+)( \[\S+\])?$/) {
my $code_name = $1;
my $code_dep = $2;
# Check that the next line is blank or 'internal'
my $line = <$fh>;
warn "First line after listing $code_name must be blank\n"
unless $line =~ /^$/ or $line =~ /internal/;
# Process dependencies
if (defined $code_dep) {
# Trim spaces and brackets:
$code_dep =~ s/[ \[\]]//g;
# Split and add the list to the hash:
$depends{$code_name} = [split(/,/, $code_dep)];
}
# Accumulate the listing
while($line = <$fh>, $line =~ /^ /) {
# Remove first space
$line =~ s/^ //;
# Pull out ' perldl> '
$line =~ s/^\s+pdl\>//;
# Make sure __END__ et al. are flush against the left edge:
$line =~ s/^\s+(__.*__)/$1/;
$listings{$code_name} .= $line;
}
}
}
return \%listings, \%depends;
}
sub build_listing {
my ($dep_hash, $code_hash, $code_name) = @_;
my $sub_listing = '';
die "Unknown listing $code_name\n" unless exists $code_hash->{$code_name};
if (exists $dep_hash->{$code_name}) {
foreach my $dep_name (@{$dep_hash->{$code_name}}) {
# Include all the dependencies' dependencies:
$sub_listing .= build_listing($dep_hash, $code_hash, $dep_name);
}
}
# Include this dependency
$sub_listing .= $code_hash->{$code_name};
# Clear the dependency so it is not included twice:
$code_hash->{$code_name} = '';
return $sub_listing;
}
1;
__END__
=head1 NAME
PDL_Checker - a module to help automate PDL::Book production
=head1 SYNOPSIS
In your docs:
use PDL_Checker;
=head1 My Docs
...
=cut
Then from the command-line, this will print out instructions:
> perl my-docs.pod
This will check the pod for you:
> perl my-docs.pod check
This will run an example listing:
> perl my-docs.pod run-listing foobar
=head1 USAGE
This module simplifies the checking of your documentation's POD formatting
and, in particular, lets you verify that your example code works as
advertised.
=head2 Checking POD
When you add C<use PDL_Checker> at the top of your POD file, you enable your
documentation to be run directy with Perl to check the pod and generate
HTML and PDF output. If your document is called C<my-doc.pod>, you could
type at the command line
> perl my-doc.pod check
and it will run podchecker for you. You can also run
> perl my-doc.pod view
to view the document using perldoc,
> perl my-doc.pod html
to generate HTML-formatted output for your doc, and
> perl my-doc.pod pdf
to generate PDF-formatted output for your doc, if you have the pod2pdf
formatter.
=head2 Checking Code Listings
PDL_Checker lets you check your example listings with the command
> perl my-doc.pod run-listing <name>
where C<< <name> >> is the name of your listing. Listing are specially
annotated code sections in your pod with syntax that looks like this:
=for listing <name> [<dependencies>]
<blank-line>
<indented code>
<indented code>
Here is a hello world listing:
=for listing hi
print "Hello, world!\n";
If that was in C<my-doc.pod>, you could then run
> perl my-doc.pod run-listing hi
and it should print out the well-worn message.
You can also split your listing into multiple parts and include them as
dependencies:
=for listing hi-and-bye [hi]
print "Well, gotta run!\n";
Running that should look like this:
> perl my-doc.pod run-listing hi-and-bye
Hello, world!
Well, gotta run!
You can include multiple dependncies, and your dependencies will be properly
handled recursively.
There is a special notation for C<< pdl> >> prompts. Any lines in your
listings that start with whitespace followed by the text 'pdl>' will be
stripped out. Although the output may not be one-to-one, it should be close
enough for testing purposes.
One last point. If you want to include a listing that you don't want shown
as pod, you can put C<internal> in the first line after the C<=for> directive.
The listing will still be testable and it will still be something that
others can depend on. As such, it can serve as a good way to tie together a
bunch of otherwise unrelated listings, or as boilerplate.
=cut
=head1 AUTHOR
David Mertens C<[email protected]>

The PDL PreProcessor

The PDL PreProcessor, or PDL::PP, is PDL's secret weapon. With PDL::PP, you get support for bad values. Because of PDL::PP, you can easily generate code that threads over extra dimensions. Slices operate as first-class piddles in your Perl code because they are first-class piddles in PDL::PP code.

Before getting any further, I want to acknowledge that Tuomas J. Lukka, Karl Glaazebrook, and Christian Soeller are (to the best of my knowledge) the original architects of PDL::PP. Contributions have come from others. My own work in the code of PDL::PP has been minimal, mostly working on bug fixes, better error reporting, and general code cleanup. However, I have used PDL::PP extensively and have learned quite about about the nuances of using it. I hope to expand this introduction in the coming months to include many of the insights that I have gained through my use and analysis of PDL::PP. However, at the moment it serves only as an introduction to the topic. After reading this, you should have a firm grasp on the basics of using PDL::PP and the full documentation should be fairly easy to follow.

Note that the vast majority of these examples are tested and should work by simply pasting them directly into a text editor. The only correction you will need to make is to ensure that the __END__ and __Pdlpp__ markers are flush against the left edge, i.e. there are no spaces before the underscores.

Basics

In this section I discuss the basics of writing PP code using pp_def. I will use Inline::Pdlpp for all of my examples, including this first one. If you need help getting Inline::Pdlpp to work, see Appendix A.

First Example

Let's begin with a variation on the canonical Hello World.

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';
my $a = sequence(10);
$a->printout;

__END__

__Pdlpp__

pp_def('printout',
    Pars => 'a()',
    Code => q{
        printf("%f\n", $a());
    },
);

If you run that script, after a short pause you should see output that looks like this:

> perl my_script.pl
0.000000
1.000000
2.000000
3.000000
4.000000
5.000000
6.000000
7.000000
8.000000
9.000000

During that pause, Inlne took the text below the __Pdlpp__ marker and sent it off to Inline::Pdlpp, which generated a source file and a Makefile. Inline took it from there, compiling the function and then loading the newly compiled module into your current Perl interpreter. That module defined the function PDL::printout, which the script ran a couple of lines below the use Inline 'Pdlpp'. The cool part about Inline is that it caches the result of that build process and only rebuilds if you change the part below the __Pdlpp__ marker. You can freely play with the Perl part of the file and it will use the same cached Pdlpp code. Now that you understand what Inline did, let's take a closer look at how I actually defined the printout function.

PDL::PP is a Perl module that you use to generate the XS and Perl code for your PDL functions. This means that everything below the __Pdlpp__ marker is actually a plain Perl script, except that you don't need to use PDL::PP because Inline::Pdlpp took care of that for you.

In order to generate your XS code, you call one of the many functions defined in PDL::PP. All of these are discussed in the PDL::PP documentation, and in this chapter I will focus entirely on PDL::PP's workhorse: pp_def. In the above example, the code of interest is this:

pp_def('printout',
    Pars => 'a()',
    Code => q{
        printf("%f\n", $a());
    },
);

The first argument to pp_def is the name of the function you want to create. After that, you pass a number of key/value pairs to tell PDL::PP precisely what sort of function you are trying to create. The bare minimum for a normal computational function (as opposed to a slice function, for which there is sadly no documentation) is the Pars key and the Code key.

The Pars key specifies the piddle arguments for your function. It accepts a simple Perl string with the argument names and dimensions, delimited by semicolons. In the example I only use a single argument, but you can specify multiple input and output arguments, and you can even restrict (that is, force a coersion in) their data types. Note that the parentheses that follow the a are important and cannot be omitted. They might make the statement look like a function, but we'll see soon why they are important.

The Code key specifies a Perl string with a quasi-C block of code that I am going to call PP code. This Perl string gets thoroughly transformed by PDL::PP and combined with other keys to produce the XS (and eventually C) code for your function. You can think of PP code as being regular C code with a few special macros and notations. The first example already demonstrates one such notation: to access the value in a piddle, you must prefix the name with a dollar-sign and you must postfix it with parentheses. In the next section we'll see just what sort of arguments you can put in those parentheses.

Best Practice: Use q{ } for Code Sections

When creating a string for the Code key (as well as the BadCode, BackCode, and BadBackCode keys), I strongly recommend that you use Perl's q quote operator with curly braces as delimiters, as I have used in the examples so far. Perl offers many ways to quote long blocks of text. Your first impulse may be to simply use normal Perl quotes like so:

Code => ' printf("%f\n", $a()); ',

For longer lines, you would probably pull out the ever-useful heredoc:

Code => <<EOCode,

    printf("%f\n", $a());

EOCode

I have two reasons for recommending Perl's q operator. First, it makes your Code section look like a code block:

Code => q{
    printf("%f\n", $a());
}

Second, PDL::PP's error reporting is not the greatest, and if you miss a curly brace, Perl's interpreter will catch it as a problem. This is not the case with the other delimiters. In this example, I forgot to include a closing brace:

Code => <<'EOCode',
    printf("Starting\n");
    
    for(i = 0; i < $SIZE(n); ++i) {
        printf("%d: %f\n", i, $a(n => i));
    
    printf("All done\n");
EOCode

The C compiler will croak on the above example with an error that is likely to be obscure and only tangentially helpful. However, Perl will catch this typo at compile time if you use q{ }:

Code => q{
    printf("Starting\n");
    
    for(i = 0; i < $SIZE(n); ++i) {
        printf("%d: %f\n", i, $a(n => i));
    
    printf("All done\n");
},

Also note that I do not recommend using the qq quoting operator. Almost all the PDL::PP code strings delimit piddles using dollar-signs (like $a() above) and you must escape each one of these unless you want Perl to interpolate a varible for you. Obviously qq has its uses occationally, but in general I recommend sticking almost exclusively with q.

Let's now expand the example so that the function takes two arguments. Replace the original pp_def with this slightly more interesting code:

pp_def('printout_sum',
    Pars => 'a(); b()',
    Code => q{
        printf("%f + %f = %f\n", $a(), $b(), $a() + $b());
    },
);

Change the line that reads

$a->printout;

to the following two lines:

my $b = $a->random;
$a->printout_sum($b);

and you should get output that looks like this:

> perl two-args.pl
0.000000 + 0.690920 = 0.690920
1.000000 + 0.907612 = 1.907612
2.000000 + 0.479112 = 2.479112
3.000000 + 0.421556 = 3.421556
4.000000 + 0.431388 = 4.431388
5.000000 + 0.022563 = 5.022563
6.000000 + 0.014719 = 6.014719
7.000000 + 0.354457 = 7.354457
8.000000 + 0.705733 = 8.705733
9.000000 + 0.827809 = 9.827809

The differences between this and the previous example are not complicated but deserve some discussion. A cosmetic difference is that I have used a different name for the function, but a more substantial difference is that the function now takes two arguments, a() and b(), as specified by the Pars key. The Code block makes use of these two piddles, printing out the two and their sum. Notice that I access the value in a with the expression $a(), and the value in b with $b(). Also notice that I can use those values in an arithmetic expression.

Returning Values

The examples I have used have all demonstrated their behavior by printing out their results to STDOUT. If you are like me, you will be glad to know that you can use printfs throughout your PP code when it comes time to debug, but these functions would be far more useful if they returned piddles with the calculated results. Fortunately, PDL::PP functions are really just C functions in disguise, and ultimately the data are passed around in C arrays, essentially by reference. This means that you can modify incoming piddles in-place. For example, this function increments a piddle:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';
my $a = sequence(10);
print "a is initially $a\n";
$a->my_inc;
print "a is now $a\n";

__END__
__Pdlpp__
pp_def('my_inc',
    Pars => 'a()',
    Code => q{
        $a()++;
    },
);

When I run that, I get this output:

a is initially [0 1 2 3 4 5 6 7 8 9]
a is now [1 2 3 4 5 6 7 8 9 10]

If you want to modify a piddle in-place, PDL provides multiple mechanisms for handling this, depending on what you are trying to accomplish. In particular, there are ways to handle the inplace flag for a given piddle. But I'm getting a bit ahead of myself. Generally speaking, you shouldn't modify a piddle in-place: you should return a result instead. To do this, you simply mark the argument in the Pars key with the [o] qualifier. Here, I show how to return two arguments:

pp_def('my_sum_and_diff',
    Pars => 'left(); right(); [o] sum(); [o] diff()',
    Code => q{
        $sum() = $left() + $right();
        $diff() = $left() - $right();
    },
);

This function takes $left and $right as input arguments (in that order) and it outputs $sum and $diff (also in that order, as a Perl list). For example, we could run the above pp-code with Perl code like this:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';
my $left = sequence(10);
my $right = $left->random;

my ($sum, $diff) = $left->my_sum_and_diff($right);

print "Left:  $left\n";
print "Right: $right\n";
print "Sum:   $sum\n";
print "Diff:  $diff\n";

The functions defined using pp_def actually allow for you to pass in the output piddles as arguments, but I'll explore that in one of the exercises rather than boring you with more details.

Exercise Set 1

So far I have shown you how to write basic PP code that prints values to the screen or returns values. The great thing about PDL::PP is that this code actually allows for two different calling conventions, and it Does What You Mean when you give it all manner of piddles. Rather than bore you to death with more prose, I am going to give you a couple of exercises. Solutions to these exercises are in Appendix B.

1. Slices

Working with printout_sum, replace $b with a slice from some other piddle. Does it do what you expect?

2. Threading

With printout_sum, what if you replace $b with a two-dimensional piddle that is thread-compatible with $a? Try to guess the order of the output that you'll get before running the example. Did you guess correctly?

3. Orthogonal Piddles

What if $a has dimensions M and $b has dimensions (1, N) with printout_sum? What about my_sum_and_diff?

4. Varying Input Order

The PP code that I present puts all the output piddles at the end of the Pars section. What happens if you move them to the biggining of the section instead of the end?

5. Supplyling Outputs in the Function Call

You can call pp_defined functions by supplying all the arguments to the function. For example, instead of calling my_sum_and_diff like this:

# No output piddles in function call
my ($sum, $diff) = $left->my_sum_and_diff($right);

you can call it like this:

# All in function call, both outputs null
my ($sum, $diff) = (PDL::null, PDL::null);
$sum->my_sum_and_diff($left, $diff, $right);

What is the return value of this sort of invocation? How does the function call change if you alter the Pars order? There's a good reason for this capability, can you guess why PDL lets you do this?

Specifying Dimensions and Using Explicit Looping

Exercises 1.2 and 1.3 demonstrate that PDL::PP automatically loops over the values in a piddle for you. What if you want to do some sort of aggregate behavior, such as computing the sum of all the values in a piddle? This requires more fine-grained control of the code over which PDL::PP loops.

Our discussion begins by looking more closely at the Pars key. When you have a parameter list like 'input(); [o] output()', you are telling PDL::PP that you want it to present the data from the input and output piddles as scalars. The code you specify in the Code key gets wrapped by a couple of C for loops that loop through higher dimensions, something that we call threading. There are many calculations you cannot do with this simplistic representation of the data, such as write a Fourier Transform, matrix-matrix multiplication, or a cumulative sum. For these, you need PDL::PP to represent your data as vectors or matrices.

Note: I am about to cover some material that makes sense once you get it, but which is very easy to mis-interpret. Pay close attention!

To tell PDL::PP that you want it to represent the data as a vector, you specify a dimension name in the Pars key, such as

Pars => 'input(n); [o] sum()'

Notice that I have put something within the parentheses of the input piddle, n. That means that I want PDL::PP to represent the input as a vector with one dimension and I am going to refer to its (single) dimension by the name n. Then, to access the third element of that vector, you would write $input(n => 2). (Element access uses zero-offsets, just like Perl and C array access.) To sum all the values in the vector and store the result in the output variable, you could use a C for-loop like so:

int i;
$sum() = 0;
for (i = 0; i < $SIZE(n); i++) {
    $sum() += $input(n => i);
}

Here, $SIZE(n) is a PDL::PP macro that returns the length of the vector (or more precisely, the size of the dimension that we have called n).

Best practice: optimize for clarity when using $SIZE

When I first encountered the $SIZE PDL::PP macro, I assumed it produced slow code. It turns out that it replaces itself with a direct variable access, which is quite fast. As a general rule regarding $SIZE, optimize for clarity. The only exception is that, as of this writing, you cannot use $SIZE within a direct memory access, as I discuss next.

Wart: no parenthisized expressions within direct memory access

Due to a current limitation in PDL::PP, you cannot use parenthized expressions within a memory access. For example, this will fail to compile and will throw a most obscure error:

$sum() += $input(n => (i-1));

The reason is that the parser isn't a real parser: it's just a series of regexes. It takes everything up until the first closing parenthesis and doesn't realize that you put i-1 in parentheses. This means that these also fail:

$sum() += $input(n => calculate_offset(i));
$sum() += $input(n => $SIZE(n)-1);

You can use expressions that do not involve parentheses, even expressions involving arithmetic, so you can achieve the same ends with these work-arounds:

long calc_off = calculate_offset(i);
$sum() += $input(n => calc_off);

long N = $SIZE(n);
$sum() += $input(n => N-1);

I intend to improve this soon so that at least parenthized expressions will work in memory access statements. However, fixing access statement parsing to allow $SIZE(n) may require a more substanial overhaul of the parser and may not happen any time soon. Sorry.

PDL::PP also provides a convenient short-hand for this sort of loop:

$sum() = 0;
loop (n) %{
    $sum() += $input();
%}

Here, I declare a PDL::PP loop block. Standard blocks in C (and in Perl) are delimited with curly braces, but the loop block is delimited with %{ and %}. You end up with code that is functionally identical to the previous method for writing this sum, but you can use fewer keystrokes to do it.

Putting this all together, here is a complete example that performs a sum over a vector:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';
my $a = sequence(10);
print "a is $a and its sumover is "
    , $a->my_sumover, "\n";

my $b = sequence(3, 5);
print "b is $b and its sumover is "
    , $b->my_sumover, "\n";

__END__

__Pdlpp__

pp_def('my_sumover',
    Pars => 'input(n); [o] sum()',
    Code => q{
        $sum() = 0;
        loop (n) %{
            $sum() += $input();
        %}
    }
);

That gives the following output:

a is [0 1 2 3 4 5 6 7 8 9] and its sumover is 45
b is 
[
 [ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]
]
 and its sumover is [3 12 21 30 39]

As the calculation on $a shows, when you perform the calculation on a one-dimensional piddle, it returns a single result with the sum of all the elements. The calculation on $b treats each row as a vector and performs the calculation on each row.

Let's look at another example, matrix-matrix multiplication. (You remember how to do matrix-matrix multiplication, right? No? Brush-up on Wikipedia.) How would we write such an algorithm using PDL::PP? First, the Pars section needs to indicate what sort of input and output piddles we want to handle. The length of the row of the first matrix has to be equal to the length of the column of the second matrix. The output matrix will have as many rows as the second matrix, and as many columns as the first matrix. Second, we need to loop over the entire output dimensions. Altogether, my first guess at this function looked like this:

pp_def('my_matrix_mult',
    Pars => 'left(n,m); right(m,p); [o] output(n,p)',
    Code => q{
        loop (n) %{
            loop (p) %{
                loop (m) %{
                    $output() = $left() * $right();
                %}
            %}
        %}
    },
);

"Wait," you say, "That's it? It's that simple?" Yep. Once you figure out the relationship of the dimension sizes, the threading engine just Does What You Mean. (As you'll see, I got the dimensions wrong, but it'll be a quick fix.) You can run that with this Perl code:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';
my $left = sequence(2,4);
my $right = sequence(4, 5);
print "$left times $right is ", $left->my_matrix_mult($right);

and that gives this output:

[
 [0 1]
 [2 3]
 [4 5]
 [6 7]
]
 times 
[
 [ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
]
 is 
[
 [ 18  21]
 [ 42  49]
 [ 66  77]
 [ 90 105]
 [114 133]
]

Oops! You can see that PDL considers the first argument to the number of columns, not the number of rows! I'll let you fix that in an exercise.

PDL::PP also has the threadloop construct, which lets you declare the code over which PDL should thread, and the code that should come before and after the thread loop. Here's a simple example demonstrating the threadloop construct in conjunction with the loop construct:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';

# Run the code on a 2x4 matrix:
sequence(2,4)->my_print_rows;

# Run the code on a 3x4x5 matrix:
sequence(3,4,5)->my_print_rows;

__END__

__Pdlpp__

pp_def('my_print_rows',
    Pars => 'in(n)',
    Code => q{
        printf("About to start printing rows\n");
        int row_counter = 0;
        threadloop %{
            printf("  Row %3d: ", row_counter);
            loop(n) %{
                printf("%f, ", $in());
            %}
            printf("\n");
            row_counter++;
        %}
        printf("All done!\n");
    },
);

A snippet of that output looks like this:

About to start printing rows
  Row   0: 0.000000, 1.000000, 
  Row   1: 2.000000, 3.000000, 
  Row   2: 4.000000, 5.000000, 
  Row   3: 6.000000, 7.000000, 
All done!
About to start printing rows
  Row   0: 0.000000, 1.000000, 2.000000, 
  Row   1: 3.000000, 4.000000, 5.000000, 
  ...
  Row  19: 57.000000, 58.000000, 59.000000, 
All done!

This is particularly useful if you are writing a function that needs access to a system resource that is costly to allocate with each iteration. For that sort of operation, you allocate it before entering the threadloop and de-allocate it after leaving:

Code => q{
    /* allocate system resource */
    threadloop %{
        /* use system resource */
    %}
    /* Free system resource */
},

To put this all together, I am going to consider writing a PDL::PP function that computes the first numerical derivative of a time series. You can read about finite difference formulas here: http://en.wikipedia.org/wiki/Numerical_differentiation. Normally, finite difference formulas result in a numerical derivative with one less point than the original time series. Since I have not discussed how to set a return dimension with a calculated size, I'm going to use a slightly modified numerical derivative. The derivatives associated with the first and last points will be calculated using the right and left finite differences, respectively, whereas the points in the middle will be calculated using a centered-difference formula. I'll run this function on the sine wave and compare the results with the actual derivative of the sine wave, which is the cosine wave. I've marked a couple of points in the code for the discussion that follows.

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';

# Create some sine data:
my $h = 0.3;
my $sine = sin(sequence(10) * $h);
my $derivative = $sine->my_first_derivative($h);
my $cosine = cos(sequence(10) * $h);

print "The difference between the computed and actual derivative:\n"
   , $derivative - $cosine, "\n";

__END__

__Pdlpp__

pp_def('my_first_derivative',
    Pars => 't_series(n); step(); [o] derivative(n)',
    Code => q{
        int N = $SIZE(n);
        threadloop %{
            /* Derivative for i = 0 */
            $derivative(n => 0)
                = ($t_series(n => 1) - $t_series(n => 0))
                    / $step();
            /* Derivatives for 1 <= i <= N-2 */
            /* (Point 1) */
            loop (n) %{
                /* Skip the first and last elements (Point 2) */
                if (n == 0 || n == N - 1) {
                    /* (Point 3) */
                    continue;
                    
                }
                /* (Points 4 and 5) */
                $derivative()
                    = ($t_series(n => n+1) - $t_series(n => n-1))
                        / 2.0 / $step();
            %}
            /* Derivative for i = N-1 */
            $derivative(n => N-1)
                = ($t_series(n => N-1) - $t_series(n => N-2))
                    / $step();
        %}
    },
);

The output on my machine looks like this:

The difference between the computed and actual derivative: [-0.014932644 -0.0142657 -0.012324443 -0.0092822807 -0.0054109595 -0.0010562935 0.0033927281 0.0075386874 0.011011238 0.077127808]

These differences are fairly small, four times smaller than the (fairly large) step size. And if I decrease the size of $h by 2, these errors should get smaller by a factor of 4 except at the endpoints. Not bad.

But what we really care about is the code, which uses a number of tricks I haven't discussed yet. Let's run through each point in turn.

point 1, a sub-optimal example

The code within this loop does not actually compute results for all indices from zero to N-1. As such, I should use a for loop that starts from 1 and runs to N-2. I dislike it when bad examples are used for pedagogical reasons, but that's what I'm going to do here. Sorry.

point 2, a useful register

The actual C code that gets generated by the loop construct creates a register variable called n within the scope of the loop block. Thus, we can access the current value of n from within the loop by simply using that value in our code. I do that in this if statement and in the memory accesses later.

point 3, C looping commands

The loop construct creates a bona-fide C for loop, so you can use break and continue, just like in a real C for loop.

point 4, explicit dimension values within a loop block

When we loop over n, it saves you keystrokes in your memory access by making it unnecessary to specify n. This is exploited when I say $derivative() without specifying a value for n. However, we can override that value for n within the loop by explicitly specifying it, which is what I do with $t_series(n = n-2)>.

point 5: which n?

Look closely at the access statements for $t_series:

$t_series(n => n-1)

PDL::PP parses this as

$ <pars-variable-name> ( <dimension-name> => <value>,
                         <dimension-name> => <value>,
                         ...
                       )
and replaces it with a direct array access statement. In this statement,
the C<n> on the left side of the fat comma (the C<< => >>) is the name of
the dimension. The C<n> on the right side of the fat comma is part of a C
expression and is not touched by PDL::PP. That means that the C<n> on the
right side refers to the C variable C<n>. This makes two uses of the same
token, C<n>, which can be a bit confusing. I'm not suggesting that this is
a best practice, but it is a possible practice which may be useful to you.
So now you know.

In the above section I have explained how to use loop and threadloop to control how PDL::PP presents data to your code, and to control which sections of code PDL::PP threads over. I have also shown you how to access specific memory locations when you have vector representations of your data.

Exercise Set 2

1. Matrix Multiplication, Fixed

I noted above that my code for the matrix multiplication is incorrect and I explained why. Changing nothing more than the Pars section, fix this code so that it performs proper matrix multiplication.

2. Threading Engine Tricks

The function my_sumover uses a loop construct, so it only operates on individual rows. What if you wanted to perform the sum an entire matrix? Using Perl level operations, find a way to manipulate the incoming piddle so that you can call my_sumover to get the sum over the entire matrix. Bonux points if the same technique works for higher dimensional piddles.

3. Cumulative Sum

Modify my_sumover to create a function, my_cumulative_sum, which returns the cumulative sum for each row. By this I mean that it would take the input such as (1, 2, 3, 4) and return (1, 3, 6, 10), so that each element of the output corresponds to the sum of all the row's elements up to that point.

4. Full Cumulative Sum

Take your code for my_cumulative_sum and modify it so that it returns the cumulative sum over the entire piddle, regardless of the piddle's dimension. Your resulting code should not have any loop constructs.

Tips

These are a couple of things I have learned which help me make effective use of PDL::PP, but which did not sensibly fit elsewhere.

Best Practice: use pp_line_numbers

PDL::PP includes a brand new function in PDL 2.4.10 called pp_line_numbers. This function takes two arguments: a number and a string. The number should indicate the actual line in your Perl source file at which the string starts, and the function causes #line directives to be inserted into the string. This is ENORMOUSLY helpful when you have a syntax error. Without it, the syntax error is reported as coming from a given line in your XS file, but with it the error is reported as coming from your own source file.

I will illustrate this with an example that gave me great trouble while I was preparing this text:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';

# Run the code on a 2x4 matrix:
sequence(2,4)->my_print_rows;

__END__

__Pdlpp__

pp_def('my_print_rows',
    Pars => 'in(n)',
    Code => q{
        printf("About to start printing rows\n");
        int row_counter = 0;
        threadloop %{
            printf("  Row %3d: ", row_counter);
            loop(n) %{
                printf("%f, ", $in())
            %}
            printf("\n");
            row_counter++;
        %}
        printf("All done!\n");
    },
);

Notice what's missing? The semicolon at the end of the printf is missing. Unfortunately, the error output of this example (contained in _Inline/build/bad_error_reporting_pl_8328/out.make) borders on useless:

bad_error_reporting_pl_4420.xs: In function ‘pdl_my_print_rows_readdata’:
bad_error_reporting_pl_4420.xs:177: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
bad_error_reporting_pl_4420.xs:177: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
bad_error_reporting_pl_4420.xs:178: error: expected ‘;’ before ‘}’ token
bad_error_reporting_pl_4420.xs:222: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
bad_error_reporting_pl_4420.xs:222: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
bad_error_reporting_pl_4420.xs:223: error: expected ‘;’ before ‘}’ token
bad_error_reporting_pl_4420.xs:267: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
bad_error_reporting_pl_4420.xs:267: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
bad_error_reporting_pl_4420.xs:268: error: expected ‘;’ before ‘}’ token
bad_error_reporting_pl_4420.xs:312: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘PDL_Long’
bad_error_reporting_pl_4420.xs:312: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘PDL_Long’
bad_error_reporting_pl_4420.xs:313: error: expected ‘;’ before ‘}’ token
bad_error_reporting_pl_4420.xs:357: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘PDL_LongLong’
bad_error_reporting_pl_4420.xs:357: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘PDL_LongLong’
bad_error_reporting_pl_4420.xs:358: error: expected ‘;’ before ‘}’ token
bad_error_reporting_pl_4420.xs:403: error: expected ‘;’ before ‘}’ token
bad_error_reporting_pl_4420.xs:448: error: expected ‘;’ before ‘}’ token

If you're a seasoned C programmer, you'll recognize the warning: it arises because PDL::PP creates a branches of code for each data type that PDL supports, so using the %f type is not correct. (The correct way to handle this is to use the $T macro.) That's not our problem, though. The issue is the expected semicolon error. For a small function, you can probably just scan through the code and look for a missing semicolon, but when you are working on a much larger set of PP code, having the line number of the error would be much more useful. You accomplish that by using the pp_line_numbers function, which adds #line directives into your code so that errors get reported on the correct lines. Here is a slightly doctored version to illustrate the issue:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';

# Run the code on a 2x4 matrix:
sequence(2,4)->my_print_rows;

__END__

__Pdlpp__
#line 1 "my-inline-work"
                  # This is reported as line 1
pp_def('my_print_rows',
    Pars => 'in(n)',
    Code => pp_line_numbers(__LINE__, q{
        /* This line is reported as line 5
         * Thanks to pp_line_numbers */
        printf("About to start printing rows\n");
        int row_counter = 0;
        threadloop %{
            printf("  Row %3d: ", row_counter);
            loop(n) %{
                printf("%f, ", $in())
            %}
            printf("\n");
            row_counter++;
        %}
        printf("All done!\n");
        /* This is line 18 */
    }),
);     # This is reported as line 20

Apart from a couple of comments to indicate the line counting, I introduced two modifications: I added a #line directive at the top of the Pdlpp section and I wrapped the Code section in a call to pp_line_numbers. (The #line directive is only necessary when using Inline::Pdlpp, and is not necessary in a .pd file.) Now the error output gives the line of the closing bracket that reports the missing semicolon:

my-inline-work: In function ‘pdl_my_print_rows_readdata’:
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
my-inline-work:13: error: expected ‘;’ before ‘}’ token
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
my-inline-work:13: error: expected ‘;’ before ‘}’ token
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘int’
my-inline-work:13: error: expected ‘;’ before ‘}’ token
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘PDL_Long’
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘PDL_Long’
my-inline-work:13: error: expected ‘;’ before ‘}’ token
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘PDL_LongLong’
my-inline-work:12: warning: format ‘%f’ expects type ‘double’, but argument 2 has type ‘PDL_LongLong’
my-inline-work:13: error: expected ‘;’ before ‘}’ token
my-inline-work:13: error: expected ‘;’ before ‘}’ token
my-inline-work:13: error: expected ‘;’ before ‘}’ token

All the errors are reported as occurring on line 13, immediately directing your eye to where the problem lies. This lets you fix your problem and get on with your work.

Sometimes PDL::PP's parser croaks on invalid input. Sometimes it doesn't. For those times when you when you feed PDL::PP bad code and the error reporting leaves you scratching your head, consider wrapping your code in a pp_line_numbers call.

Wart: /* */ doesn't always work; use #if 0

For better or for worse, some of PDL::PP's replacements include comments indicating what they do. This is useful when you find yourself digging into the generated XS code as it helps you get your bearings. If you are like me, when there is a bug in your code, it is often helpful to comment-out a section of code to see if it caused the problem:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';

# Run the code on a 2x4 matrix:
sequence(2,4)->my_printout;

__END__

__Pdlpp__
#line 1 "my-printout-pdlpp"
pp_def('my_printout',
    Pars => 'in()',
    Code => pp_line_numbers(__LINE__, q{
        printf("This piddle contains:\n");
        threadloop %{
            /* grr, not working
            printf("  %f\r", $in());
            */
            printf("  Here\n");
        %}
    }),
);

The problem, by the way, is that I used a \r instead of a \n in the printf statement, and if I try printing a sufficiently small piddle on a sufficiently fast machine, I will not see any values printed to the screen before they are covered up by whatever text comes next in my code. So I try blocking out the printf statement and replacing it with a dummy statement that I believe will work withou a hitch. Unfortunately, this gives me these errors:

my-printout-pdlpp: In function ‘pdl_my_printout_readdata’:
my-printout-pdlpp:7: error: expected statement before ‘)’ token
my-printout-pdlpp:8: error: expected expression before ‘/’ token
my-printout-pdlpp:7: error: expected statement before ‘)’ token
my-printout-pdlpp:8: error: expected expression before ‘/’ token
my-printout-pdlpp:7: error: expected statement before ‘)’ token
my-printout-pdlpp:8: error: expected expression before ‘/’ token
my-printout-pdlpp:7: error: expected statement before ‘)’ token
my-printout-pdlpp:8: error: expected expression before ‘/’ token
my-printout-pdlpp:7: error: expected statement before ‘)’ token
my-printout-pdlpp:8: error: expected expression before ‘/’ token
my-printout-pdlpp:7: error: expected statement before ‘)’ token
my-printout-pdlpp:8: error: expected expression before ‘/’ token
my-printout-pdlpp:7: error: expected statement before ‘)’ token
my-printout-pdlpp:8: error: expected expression before ‘/’ token

Lines seven and eight are these:

printf("  %f\r", $in());
*/

Perplexed? You bet. I just commented out some code, how could I possibly have introduced a compile error? Using pp_line_numbers, I understand where the error came from, but I'm even more confused as to why it arose in the first place.

The fix to this problem is to use #if 0, a common technique among C programmers for cutting out blocks of code:

use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';

# Run the code on a 2x4 matrix:
sequence(2,4)->my_printout;

__END__

__Pdlpp__
#line 1 "my-printout-pdlpp"
pp_def('my_printout',
    Pars => 'in()',
    Code => pp_line_numbers(__LINE__, q{
        printf("This piddle contains:\n");
        threadloop %{
          #if 0
            printf("  %f\r", $in());
          #endif
            printf("  Here\n");
        %}
    }),
);

PDL::PP will still merrily fiddle with the stuff between the #if 0 and #endif, but the C preprocessor will get rid of it before it actually tries to compile the code. Now the code at least runs and printouts the exptected dumb results:

This piddle contains:
  Here
  Here
  Here
  Here
  Here
  Here
  Here
  Here

Hopefully this gives me enough to find that errant \r.

Recap

In this chapter, I've covered the very basics of using PDL::PP to write fast, versatile code. I have covered much less material than I had hoped, and I hope to expand this chapter in the coming months. Nonetheless, I hope and believe it will serve as a good starting point for learning PDL::PP, and I expect it will give you enough to dig into the PDL::PP documentation.

Good luck, and happy piddling!

Appendix A: Installing Inline::Pdlpp

The PDL installation always installs Inline::Pdlpp, but that does not mean it works for you because Inline is not actually a prerequisite for PDL. The good news is that once you have installed Inline, Inline::Pdlpp will work automatically.

To begin, you will need to have access to the C compiler that compiled your copy of Perl. On Mac and Linux, this amounts to ensuring that the developer tools that contain gcc are installed on your system. On Windows, this will depend on your flavor of Perl. I personally have excellent experience working with Strawberry Perl, which ships with a working C compiler, but you can also work with Visual C or Cygwin. If you run into trouble, contact the PDL mailing list for help.

If you are on Linux, you can probably install Inline using your package manager. If you are not on Linux or you do not have administrative privileges, you will have to install Inline using CPAN. To do this, enter the following commands at your console:

> cpan Inline

This will likely ask you a few questions during the installation, so do not walk away to get a cup of coffee and expect it to be done.

Once that's installed, you should be ready to work with the examples.

Appendix B: Solutions to Exercises

Excercise Set 1

1. Slices
use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';
use PDL::NiceSlice;

# Create $a
my $a = sequence(5);
print "a is $a\n";

# Create $b as a five-element slice from a sequence:
my $idx = pdl(1, 2, 7, 4, 8);
my $b = sequence(20)->index($idx);
print "b is $b\n";

print "printout_sum(a, b) says:\n";
$a->printout_sum($b);

no PDL::NiceSlice;

__END__

__Pdlpp__
pp_def('printout_sum',
    Pars => 'a(); b()',
    Code => q{
        printf("%f + %f = %f\n", $a(), $b(), $a() + $b());
    },
);
2. Threading
use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';

my $a = sequence(5);
print "a is $a\n";
my $b = sequence(5,3);
print "b is $b\n";

print "a + b = ", $a + $b, "\n";

print "printout_sum(a, b) says:\n";
$a->printout_sum($b);

__END__

__Pdlpp__
pp_def('printout_sum',
    Pars => 'a(); b()',
    Code => q{
        printf("%f + %f = %f\n", $a(), $b(), $a() + $b());
    },
);
3. Orthogonal Piddles
use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';

my $a = sequence(5);
print "a is $a\n";
my $b = sequence(1,3);
print "b is $b\n";

print "a + b = ", $a + $b, "\n";

print "printout_sum(a, b) says:\n";
$a->printout_sum($b);

__END__

__Pdlpp__
pp_def('printout_sum',
    Pars => 'a(); b()',
    Code => q{
        printf("%f + %f = %f\n", $a(), $b(), $a() + $b());
    },
);
4. Varying Input Order

Different input order would be like this:

Pars => '[o] sum(); left(); [o] diff(); right()';
Pars => '[o] sum(); [o] diff(); left(); right()';

The only consistency here is that sum always comes before diff, and left always comes before right.

5. Supplyling Outputs in the Function Call

For a Pars key like this:

Pars => 'left(); right(); [o] sum(); [o] diff()';

You can call the function like this:

my ($sum, $diff) = $left->my_sum_and_diff($right);

my ($sum, $diff);
$left->my_sum_and_diff($right
    , ($sum = PDL::null), ($diff = PDL::null));

my $sum = $left->zeroes;
my $diff = PDL::null;
$left->my_sum_and_diff($right, $sum, $diff);

For the latter calling convention, the function returns nothing (rather than $sum and $diff). When you supply a null piddle (as in the middle example) or you call the function with the input piddles only (as in the first example), PDL will allocate memory for you. As demonstrated with the last example, you can supply a pre-allocated piddle, in which case PDL will not allocate memory for you. This can be a performance issue when you regularly call functions

Exercise Set 2

1. Matrix Multiplication, Fixed

The corrected Pars section should look like this:

Pars => 'left(m,n); right(p,m); [o] output(n,p)',
2. Threading Engine Tricks

The key is to use clump(-1):

my $matrix = sequence(2,4);
my $result = $matrix->clump(-1)->my_sumover;
3. Cumulative Sum
use strict;
use warnings;
use PDL;
use Inline 'Pdlpp';
my $a = sequence(10);
print "Cumulative sum for a:\n";
print $a->my_cumulative_sum;
my $b = grandom(10,3);
print "\nCumulative sum for b:\n";
print $b->my_cumulative_sum;

__END__

__Pdlpp__

pp_def('my_cumulative_sum',
    Pars => 'input(n); [o] output(n)',
    Code => q{
        double cumulative_sum;
        threadloop %{
            cumulative_sum = 0.0;
            loop (n) %{
                cumulative_sum += $input();
                $output() = cumulative_sum;
            %}
        %}
    }
);
4. Full Cumulative Sum
pp_def('my_full_cumulative_sum',
    Pars => 'input(); [o] output()',
    Code => q{
        double cumulative_sum = 0.0;
        threadloop %{
            cumulative_sum += $input();
            $output() = cumulative_sum;
        %}
    }
);

AUTHOR, COPYRIGHT

David Mertens <[email protected]>

LICENSE AND COPYRIGHT

Copyright (c) 2011 David Mertens. All rights reserved.

This is free documentation; you can redistribute it and/or modify it under the same terms as Perl itself.

POD ERRORS

Hey! The above document had some coding errors, which are explained below:

Around line 947:

Non-ASCII character seen before =encoding in '‘pdl_my_print_rows_readdata’:'. Assuming UTF-8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment