How to Bisect a Problem in the Perl 5 Debugger

Bisection and the Perl 5 Core Distribution

Bisection is a developer's technique used to locate the point in a series of commits to a repository where the behavior of an application changed. While this change could be a change for the better -- "When did this feature start accidentally working?" -- the change being sought is usually a change for the worse: "When did this feature stop working correctly?".

Source code stored in a git repository can generally be bisected with some variant of the git bisect command. In the Perl 5 repository we have a program, Porting/bisect.pl, which is essentially a giant wrapper around git bisect intended to handle bisection situations specific to that codebase.

For reasons I do not understand, the documentation for Porting/bisect.pl is actually found in a different file: Porting/bisect-runner.pl. The synopsis for that documentation (as of perl-5.27.3) contains a number of typical use cases:

$> perldoc Porting/bisect-runner.pl

...

SYNOPSIS
    # When did this become an error?
    .../Porting/bisect.pl -e 'my $a := 2;'
    # When did this stop being an error?
    .../Porting/bisect.pl --expect-fail -e '1 // 2'
    # When did this stop matching?
    .../Porting/bisect.pl --match '\b(?:PL_)hash_seed_set\b'
    # When did this start matching?
    .../Porting/bisect.pl --expect-fail --match '\buseithreads\b'
    # When did this test program stop working?
    .../Porting/bisect.pl -- ./perl -Ilib ../test_prog.pl
    # When did this first become valid syntax?
    .../Porting/bisect.pl --target=miniperl --end=v5.10.0 \
         --expect-fail -e 'my $a := 2;'
    # What was the last revision to build with these options?
    .../Porting/bisect.pl --test-build -Dd_dosuid

Bisection is not something anyone does on an every day basis. Consequently, every time I have to bisect some code I have to stop to think, "Which of these cases covers my current condition?" I eventually figured out that it would be good to write down the answers to that question! So what follows is one instance of problem solving via bisection in the Perl 5 core distribution. YMMV!

The Problem at Hand: Regression in the Perl 5 Debugger

Is there anyone who loves the Perl 5 debugger? Let's have a show of hands.

Ah, I didn't think so. It looks ugly on the screen. It's less user-friendly as a learning tool than REPLs in, say, Python or Scala. It's something which its own documentation warns you not to use unless you have already run your code with warnings enabled. Its learning curve is much more difficult than simply scattering print statements through your code. And since you only use when you've got a sticky problem, you don't use it often enough to master all its functionality.

As a consequence, the debugger's own code doesn't get much love, either. The debugger is nothing but a Perl program found within the Perl 5 core distribution -- lib/perl5db.pl, to be precise. So bugs in the debugger are reported to the main Perl 5 bug tracker, RT, where they tend to linger un-worked-on for years.

Such was the case with RT #120174, entitled "Debugger command regression: Now requires more spaces". This ticket was created by a long-time British (I think) Perl hacker named Smylers on the anniversary of Thelonious Monk's birth in October 2013. Smyler reported that for a long time he had been accustomed to using the p and x debugger commands -- used to print and execute statements, respectively -- without a subsequent wordspace, like this:

p@ARGV

x@ARGV

x\@ARGV

x\%INC

This syntax is one character shorter than the documented syntax for these commands:

p @ARGV

x @ARGV

x \@ARGV

x \%INC

Smylers noted that the wordspace-less syntax worked in perl-5.14.3 but was clearly not working in perl-5.19.5. By "not working", Smyler meant that when he tried the wordspace-less syntax, he got exceptions like this:

   DB<72> x\@ARGV
 Backslash found where operator expected at (eval 8)[lib/perl5db.pl:732] line 2, near "x\"
 at (eval 8)[lib/perl5db.pl:732] line 2.
   eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop;
 x\\@ARGV;
 ' called at lib/perl5db.pl line 732
   DB::eval called at lib/perl5db.pl line 3091
   DB::DB called at -e line 1
syntax error at (eval 8)[lib/perl5db.pl:732] line 2, near "x\"

Smylers then added, "I haven't tried to narrow it down any further, mainly cos I don't know how to come up with a non-interactive way of demonstrating the change."

Neither, apparently, did anyone else. Nor was anyone else moved to reply to this bug report for nearly four years. However, having too much time on my hands this summer, I began to once again work my way through the RT "New" queue, that is, RT tickets which had not received a single reply. On September 3, 2017, I confirmed Smyler's original report. It then took me several stages to, in Smyler's words, "... come up with a non-interactive way of demonstrating the change."

Stage 1: Perlbrew

In hacking on the Perl 5 core distribution the command-line tool perlbrew is your friend. You can get it from CPAN at App-perlbrew. It was created by a Perl hacker named Kang-min Liu, whose nationality or terrestrial location, to be honest, I do not know. With perlbrew you can have multiple versions of perl installed beneath your home directory. So if you want to find out how something worked in, say, perl-5.14 back in 2011-12, you simply say:

perlbrew use perl-5.14.4

And there it is!

$ perl -v | head -2 | tail -1
This is perl 5, version 14, subversion 4 (v5.14.4) built for x86_64-linux

So the first thing I did in studying Smyler's problem was to use perlbrew to quickly move through Perl versions, firing up the debugger in each, until I found:

The most recent production version in which the wordspace-less syntax worked; and
The oldest production version in which the wordspace-less syntax no longer worked.

I was quickly able to determine that the wordspace-less syntax was working in perl-5.16.3 but was not working in perl-5.18.4. This meant that the change of behavior probably took place between 2012 and 2013, somewhere between this commit:

commit 68cd360812f9eaa2d34c45c501e2fef87c44ccde
CommitDate: Tue May 1 19:08:57 2012 -0400

... and this commit:

commit 1799399c45baa7e5af45db33567d2a2de629d5e0
CommitDate: Fri Jun 7 11:51:45 2013 -0400

In this period of slightly over a year, there were more than 200 commits to lib/perl5db.pl -- but thousands of commits to the entire core distribution. This was clearly going to need bisection.

Stage 2: Which Syntax Should I Use for `Porting/bisect.pl`?

But which of the various invocations for Porting/bisect.pl shown above was I to use? Well, if any other Perl program terminated with syntax errors like Backslash found where operator expected, I would know that the program, in Unix terms, exited with a non-zero error number. Any program that ended with EXIT_FAILURE would be suitable for this invocation:

    .../Porting/bisect.pl -- ./perl -Ilib ../test_prog.pl

... as this version is designed to report the first commit at which ../test_prog.pl failed, i.e., exited with a non-zero error number.

The Perl 5 debugger, however, is not just any other Perl program. When you are debugging code and encounter an exception, the debugger traps that exception but it itself does not die. It waits for you to tell it what to do next. If, for example, you tell the debugger to quit (q), the debugger program will exit successfully, i.e., it will exit 0.

Hence, we won't go anywhere with an invocation in which the debugger is itself the final argument:

    .../Porting/bisect.pl -- ./perl -Ilib lib/perl5db.pl  # <-- WON'T WORK

So what we have to do is to write a program which:

simulates the action of the debugger's p and x commands;
throws exceptions when the simulated p and x commands are invoked without a subsequent wordspace; and
exits with a non-zero error number that can be detected by Porting/bisect.pl and cue the termination of the bisection process.

Stage 3: Write a Program that Fails

What we do a program that exercises parts of an application's functionality?

We call it a test.

So, to exercise the debugger's p or x commands we can seek guidance by looking at the way the Perl 5 debugger is itself tested within the core distribution. The debugger is tested via lib/perl5db.t. To get a basic understanding of how that program works, let's suppose that we have a little, 4-line Perl program whose code we want to step through with the debugger:

$> cat rt-120174

@abc = (1..4);
print "hello world\n";
%xyz = ( 'alpha' => 'beta', 'gamma' => 'delta' );
print "goodbye world\n";

We place this program in lib/perl5db/t. We then add a block of code to lib/perl5db.t which sucks in this 4-line program and iterates through an array of debugger commands until we get success or failure.

{
    # perl 5 RT #120174 - 'p' command
    my $wrapper = DebugWrap->new(
        {
            cmds =>
            [
                'b 2',
                'c',
                'p@abc',
                'q',
            ],
            prog => '../lib/perl5db/t/rt-120174',
        }
    );

    ...
}

lib/perl5db.t contains a DebugWrap whose constructor takes a hash reference with cmds and prog elements. The cmds element holds a list of debugger commands. The prog element points to the chunk of code we're going to step through. In the example above, we:

Set a breakpoint for line 2
Instruct the program to continue until the breakpoint is reached
At the breakpoint print out @abc

Based on the 4-line program -- and assuming the wordspace-less syntax is working -- we expect that output to be:
```
  1234
```
That is, the same as the output of this command in a regular Perl program:
```
  print "@abc\n";
```
Quit the debugger

The DebugWrap class provides a contents_like() method which asks if the output of a debugger command matches a pattern specified as the method's first argument.

{
    ...

    $wrapper->contents_like(
        qr/1234/,
        q/RT 120174: p command can be invoked without space after 'p'/,
    );
}

If you were to create branches from an "old" version of Perl and from Perl 5 blead, respectively, then add the tests described above to each branch, you would see that:

In the "old" version p@abc works as Smylers was accustomed to having it work, that is, it would print 1234 and the contents_like unit test would PASS -- permitting lib/perl5db.t and make test to pass as well.
In the blead-based version p@abc would throw exceptions and the contents_like unit test would FAIL -- in turn causing lib/perl5db.t and make test to fail.

Stage 4: Write a Program that Fails that We Can Use during Bisection

If the unit test block and sample code above had been added to the Perl 5 core distribution ten years ago, then we would have known when the behavior changed. But it wasn't, which means we have to hunt for the breaking commit. We have to place our testing code and our 4-line sample code somewhere outside the core distribution on our filesystem.

That's conceptually simple, but difficult in practice. Why? Because lib/perl5db.t requires code which is found only within the core distribution, specifically, t/test.pl.

BEGIN {
    chdir 't' if -d 't';
    @INC = '../lib';
    require './test.pl';
}

The DebugWrap package uses subroutines imported from t/test.pl. We have to jump through some hoops to be able to use the DebugWrap package to work properly. I had to write a program which I called adapted-perl5db.t which I placed in a scratch directory. This program starts by chdir-ing to my git-checkout directory, and then requires t/test.pl from there.

BEGIN {
    chdir $ENV{PERL_WORKDIR};
    print STDERR "Working from: $ENV{PERL_WORKDIR}\n";
    chdir 't' if -d 't';
    @INC = '../lib';
    require './test.pl';
}

adapted-perl5db.t then copies much of the code from lib/perl5db.t but throws out all the unit test blocks except the ones we want for testing our current bug.

my $plancount = 4;
plan($plancount);

In package main within adapted-perl5db.t, I define the following variables and subroutines:

my $passcount = 0;

my $myprog = '/home/jkeenan/learn/perl/p5p/120174-debugger/rt-120174';

sub capture_return_value {
    my $rv = shift;
    $rv = 0 unless $rv;
    $passcount += $rv;
}

I used subroutine capture_return_value() to capture the return value of a given call to contents_like() -- 1 for PASS and "" for FAIL and add that to a tally of passing tests.

{
    # perl 5 RT #120174 - 'p' command
    my $wrapper = DebugWrap->new(
        {
            cmds =>
            [
                'b 2',
                'c',
                'p@abc',
                'q',
            ],
            prog => $myprog,
        }
    );

    capture_return_value($wrapper->contents_like(
        qr/1234/,
        q/RT 120174: p command can be invoked without space after 'p'/,
    ));
}

If at the end of the test file $passcount is equal to $plancount, then the file as a whole has passed and ought to exit with Unix-success, that is, with a 0 exit value.

If, however, some of the unit tests have not succeeded, then $passcount will be less than $plancount. The file as a whole fails and ought to exit with Unix-failure, that is, a non-zero exit value. In adapted-perl5db.t we have an END block which does the calculation for us.

END {
    my $netcount = $plancount - $passcount;
    1 while unlink ($rc_filename, $out_fn);
    exit $netcount;
}

So we now have a Perl program which can simulate individual Perl debugger commands and test assertions as to the output of those commands. We can build perl at any given commit, run the test program at that commit and determine whether the test program is "working" or not.

Stage 5: At Long Last, Bisection

I generally perform bisection of the Perl 5 core distribution in a git checkout different from my everyday checkout of blead. Hence, this sequence of commands:

$> export PERL_WORKDIR="$HOMEDIR/gitwork/perl2"
$> cd $PERL_WORKDIR
$> git clean -dfx
$> git checkout blead

Recall from early in this posting that I had determined that the behavior of the Perl 5 debugger had changed somewhere between the 5.16.3 and 5.18.4 production releases. Roughly speaking that change took place between commits 68cd360812 and 1799399c45. Porting/bisect.pl can be called with --start and --end command-line switches so as to shorten the time bisection takes. That enabled me to compose and invoke this bisection command:

perl ./Porting/bisect.pl \
    --start 68cd360812 \
    --end   1799399c45 \
    -- ./perl -Ilib ~/learn/perl/p5p/120174-debugger/adapted-perl5db.t

This instance of Porting/bisect.pl took 21 minutes on my laptop. Upon conclusion it displayed:

    HEAD is now at d478d7a Add more commands to the lookup table.
bad - non-zero exit from ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t
d478d7a00517925a72f7077951fbed1283fb6a07 is the first bad commit
commit d478d7a00517925a72f7077951fbed1283fb6a07
Author: Shlomi Fish <[email protected]>
Date:   Sun Oct 14 12:56:53 2012 +0200

Add more commands to the lookup table.

:040000 040000 9182f304ff915f7da07be7ffdd7a0006aa128656 f0d83b81b7e7c82259a8cc87ab30e4e79af80a51 M  lib
bisect run success
That took 1267 seconds.

Verification of Bisection

Often times I'm not entirely sure of the results of running Porting/bisect.pl. So to confirm the results I checked out the commit identified by the bisection, built perl at that commit, then ran the test program at that commit.

$> git clean -dfx
$> git checkout d478d7a
$> sh ./Configure -des -Dusedevel
$> make test_prep
$> ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t

As I expected, the test program threw exceptions and FAILed at that commit.

I then checked out the immediately preceding commit, built perl at that commit, then ran the test program again.

$> git clean -dfx
$> git checkout d478d7a^                # <-- Note the ^
$> sh ./Configure -des -Dusedevel
$> make test_prep
$> ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t

As I expected, the test program threw no exceptions and PASSed at this commit.

Stage 6: A Closer Look Means We Need More Bisection

Recall that in RT #120174 Smylers reported that in a newer perl he got exceptions when using either the p or x debugger commands without a subsequent wordspace. It turns out that the p and x commands "broke" at different commits. At commit d478d7a, the x command "broke" but the p command still "worked". I still had the task of identifying the commit at which the p command first "broke".

To learn that, I had to set up a second round of bisection.

$> export PERL_WORKDIR="$HOMEDIR/gitwork/perl2"
$> cd $PERL_WORKDIR
$> git clean -dfx
$> git checkout blead

perl ./Porting/bisect.pl \
    --start d478d7a005 \
    --end   1799399c45 \
    -- ./perl -Ilib ~/learn/perl/p5p/120174-debugger/p-only-adapted-perl5db.t

The end parameter is the same as in the previous bisection. But now for the start parameter I am providing the SHA which was the "first bad commit" from the earlier bisection. Upon concluding some 18 minutes later, Porting/bisect.pl displayed this:

HEAD is now at c9a9a6c Move more commands to the dispatch.
good - zero exit from ./perl -Ilib
/home/jkeenan/learn/perl/p5p/120174-debugger/p-only-adapted-perl5db.t
8f144dfcb6b7d7a28faa9e82a687338b141c272f is the first bad commit
commit 8f144dfcb6b7d7a28faa9e82a687338b141c272f
Author: Shlomi Fish <[email protected]>
Date:   Sun Oct 14 13:22:45 2012 +0200

    Low hanging fruit is now in %cmd_lookup.

:040000 040000 111452b264cefed7ee7ae68f4a222fba1ae0de02
ceae6a53cd1591c6bd4d36f46919377485f29709 M  lib
bisect run success
That took 1099 seconds.

This illustrates a point which we have to bear in mind whenever we are using bisection as a diagnostic tool. git bisect and, by extension, Porting/bisect.pl will natively only tell us the first commit at which certain source code has broken. They cannot say anything as to whether or not the code broke at other commits after the point of first breakage. The possibility of multiple points of breakage increases with the number of commits in question and the extent to which the test suite fails to exercise (or "cover") the source code in question. (See my CPAN module Devel::Git::MultiBisect and my December 2016 talk at Philadelphia.pm for another case where we had multiple points of breakage in the Perl 5 core distribution.)

Where Do We Stand Now?

In this particular case of the Perl 5 debugger, the use of the p and x commands without a subsequent wordspace has never been tested and never been documented. The fact that they worked at all constitutes an accidental feature. From 2012 to 2013 Israeli Perl hacker Shlomi Fish had a Perl foundation grant to clean up the debugger code. At the conclusion of his work it was merged into Perl 5 blead by then-pumpking Rik Signes. But since the "feature" had never been documented, neither Shlomi nor Rik nor any of the Perl 5 Porters with commit bits -- including myself -- ever caught this change in the debugger's behavior. Smylers didn't report the bug until five months after the release of perl-5.18, and no one else paid it any mind for nearly four more years.

As I write this in early September 2017, RT 120174 has not been resolved. If the wordspace-less syntax had been documented, then we would be obliged to support it by writing patches to revert certain parts of Shlomi's 2012-13 work and -- just as important -- to include tests of the syntax in lib/perl5db.t.

Given that the wordspace-less syntax was never documented, however, the case for making it work once again is exceedingly weak. The point of Shlomi's 2012-13 work was to clean up the debugger's code in support of its documented functionality. Restoring previously undocumented functionality -- functionality which only one person on the planet has ever testified to using -- will necessarily make the debugger's code more complex and difficult to maintain going forward. I do not believe that it is the job of the Perl 5 Porters or the Perl community more generally to support indefinitely any accidental features that any Perl programmers have ever stumbled upon. Hence, I am hoping that the pumpking will rule in favor of not restoring accidental features to the Perl 5 debugger.

jkeenan/how-to-bisect-a-perl-debugger-problem.mkdn