Bisection is a developer's technique used to locate the point in a series of commits to a repository where the behavior of an application changed. While this change could be a change for the better -- "When did this feature start accidentally working?" -- the change being sought is usually a change for the worse: "When did this feature stop working correctly?".
Source code stored in a git
repository can generally be bisected with some
variant of the git bisect
command. In the Perl 5 repository we have a
program, Porting/bisect.pl
, which is essentially a giant wrapper around
git bisect
intended to handle bisection situations specific to that
codebase.
For reasons I do not understand, the documentation for
Porting/bisect.pl
is actually found in a different file:
Porting/bisect-runner.pl
. The synopsis for that documentation (as of
perl-5.27.3) contains a number of typical use cases:
$> perldoc Porting/bisect-runner.pl
...
SYNOPSIS
# When did this become an error?
.../Porting/bisect.pl -e 'my $a := 2;'
# When did this stop being an error?
.../Porting/bisect.pl --expect-fail -e '1 // 2'
# When did this stop matching?
.../Porting/bisect.pl --match '\b(?:PL_)hash_seed_set\b'
# When did this start matching?
.../Porting/bisect.pl --expect-fail --match '\buseithreads\b'
# When did this test program stop working?
.../Porting/bisect.pl -- ./perl -Ilib ../test_prog.pl
# When did this first become valid syntax?
.../Porting/bisect.pl --target=miniperl --end=v5.10.0 \
--expect-fail -e 'my $a := 2;'
# What was the last revision to build with these options?
.../Porting/bisect.pl --test-build -Dd_dosuid
Bisection is not something anyone does on an every day basis. Consequently, every time I have to bisect some code I have to stop to think, "Which of these cases covers my current condition?" I eventually figured out that it would be good to write down the answers to that question! So what follows is one instance of problem solving via bisection in the Perl 5 core distribution. YMMV!
Is there anyone who loves the Perl 5 debugger? Let's have a show of hands.
Ah, I didn't think so. It looks ugly on the screen. It's less user-friendly as a
learning tool than REPLs in, say, Python or Scala. It's something which its
own documentation warns you not to use unless you have already run your code
with warnings enabled. Its learning curve is much more difficult than simply
scattering print
statements through your code. And since you only use when
you've got a sticky problem, you don't use it often enough to master all its
functionality.
As a consequence, the debugger's own code doesn't get much love, either. The
debugger is nothing but a Perl program found within the Perl 5 core
distribution -- lib/perl5db.pl
, to be precise. So bugs in the debugger are
reported to the main Perl 5 bug tracker, RT, where they
tend to linger un-worked-on for years.
Such was the case with RT #120174,
entitled "Debugger
command regression: Now requires more spaces". This ticket was created by a
long-time British (I think) Perl hacker named Smylers on the anniversary of
Thelonious Monk's birth in October 2013. Smyler reported that for a long time
he had been accustomed to using the p
and x
debugger commands -- used to
print and execute statements, respectively -- without a subsequent
wordspace, like this:
p@ARGV
x@ARGV
x\@ARGV
x\%INC
This syntax is one character shorter than the documented syntax for these commands:
p @ARGV
x @ARGV
x \@ARGV
x \%INC
Smylers noted that the wordspace-less syntax worked in perl-5.14.3 but was clearly not working in perl-5.19.5. By "not working", Smyler meant that when he tried the wordspace-less syntax, he got exceptions like this:
DB<72> x\@ARGV
Backslash found where operator expected at (eval 8)[lib/perl5db.pl:732] line 2, near "x\"
at (eval 8)[lib/perl5db.pl:732] line 2.
eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop;
x\\@ARGV;
' called at lib/perl5db.pl line 732
DB::eval called at lib/perl5db.pl line 3091
DB::DB called at -e line 1
syntax error at (eval 8)[lib/perl5db.pl:732] line 2, near "x\"
Smylers then added, "I haven't tried to narrow it down any further, mainly cos I don't know how to come up with a non-interactive way of demonstrating the change."
Neither, apparently, did anyone else. Nor was anyone else moved to reply to this bug report for nearly four years. However, having too much time on my hands this summer, I began to once again work my way through the RT "New" queue, that is, RT tickets which had not received a single reply. On September 3, 2017, I confirmed Smyler's original report. It then took me several stages to, in Smyler's words, "... come up with a non-interactive way of demonstrating the change."
In hacking on the Perl 5 core distribution the command-line tool perlbrew
is your friend. You can get it from CPAN at
App-perlbrew. It was
created by a Perl hacker named Kang-min Liu, whose nationality or terrestrial
location, to be honest, I do not know. With perlbrew
you can have multiple
versions of perl
installed beneath your home directory. So if you want to
find out how something worked in, say, perl-5.14 back in 2011-12, you simply
say:
perlbrew use perl-5.14.4
And there it is!
$ perl -v | head -2 | tail -1
This is perl 5, version 14, subversion 4 (v5.14.4) built for x86_64-linux
So the first thing I did in studying Smyler's problem was to use perlbrew
to quickly move through Perl versions, firing up the debugger in each, until I
found:
- The most recent production version in which the wordspace-less syntax worked; and
- The oldest production version in which the wordspace-less syntax no longer worked.
I was quickly able to determine that the wordspace-less syntax was working in perl-5.16.3 but was not working in perl-5.18.4. This meant that the change of behavior probably took place between 2012 and 2013, somewhere between this commit:
commit 68cd360812f9eaa2d34c45c501e2fef87c44ccde
CommitDate: Tue May 1 19:08:57 2012 -0400
... and this commit:
commit 1799399c45baa7e5af45db33567d2a2de629d5e0
CommitDate: Fri Jun 7 11:51:45 2013 -0400
In this period of slightly over a year, there were more than 200 commits to
lib/perl5db.pl
-- but thousands of commits to the entire core
distribution. This was clearly going to need bisection.
But which of the various invocations for Porting/bisect.pl
shown above was
I to use? Well, if any other Perl program terminated with syntax errors like
Backslash found where operator expected
, I would know that the program, in
Unix terms, exited with a non-zero error number. Any program that ended with
EXIT_FAILURE
would be suitable for this invocation:
.../Porting/bisect.pl -- ./perl -Ilib ../test_prog.pl
... as this version is designed to report the first commit at which
../test_prog.pl
failed, i.e., exited with a non-zero error number.
The Perl 5 debugger, however, is not just any other Perl program. When you
are debugging code and encounter an exception, the debugger traps that
exception but it itself does not die
. It waits for you to tell it what to
do next. If, for example, you tell the debugger to quit (q
), the debugger
program will exit successfully, i.e., it will exit 0
.
Hence, we won't go anywhere with an invocation in which the debugger is itself the final argument:
.../Porting/bisect.pl -- ./perl -Ilib lib/perl5db.pl # <-- WON'T WORK
So what we have to do is to write a program which:
- simulates the action of the debugger's
p
andx
commands; - throws exceptions when the simulated
p
andx
commands are invoked without a subsequent wordspace; and - exits with a non-zero error number that can be detected by
Porting/bisect.pl
and cue the termination of the bisection process.
What we do a program that exercises parts of an application's functionality?
We call it a test.
So, to exercise the debugger's p
or x
commands we can seek guidance by
looking at the way the Perl 5 debugger is itself tested within the core
distribution. The debugger is tested via lib/perl5db.t
. To get a basic
understanding of how that program works, let's suppose that we have a little,
4-line Perl program whose code we want to step through with the debugger:
$> cat rt-120174
@abc = (1..4);
print "hello world\n";
%xyz = ( 'alpha' => 'beta', 'gamma' => 'delta' );
print "goodbye world\n";
We place this program in lib/perl5db/t
. We then add a block of code to
lib/perl5db.t
which sucks in this 4-line program and iterates through an
array of debugger commands until we get success or failure.
{
# perl 5 RT #120174 - 'p' command
my $wrapper = DebugWrap->new(
{
cmds =>
[
'b 2',
'c',
'p@abc',
'q',
],
prog => '../lib/perl5db/t/rt-120174',
}
);
...
}
lib/perl5db.t
contains a DebugWrap
whose constructor takes a
hash reference with cmds
and prog
elements. The cmds
element holds a
list of debugger commands. The prog
element points to the chunk of code
we're going to step through. In the example above, we:
-
Set a breakpoint for line 2
-
Instruct the program to continue until the breakpoint is reached
-
At the breakpoint print out
@abc
Based on the 4-line program -- and assuming the wordspace-less syntax is working -- we expect that output to be:
1234
That is, the same as the output of this command in a regular Perl program:
print "@abc\n";
-
Quit the debugger
The DebugWrap
class provides a contents_like()
method which asks if the
output of a debugger command matches a pattern specified as the method's first
argument.
{
...
$wrapper->contents_like(
qr/1234/,
q/RT 120174: p command can be invoked without space after 'p'/,
);
}
If you were to create branches from an "old" version of Perl and from Perl 5 blead, respectively, then add the tests described above to each branch, you would see that:
- In the "old" version
p@abc
works as Smylers was accustomed to having it work, that is, it would print1234
and thecontents_like
unit test wouldPASS
-- permittinglib/perl5db.t
andmake test
to pass as well. - In the blead-based version
p@abc
would throw exceptions and thecontents_like
unit test wouldFAIL
-- in turn causinglib/perl5db.t
andmake test
to fail.
If the unit test block and sample code above had been added to the Perl 5 core distribution ten years ago, then we would have known when the behavior changed. But it wasn't, which means we have to hunt for the breaking commit. We have to place our testing code and our 4-line sample code somewhere outside the core distribution on our filesystem.
That's conceptually simple, but difficult in practice. Why? Because
lib/perl5db.t
require
s code which is found only within the core
distribution, specifically, t/test.pl
.
BEGIN {
chdir 't' if -d 't';
@INC = '../lib';
require './test.pl';
}
The DebugWrap
package uses subroutines imported from t/test.pl
. We have
to jump through some hoops to be able to use the DebugWrap
package to work
properly. I had to write a program which I called adapted-perl5db.t
which
I placed in a scratch directory. This program starts by chdir
-ing to my
git-checkout directory, and then require
s t/test.pl
from there.
BEGIN {
chdir $ENV{PERL_WORKDIR};
print STDERR "Working from: $ENV{PERL_WORKDIR}\n";
chdir 't' if -d 't';
@INC = '../lib';
require './test.pl';
}
adapted-perl5db.t
then copies much of the code from lib/perl5db.t
but
throws out all the unit test blocks except the ones we want for testing our
current bug.
my $plancount = 4;
plan($plancount);
In package main
within adapted-perl5db.t
, I define the
following variables and subroutines:
my $passcount = 0;
my $myprog = '/home/jkeenan/learn/perl/p5p/120174-debugger/rt-120174';
sub capture_return_value {
my $rv = shift;
$rv = 0 unless $rv;
$passcount += $rv;
}
I used subroutine capture_return_value()
to capture the return value of a
given call to contents_like()
-- 1
for PASS
and ""
for FAIL
and
add that to a tally of passing tests.
{
# perl 5 RT #120174 - 'p' command
my $wrapper = DebugWrap->new(
{
cmds =>
[
'b 2',
'c',
'p@abc',
'q',
],
prog => $myprog,
}
);
capture_return_value($wrapper->contents_like(
qr/1234/,
q/RT 120174: p command can be invoked without space after 'p'/,
));
}
If at the end of the test file $passcount
is equal to $plancount
, then
the file as a whole has passed and ought to exit with Unix-success, that is,
with a 0
exit value.
If, however, some of the unit tests have not succeeded, then $passcount
will be less than $plancount
. The file as a whole fails and ought to exit
with Unix-failure, that is, a non-zero exit value. In adapted-perl5db.t
we
have an END
block which does the calculation for us.
END {
my $netcount = $plancount - $passcount;
1 while unlink ($rc_filename, $out_fn);
exit $netcount;
}
So we now have a Perl program which can simulate individual Perl debugger
commands and test assertions as to the output of those commands. We can
build perl
at any given commit, run the test program at that commit and
determine whether the test program is "working" or not.
I generally perform bisection of the Perl 5 core distribution in a git checkout different from my everyday checkout of blead. Hence, this sequence of commands:
$> export PERL_WORKDIR="$HOMEDIR/gitwork/perl2"
$> cd $PERL_WORKDIR
$> git clean -dfx
$> git checkout blead
Recall from early in this posting that I had determined that the behavior of
the Perl 5 debugger had changed somewhere between the 5.16.3 and 5.18.4
production releases. Roughly speaking that change took place between commits
68cd360812 and 1799399c45. Porting/bisect.pl
can be called with --start
and --end
command-line switches so as to shorten the time bisection takes.
That enabled me to compose and invoke this bisection command:
perl ./Porting/bisect.pl \
--start 68cd360812 \
--end 1799399c45 \
-- ./perl -Ilib ~/learn/perl/p5p/120174-debugger/adapted-perl5db.t
This instance of Porting/bisect.pl
took 21 minutes on my laptop. Upon conclusion it displayed:
HEAD is now at d478d7a Add more commands to the lookup table.
bad - non-zero exit from ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t
d478d7a00517925a72f7077951fbed1283fb6a07 is the first bad commit
commit d478d7a00517925a72f7077951fbed1283fb6a07
Author: Shlomi Fish <[email protected]>
Date: Sun Oct 14 12:56:53 2012 +0200
Add more commands to the lookup table.
:040000 040000 9182f304ff915f7da07be7ffdd7a0006aa128656 f0d83b81b7e7c82259a8cc87ab30e4e79af80a51 M lib
bisect run success
That took 1267 seconds.
Often times I'm not entirely sure of the results of running
Porting/bisect.pl
. So to confirm the results I checked out the commit
identified by the bisection, built perl
at that commit, then ran the test
program at that commit.
$> git clean -dfx
$> git checkout d478d7a
$> sh ./Configure -des -Dusedevel
$> make test_prep
$> ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t
As I expected, the test program threw exceptions and FAIL
ed at that commit.
I then checked out the immediately preceding commit, built perl
at
that commit, then ran the test program again.
$> git clean -dfx
$> git checkout d478d7a^ # <-- Note the ^
$> sh ./Configure -des -Dusedevel
$> make test_prep
$> ./perl -Ilib /home/jkeenan/learn/perl/p5p/120174-debugger/adapted-perl5db.t
As I expected, the test program threw no exceptions and PASS
ed at this
commit.
Recall that in RT #120174 Smylers reported that in a newer perl
he got
exceptions when using either the p
or x
debugger commands without a
subsequent wordspace. It turns out that the p
and x
commands "broke" at
different commits. At commit d478d7a, the x
command "broke" but the p
command still "worked". I still had the task of identifying the commit at
which the p
command first "broke".
To learn that, I had to set up a second round of bisection.
$> export PERL_WORKDIR="$HOMEDIR/gitwork/perl2"
$> cd $PERL_WORKDIR
$> git clean -dfx
$> git checkout blead
perl ./Porting/bisect.pl \
--start d478d7a005 \
--end 1799399c45 \
-- ./perl -Ilib ~/learn/perl/p5p/120174-debugger/p-only-adapted-perl5db.t
The end
parameter is the same as in the previous bisection. But now for
the start
parameter I am providing the SHA which was the "first bad commit"
from the earlier bisection. Upon concluding some 18 minutes later,
Porting/bisect.pl
displayed this:
HEAD is now at c9a9a6c Move more commands to the dispatch.
good - zero exit from ./perl -Ilib
/home/jkeenan/learn/perl/p5p/120174-debugger/p-only-adapted-perl5db.t
8f144dfcb6b7d7a28faa9e82a687338b141c272f is the first bad commit
commit 8f144dfcb6b7d7a28faa9e82a687338b141c272f
Author: Shlomi Fish <[email protected]>
Date: Sun Oct 14 13:22:45 2012 +0200
Low hanging fruit is now in %cmd_lookup.
:040000 040000 111452b264cefed7ee7ae68f4a222fba1ae0de02
ceae6a53cd1591c6bd4d36f46919377485f29709 M lib
bisect run success
That took 1099 seconds.
This illustrates a point which we have to bear in mind whenever we are using
bisection as a diagnostic tool. git bisect
and, by extension,
Porting/bisect.pl
will natively only tell us the first commit at which
certain source code has broken. They cannot say anything as to whether or not
the code broke at other commits after the point of first breakage. The
possibility of multiple points of breakage increases with the number of
commits in question and the extent to which the test suite fails to exercise
(or "cover") the source code in question. (See my
CPAN module Devel::Git::MultiBisect
and my December 2016 talk at Philadelphia.pm for
another case where we had multiple points of breakage in the Perl 5 core
distribution.)
In this particular case of the Perl 5 debugger, the use of the p
and x
commands without a subsequent wordspace has never been tested and never
been documented. The fact that they worked at all constitutes an
accidental feature. From 2012 to 2013 Israeli Perl hacker Shlomi Fish had
a Perl foundation grant to clean up the debugger code. At the conclusion of
his work it was merged into Perl 5 blead by then-pumpking Rik Signes. But
since the "feature" had never been documented, neither Shlomi nor Rik nor any
of the Perl 5 Porters with commit bits -- including myself -- ever caught this
change in the debugger's behavior. Smylers didn't report the bug until five
months after the release of perl-5.18, and no one else paid it any mind for
nearly four more years.
As I write this in early September 2017, RT 120174 has not been resolved. If
the wordspace-less syntax had been documented, then we would be obliged to
support it by writing patches to revert certain parts of Shlomi's 2012-13 work
and -- just as important -- to include tests of the syntax in
lib/perl5db.t
.
Given that the wordspace-less syntax was never documented, however, the case for making it work once again is exceedingly weak. The point of Shlomi's 2012-13 work was to clean up the debugger's code in support of its documented functionality. Restoring previously undocumented functionality -- functionality which only one person on the planet has ever testified to using -- will necessarily make the debugger's code more complex and difficult to maintain going forward. I do not believe that it is the job of the Perl 5 Porters or the Perl community more generally to support indefinitely any accidental features that any Perl programmers have ever stumbled upon. Hence, I am hoping that the pumpking will rule in favor of not restoring accidental features to the Perl 5 debugger.