@skids
Created February 24, 2018 02:22
Playing with rakudo optimization
So there I was, asking myself why we have so much nqp code cropping up in
the rakudo core... what prevents rakudo from producing equally efficient opcodes
from Perl6 source for even basic things like ifs and whiles? Just how
much performance are we gaining, anyway? Let's take the case of while
versus nqp::while. Typical results for two comparable busy loops on
my machine are:
```
$ time perl6 -e 'use nqp; my $i = 1; nqp::while(nqp::islt_i($i,100000000), $i := nqp::add_i($i,1)); exit(0);'
real 0m5.729s
user 0m5.748s
sys 0m0.024s
```
```
$ time perl6 -e 'use nqp; my $i = 1; while nqp::islt_i($i,100000000) { $i := nqp::add_i($i,1) }; exit(0)'
real 0m5.951s
user 0m5.972s
sys 0m0.024s
```
...somehow the Perl6 version picks up 2% to 4% of overhead.
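The arithmetic for the two runs shown above is easy to check. This quick Python calculation uses only the real times, and each comes from a single run, so the result is indicative rather than a benchmark:

```python
# real (wall-clock) seconds from the two runs above
nqp_while = 5.729    # nqp::while version
perl6_while = 5.951  # Perl6 while version

# Extra time the Perl6 loop takes, relative to the nqp::while loop
overhead_pct = (perl6_while - nqp_while) / nqp_while * 100
print(f"overhead: {overhead_pct:.1f}%")  # overhead: 3.9%
```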
I decided to take a look at the AST. You do this by running rakudo with the --target flag.
In this case, I want to see the AST after the Perl6 high-level optimizer
has run, so the flag is --target=optimize.
Leaving out the parts that are identical, the Perl6 while loop body has
this extra stuff around the increment of $i:
```
- QAST::Stmts(:resultchild(1))
- QAST::Op(bind)
- QAST::Var(local pres_topic__1)
- QAST::Var(lexical $_)
...
- QAST::WVal(Bool)
- QAST::Op(bind)
- QAST::Var(lexical $_)
- QAST::Var(local pres_topic__1)
```
Reading the code in src/Perl6/Optimizer.nqp, it is pretty easy to figure out
that the bind operations are the result of inlining the body of the while loop.
Since the optimizer cannot prove that the code in the loop body will not
alter $_, it saves $_ before the inlined body and restores it afterwards.
On every loop iteration.
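A toy Python model of the two loop shapes makes the cost concrete. It is plain Python standing in for the generated code, not anything rakudo actually emits. With the save/restore inside the loop, every iteration pays for two extra binds. With the save/restore placed around the loop, the whole loop pays for two.

```python
def per_iteration_binds(limit):
    """Save/restore $_ inside the loop, as the inlined while body does."""
    topic, i, binds = "outer $_", 1, 0
    while i < limit:
        saved = topic      # bind pres_topic__1 := $_
        binds += 1
        i += 1             # the loop body: $i := nqp::add_i($i, 1)
        topic = saved      # bind $_ := pres_topic__1
        binds += 1
    return binds

def hoisted_binds(limit):
    """Save $_ once before the loop and restore it once after it."""
    topic, i, binds = "outer $_", 1, 0
    saved = topic          # single save
    binds += 1
    while i < limit:
        i += 1
    topic = saved          # single restore
    binds += 1
    return binds

print(per_iteration_binds(10), hoisted_binds(10))  # 18 2
```

The one-liners above use a limit of 100000000, so the loop runs 99,999,999 times. That means the inlined version performs roughly 200 million binds that the nqp::while version never does.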
So, just to see whether removing those binds sped up the code, I added a little
code to Perl6::Optimizer.visit_op, which is called on every QAST::Op node
and therefore on every while loop:
```
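--- a/src/Perl6/Optimizer.nqp
+++ b/src/Perl6/Optimizer.nqp
@@ -1253,6 +1253,19 @@ class Perl6::Optimizer {
         # Visit the children.
         self.visit_op_children($op);
+        # Experimental: only runs at an out-of-range --optimize level.
+        if $!level >= 90 {
+            if $optype eq 'while' {
+                # Wrap the loop in Stmts(save $_, while, restore $_) so the
+                # two binds run once instead of on every iteration.
+                my $qast := QAST::Stmts.new: $op[1][0], $op, $op[1][2];
+                $op[1].shift;
+                $op[1].pop;
+                $op[1].resultchild(0);
+                return $qast;
+            }
+        }
+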
```
...Let's look at that code for a bit. First, you would not necessarily want to
do this optimization to every while loop that got inlined, because some loop
bodies or conditionals or phasers or whatnot may actually alter or use $_ in some
tricky way. So we keep it from happening when we build rakudo (and during
normal usage) by only performing the optimization when the --optimize flag has
been given a silly value (90 or above, per the $!level check).
This trick makes it really easy to test out optimizer changes.
Since this is just a test, and we already know what our AST is going to look like,
we just shuffle things around to move the bind operations outside the while loop,
so they only run once. We leave the original QAST::Stmts node (now holding only the
loop body) in place, even though we could replace it with a QAST::Stmt. Since we have
moved the loop body from position 1 to position 0 inside that QAST::Stmts, we have to
adjust its resultchild attribute to tell the compiler to use the 0th child of the
QAST::Stmts as the result that node produces.
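Here is the same shuffle on a toy tree in Python. The Node class is a hypothetical stand-in for QAST nodes: a kind, a list of children and an optional resultchild. The shape check is an extra guard that the patch itself does not have, since the patch simply assumes the AST looks as expected.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    kind: str                                    # e.g. "while", "Stmts", "bind"
    children: List["Node"] = field(default_factory=list)
    resultchild: Optional[int] = None            # index of the child used as the value

def hoist_topic_binds(op: Node) -> Node:
    """Move the $_ save/restore binds from the while body to around the loop."""
    if op.kind != "while":
        return op
    body = op.children[1]
    # Expect exactly [save $_, real body, restore $_]
    if body.kind != "Stmts" or len(body.children) != 3:
        return op
    save, restore = body.children[0], body.children[2]
    wrapper = Node("Stmts", [save, op, restore])
    body.children.pop(0)    # like $op[1].shift
    body.children.pop()     # like $op[1].pop
    body.resultchild = 0    # the real body is now child 0
    return wrapper

# Usage: a while loop whose body was inlined with the $_ save/restore
cond = Node("islt_i")
save = Node("bind", [Node("pres_topic__1"), Node("$_")])
inc = Node("add_i")
restore = Node("bind", [Node("$_"), Node("pres_topic__1")])
loop = Node("while", [cond, Node("Stmts", [save, inc, restore], resultchild=1)])

result = hoist_topic_binds(loop)
print([c.kind for c in result.children])             # ['bind', 'while', 'bind']
print([c.kind for c in loop.children[1].children])   # ['add_i']
```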
Now let's see if that sped things up. Typical results were:
```
$ time perl6 --optimize=99 -e 'use nqp; my $i = 1; while nqp::islt_i($i,100000000) { $i := nqp::add_i($i,1) }; exit(0)'
real 0m5.732s
user 0m5.752s
sys 0m0.028s
```
...there is still a tiny hair of a performance loss, but most of it is gone. So, if the
optimizer can be taught to figure out when it is safe to make this transformation,
we could speed up tight loops in general, improving rakudo performance when running
Perl6 code, and probably also stop using nqp::while in several places and enjoy
prettier syntax in the core code.
If those bind operations even really belong there in the first place...
I'm a bit behind the curve when it comes to Perl6 scoping, or the
finer points of scoping in general for that matter, so I'm a ways off from fully
understanding all the things that could make such optimizations unsafe... it's something
I'm going to have to poke around at to figure out.
During this endeavor I had at one point put some test nqp::says in delete_unused_magicals
in the Optimizer. During the rakudo build, I don't think I saw even one instance
of that code actually deleting anything (things do scroll by pretty fast, though).
This is probably because that code will not remove an unused $_ from a block if the block
contains any calls at all... and almost any Perl6 operator involves a call (at least,
at this stage of optimization it does). The only way to fix that would be if the called
function/method (or list of possible methods) could be determined, and if they bore an
annotation saying they were not going to do something like CALLERS::<$_> = 42.
Such an annotation would have to be embedded in the bytecode of
the module from which the function is being included...
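The rule described above, and the relaxed version that such annotations would allow, can be sketched in Python. Everything here is hypothetical. A block is reduced to the list of call targets it contains, with None standing for a callee that cannot be determined. The set of callees known not to touch the caller's $_ stands in for the bytecode annotation, which does not exist today.

```python
from typing import List, Optional, Set

def can_drop_topic_current(calls: List[Optional[str]]) -> bool:
    """Rule as described above: an unused $_ is only removed from a block
    that contains no calls at all."""
    return len(calls) == 0

def can_drop_topic_annotated(calls: List[Optional[str]], topic_safe: Set[str]) -> bool:
    """Hypothetical relaxed rule: every callee must be statically known
    (not None) and annotated as never touching its caller's $_."""
    return all(c is not None and c in topic_safe for c in calls)

# Hypothetical annotation data: callees promised not to do CALLERS::<$_> = 42
TOPIC_SAFE = {"&infix:<+>", "&infix:<==>"}

block_calls = ["&infix:<+>", "&infix:<==>"]
print(can_drop_topic_current(block_calls))                          # False
print(can_drop_topic_annotated(block_calls, TOPIC_SAFE))            # True
print(can_drop_topic_annotated(["&infix:<+>", None], TOPIC_SAFE))   # False
```

Both rules agree on blocks with no calls. The relaxed rule only helps when every callee can be pinned down, which is exactly the hard part described above.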
However, determining whether a loop body needs to ensconce $_ on each iteration, rather
than around the loop as a whole, might be easier, given that conditionals are often
pretty simple code fragments.
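Here is one way to frame that check, as a conservative Python sketch. It uses a toy tuple-based tree standing in for QAST, and the rule comes from comparing the two loop shapes rather than from rakudo. Hoisting is equivalent when two things hold. First, the condition never writes $_, since it runs outside the per-iteration save/restore. Second, either the body never writes $_ either, or nothing in the condition or body ever reads it. Any call counts as both a read and a write. Closures and control exceptions such as last/next are ignored.

```python
# Toy tree: ("var", name), ("const", value), ("bind", target_name, value),
# ("op", name, *args), ("call", name, *args)

def reads_topic(n) -> bool:
    kind = n[0]
    if kind == "var":
        return n[1] == "$_"
    if kind == "call":
        return True                     # a callee might read its caller's $_
    if kind == "bind":
        return reads_topic(n[2])        # the target is written, not read
    if kind == "op":
        return any(reads_topic(a) for a in n[2:])
    return False                        # constants

def writes_topic(n) -> bool:
    kind = n[0]
    if kind == "call":
        return True                     # e.g. CALLERS::<$_> = 42
    if kind == "bind":
        return n[1] == "$_" or writes_topic(n[2])
    if kind == "op":
        return any(writes_topic(a) for a in n[2:])
    return False

def can_hoist_topic_binds(cond, body) -> bool:
    """True if saving/restoring $_ once around the loop behaves like doing
    it on every iteration (conservative: False may be a missed chance)."""
    if writes_topic(cond):
        return False        # the condition's writes would be undone by the final restore
    if not writes_topic(body):
        return True         # the per-iteration save/restore is a no-op
    return not reads_topic(cond) and not reads_topic(body)

# while nqp::islt_i($i, 100000000) { $i := nqp::add_i($i, 1) }
cond = ("op", "islt_i", ("var", "$i"), ("const", 100000000))
body = ("bind", "$i", ("op", "add_i", ("var", "$i"), ("const", 1)))
print(can_hoist_topic_binds(cond, body))            # True

# A hypothetical sub call in the body might assign to its caller's $_
body2 = ("call", "&frobnicate", ("var", "$i"))
print(can_hoist_topic_binds(("var", "$_"), body2))  # False
```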
Anyway, what I'd like people to take from this is that, while optimizing the core
by using inline nqp has managed to gain rakudo an amazing amount of performance over
the last couple years, there's another way to optimize... the Optimizer... and it
turns out it is pretty easy to tinker with.