Created
February 24, 2018 02:22
Playing with rakudo optimization
So there I was, asking myself why we have so much nqp code cropping up in
the rakudo core... what prevents rakudo from producing opcodes that are just as
efficient from Perl6 source, even for basic things like ifs and whiles? Just how
much performance are we gaining, anyway? Let's take the case of while
versus nqp::while. Typical results for two comparable busy loops on
my machine are:
```
$ time perl6 -e 'use nqp; my $i = 1; nqp::while(nqp::islt_i($i,100000000), $i := nqp::add_i($i,1)); exit(0);'
real 0m5.729s
user 0m5.748s
sys 0m0.024s
```
```
$ time perl6 -e 'use nqp; my $i = 1; while nqp::islt_i($i,100000000) { $i := nqp::add_i($i,1) }; exit(0)'
real 0m5.951s
user 0m5.972s
sys 0m0.024s
```
...somehow the Perl6 code picks up 2% to 4% overhead costs.
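
To put a number on that for the two runs quoted above, here is a small sketch in Python (not part of the original experiment; the helper names are made up for illustration). It only does arithmetic on the `time` output already shown.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '0m5.729s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def overhead_percent(baseline: str, candidate: str) -> float:
    """Relative slowdown of `candidate` versus `baseline`, in percent."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base * 100


# nqp::while run versus plain while run, taken from the timings above.
real_overhead = overhead_percent("0m5.729s", "0m5.951s")
user_overhead = overhead_percent("0m5.748s", "0m5.972s")
print(f"real: {real_overhead:.1f}%  user: {user_overhead:.1f}%")
```

For this particular pair of runs both figures land near the top of the 2% to 4% range.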
I decided to take a look at the AST. You do this by running rakudo with the --target flag.
In this case, I want to see the AST after the Perl6 high level optimizer
has run, so the flag is --target=optimize.
With the identical parts removed, the Perl6 while loop body has
this extra stuff around incrementing $i:
```
- QAST::Stmts(:resultchild(1)))
- QAST::Op(bind)
- QAST::Var(local pres_topic__1)
- QAST::Var(lexical $_)
...
- QAST::WVal(Bool)
- QAST::Op(bind)
- QAST::Var(lexical $_)
- QAST::Var(local pres_topic__1)
```
Reading the code in src/Perl6/Optimizer.nqp, it is pretty easy to figure out
that the bind operations are the result of inlining the body of the while loop.
Since the optimizer cannot prove that the code in the loop body will not
alter $_, it saves and restores $_ before and after the inline.
On every loop iteration.
So, just to see whether removing those bind calls would speed up the code, I added a little
code in Perl6::Optimizer.visit_op, which is called on every QAST::Op node
and so, on while loops.
```
--- a/src/Perl6/Optimizer.nqp
+++ b/src/Perl6/Optimizer.nqp
@@ -1253,6 +1253,16 @@ class Perl6::Optimizer {
         # Visit the children.
         self.visit_op_children($op);
+        if $!level >= 90 {
+            if $optype eq 'while' {
+                my $qast := QAST::Stmts.new: $op[1][0], $op, $op[1][2];
+                $op[1].shift;
+                $op[1].pop;
+                $op[1].resultchild(0);
+                return $qast;
+            }
+        }
+
```
...Let's look at that code for a bit. First, you would not necessarily want to
do this optimization to every while loop that got inlined, because some loop
bodies or conditionals or phasers or whatnot may actually alter or use $_ in some
tricky way. So we prevent this from happening when we build rakudo (and during
normal usage) by only performing the optimization when the --optimize flag has
been given a silly value (90 or above here).
This trick makes it really easy to test out optimizer changes.
Since this is just a test, and we already know what our AST is going to look like,
we just shuffle things around to move the bind operations outside the while loop,
so they only run once. We leave the loop body's QAST::Stmts node there even though we could
replace it with a QAST::Stmt. Since we have moved the loop body from position
1 to position 0 inside the QAST::Stmts, we have to adjust the resultchild attribute
to tell the compiler to use the 0th element of the QAST::Stmts as the result
that the QAST node produces.
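
The shuffle is easier to follow on a toy tree. The sketch below is Python, not NQP, and `Node` is a made-up stand-in for a QAST node rather than rakudo's real API. The elided part of the loop body is represented by a single placeholder node, so this only mirrors the shape of the patch, step by step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy stand-in for a QAST node (hypothetical, for illustration only)."""
    kind: str
    children: list = field(default_factory=list)
    resultchild: Optional[int] = None  # only meaningful on "stmts" nodes


def hoist_topic_binds(loop: Node) -> Node:
    """Move the save/restore binds out of the loop body, as the patch does."""
    body = loop.children[1]                          # $op[1]
    save, restore = body.children[0], body.children[2]
    wrapper = Node("stmts", [save, loop, restore])   # QAST::Stmts.new: ...
    body.children.pop(0)                             # $op[1].shift
    body.children.pop()                              # $op[1].pop
    body.resultchild = 0                             # $op[1].resultchild(0)
    return wrapper


save = Node("bind", [Node("local pres_topic__1"), Node("lexical $_")])
work = Node("loop_body")  # placeholder for the elided "..." part of the dump
restore = Node("bind", [Node("lexical $_"), Node("local pres_topic__1")])
loop = Node("while", [Node("cond"), Node("stmts", [save, work, restore], resultchild=1)])

hoisted = hoist_topic_binds(loop)
print([child.kind for child in hoisted.children])           # ['bind', 'while', 'bind']
print([child.kind for child in loop.children[1].children])  # ['loop_body']
```

After the call, the binds sit on either side of the while node and the body's statement list holds only the real work, which is why its result child index has to drop from 1 to 0.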
Now let's see if that sped things up. Typical results were:
```
$ time perl6 --optimize=99 -e 'use nqp; my $i = 1; while nqp::islt_i($i,100000000) { $i := nqp::add_i($i,1) }; exit(0)'
real 0m5.732s
user 0m5.752s
sys 0m0.028s
```
...there is still a tiny hair of a performance loss, but most of it is gone. So, if the
optimizer can be taught to figure out when it is safe to make this transformation,
we could speed up tight loops in general for a rakudo performance improvement when
running Perl6 code, and probably also stop using nqp::while in several places and enjoy
prettier syntax in the core code.
If those bind operations even really belong there in the first place...
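
For a rough sense of "most of it is gone", this Python sketch compares the three wall-clock ("real") times quoted in this note. These are single typical runs, so treat the result as indicative only; the variable names are mine.

```python
# "real" times in seconds, copied from the three runs quoted in this note.
nqp_while = 5.729      # nqp::while version
plain_while = 5.951    # plain while, default optimization
hoisted_while = 5.732  # plain while with --optimize=99 and the test patch

gap_before = plain_while - nqp_while
gap_after = hoisted_while - nqp_while

recovered_percent = (1 - gap_after / gap_before) * 100
residual_percent = gap_after / nqp_while * 100

print(f"recovered {recovered_percent:.1f}% of the gap")
print(f"residual overhead {residual_percent:.2f}%")
```

On these numbers, nearly all of the gap closes and what is left is well under a tenth of a percent, which is small enough to be within run-to-run noise.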
I'm a bit behind the curve when it comes to fully understanding Perl6 scoping, or the
finer points of scoping in general for that matter, so I'm a ways off from fully
understanding all the things that could make such optimizations unsafe... it's something
I'm going to have to poke around to try to figure out.
During this endeavor I had at one point put some test nqp::say calls in delete_unused_magicals
in the Optimizer. During the rakudo build, I don't think I saw even one instance
of this block actually deleting anything (things do scroll by pretty fast, though).
This is probably because that code will not remove an unused $_ from a block if it
contains any calls at all... and almost any Perl6 operator involves a call (at least,
at this stage of optimization it does.) The only way to fix that would be if the called
function/method (or list of possible methods) could be determined and if they bore an
annotation that they were not going to do something like CALLERS::<$_> = 42, when that
could be determined. Such an annotation would have to be embedded in the bytecode of
the module from which the function is being included...
However, determining whether a loop body needs to ensconce $_ on each iteration, rather
than around the loop as a whole, might be easier, given that conditionals are often
pretty simple code fragments.
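
Both ideas come down to the same conservative walk over the tree: give up as soon as something mentions $_ or calls a routine that is not known to leave the caller's $_ alone. Here is a minimal sketch of that check in Python, on a toy AST. `Node`, `may_touch_topic`, `can_hoist_topic_save` and the `topic_safe` set (standing in for the annotation idea) are all hypothetical names; none of this is rakudo's actual optimizer code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy AST node (hypothetical stand-in for QAST)."""
    kind: str                 # "while", "op", "call", "var", ...
    name: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def may_touch_topic(node, topic_safe=frozenset()):
    """Conservatively report whether this subtree could read or write $_."""
    for n in walk(node):
        if n.kind == "var" and n.name == "$_":
            return True
        if n.kind == "call" and n.name not in topic_safe:
            return True  # an unknown callee might do CALLERS::<$_> = 42
    return False


def can_hoist_topic_save(loop, topic_safe=frozenset()):
    """True when neither the condition nor the body can touch $_."""
    cond, body = loop.children
    return not (may_touch_topic(cond, topic_safe) or may_touch_topic(body, topic_safe))


busy_loop = Node("while", children=[Node("op", "islt_i"), Node("op", "add_i")])
calling_loop = Node("while", children=[Node("op", "islt_i"), Node("call", "&frob")])
topic_loop = Node("while", children=[Node("op", "islt_i"), Node("var", "$_")])

print(can_hoist_topic_save(busy_loop))                          # True
print(can_hoist_topic_save(calling_loop))                       # False
print(can_hoist_topic_save(calling_loop, frozenset({"&frob"})))  # True
print(can_hoist_topic_save(topic_loop))                         # False
```

With an empty `topic_safe` set this behaves like the "any call at all blocks it" rule described above; the set only starts to matter once callees carry some trustworthy annotation.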
Anyway, what I'd like people to take from this is that, while optimizing the core
by using inline nqp has managed to gain rakudo an amazing amount of performance over
the last couple years, there's another way to optimize... the Optimizer... and it
turns out it is pretty easy to tinker with.