Below is a formal list of possible optimizations which most would agree on. We had these discussions in 2001 with Damian, when perl6 and perl5i took off. I'd like to work on these for the perl5 core and need decisions. Most p5p hackers seem to be informed about the general possibilities and directions, but not all. We need this to improve general perl5 performance, and also to help static compilation.[1]
We had this before, so I'd like to keep it formal. Each proposal gets a perl6-like name, and replies should change the subject to that name. I chose PDD for "Perl Design Draft".
Beforehand: "compiler" means op.c not B::C. compile-time and run-time should be obvious.
The CONST op currently is an SVOP, holding a global gvsv. A CONST op might also need to hold lexicals, i.e. be of a PADOP type. The more constants the compiler knows at compile-time, the better it can optimize. The following datatypes need to be representable as const:
- PADSV (lexicals and esp. function arguments)
- "PDD02 final classes - const @ISA"
- "PDD03 immutable classes - const %class::"
Especially readonly function arguments need to be parsed into lexical consts, but so do "my const $i" or "my $i :ro". I have no strong opinion on "my $i is ro", but it would arguably be the best choice. See "PDD05 Function and method signatures".
SVf_READONLY is already good enough to hold this information in the data. But the compiler does not want to optimize on datatypes; the information needs to be represented as an OP. Only for the special cases @ISA and stashes is this not needed.
So there are three options:
- Add a mixin SVOP+PADOP type for CONST, discriminated by an OPpCONST_PAD flag.
  Pro: Easier and faster for the compiler.
  Contra: The logic for the new OP type, a union of SVOP and PADOP, needs to be added to all accessors: B and its libraries, but also XS walkers.
- Add a CONST flag to PADSV.
  Pro: Does not break libraries.
  Contra: CONST checks need to check PADSVs also.
- Add a new CONSTPAD op, replacing PADSV/const, which needs to be added to all current CONST checks in the compiler.
  Pro: Does not break libraries.
  Contra: CONST checks need to check CONSTPAD also.
Personally I lean against CONSTPAD.
The following syntax variants are being considered, for lexicals and globals:
my const $i; my const ($i, $j) = (0, 1); (as const keyword upfront)
my $i :ro;
my $i is ro;
See "PDD05 Function and method signatures"
sub call (const $i) {}
sub call ($i:ro) {}
sub call ($i is ro) {}
See "PDD02 final classes - const @ISA"
const our @ISA = ('MyBase');
our @ISA :ro = ('MyBase');
our @ISA is ro = ('MyBase');
class MyClass is final {
our @ISA = ('MyBase');
}
class MyClass (extends => ('MyBase'), is_final => 1) {}
See "PDD03 immutable classes - const %class::"
const package MyClass { } and const package MyClass;
const %MyClass::;
class MyClass is immutable {}
class MyClass (is_immutable => 1) {}
No keyword. immutable should be the new default for the class keyword, old-style packages stay mutable.
Keyword discussion:
The type qualifier const, which creates a CONST/CONSTPAD op and sets the SVf_READONLY flag, can be represented as a new keyword "const". This looks most natural but is hardest to parse. Larry initially opposed it because it looked too C++-ish, but nowadays it looks best.
The attribute form would be easiest to parse, since a MYTERM already parses and handles attributes; the MYTERM type just needs to be extended for signatures. It also looks natural.
The perl6-like trait ("is ro") is harder to parse, and a bit unnatural for lexicals.
The Moose-style hash attributes only work for classes, not for lexicals and signatures.
A const @ISA is commonly known through the "final" keyword. The class is not extendable, so the compiler can do compile-time method resolution, i.e. convert a method call to a function call.
Pro: Compile-time method resolution
If the compiler knows at compile-time, for each method, that all @ISA's searched until the method is found are const, and that those classes are also immutable (const), the method call can be converted to a function call. That would be a huge performance win, especially for classes which favor methods over hash accessors.
Note that the accessor typo problem could also be solved with const hashes for the object representation, but nobody is using that yet. A const class (const %classname::) does not solve it, as it is independent of the underlying object representation, which is usually a blessed hash.
Function calls are slow, and method calls are at least another 10% slower (10% for immediately found methods; for a deeper @ISA search the run-time costs are higher).
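To make the Pro concrete, here is a minimal runnable sketch of what compile-time method resolution buys; the class names are made up, and the const declaration is only shown as a comment since it is proposed syntax:

    package MyBase;
    sub speak { my $self = shift; print ref($self), " speaks\n" }

    package MyClass;
    our @ISA = ('MyBase');    # proposed: const our @ISA = ('MyBase');

    package main;
    my $obj = bless {}, 'MyClass';

    $obj->speak();            # today: run-time @ISA lookup on every call
    MyBase::speak($obj);      # what the compiler could emit once @ISA is const
                              # and both classes are immutable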
Contra:
I hope the "final" problem is known from java. Since the compiler needs to know in advance the inheritances it is not possible to extend and override methods of final classes. One cannot extend java strings. Thanks to Michael Schwern for the discussion.
Solutions:
- (Reini): Define the following convention, no additional keywords needed. Libraries may use final, but finalization is deferred until the whole application is processed and all libraries (use statements) are already loaded. So mocking is still possible, but the default is to use compile-time method resolution. Schwern sees a problem in that scheme which I haven't understood yet.
- (Larry): Libraries may use final, but the application, with a
  #pragma final
  has the final word.
See also pddtypes.pod
Classes should default to immutable; packages keep the dynamic behaviour unless a package is declared as const. (Damian)
Some might know from Moose that making classes immutable speeds them up about 20x, even though not all possible optimizations are done there yet.
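For context, this is how Moose users opt in to that today (real Moose API; the Point class is just an example):

    package Point;
    use Moose;
    has 'x' => (is => 'ro', isa => 'Int');
    has 'y' => (is => 'ro', isa => 'Int');
    __PACKAGE__->meta->make_immutable;   # enables the speedup mentioned above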
Types are already parsed for lexicals, just not for named arguments. The three coretypes int, num and string need to be reserved; p5-mop will probably define more. bool probably needs to be added as well.
Type conventions in core are needed to
- talk to other languages, like JSON, perl6 or Java,
- specify the wanted behavior for methods acting on types, such as smartmatch or multi-methods, or
- serve special performance purposes, e.g. int loop counters, int arithmetic, smaller and faster typed arrays or hashes, or to enforce compile-time method resolution.
See pddtypes.pod and perltypes.pod; I presented an old version on my blog and at YAPC.
An initial benefit would be natively typed arrays and hashes in core, and with const hashes even optimizable hashes (so-called "perfect hashes"). Further type checks and optimizers are left to modules.
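A sketch of how such declarations could look, using the coretype names from this draft; this is proposed syntax only, nothing here parses today:

    my int $count = 0;                  # IV-only, no magic, no string/NV upgrade
    my num $sum = 0.0;                  # NV-only
    my string @names = ('a', 'b');      # natively typed array
    my const %dispatch = (add => 1, del => 2);   # candidate for a perfect hash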
We need a compile-time attribute hook for our three types, to be able to use attributes in my declarations.
Note: Attributes still suffer from an over-architectured and broken Attribute::Handlers implementation, which evals the attribute value.
our $name:help(print the name);
will call eval "print the name";
Without fixing this, attributes will have no chance to be accepted. The syntax is nice, and it is already parsed.
The current prototype syntax explicitly allows named arguments. There are several implementations already.
But there are several decisions required.
In order to optimize function and method calls, we need to define type qualifiers, and eventually return types, even if they are not used yet.
New syntax allows changing the semantics.
Let's follow perl6:
- is bind (default) vs is copy (old semantics)
- is ro (default) vs is rw (old semantics)
- allow passing types and attributes to functions. Attributes allow user-defined hooks as now, just on function entries, not on variable declarations.
Optional arguments are defined by specifying defaults.
If we do not follow the perl6 syntax with "is", we need attributes to specify ":rw", and possibly "\$" to specify bind (by reference).
e.g. sub myadder (\$i, $num = 1) { $i += $num }
or sub myadder ($i:rw, $num = 1) { $i += $num }
Bind ro is by far the fastest calling convention: optimizable and checkable by the compiler. Copy is the safe way; rw uses the old $_[n] semantics.
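What "the old $_[n] semantics" means in practice, runnable today: @_ already aliases the caller's variables, which is exactly what an rw parameter would expose under a name:

    sub myadder { $_[0] += $_[1] // 1 }   # writes through the @_ alias

    my $i = 5;
    myadder($i, 3);
    print "$i\n";    # 8 - the caller's $i was modified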
I outlined my proposal in pddtypes.pod
Q: Do function args and return values keep constness?
A: Only function args by ref. This is current behaviour and makes sense.
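A small runnable example of that current behaviour, with constness preserved through the @_ alias:

    sub bump { $_[0]++ }

    my $n = 1;
    bump($n);            # fine, $n is now 2
    eval { bump(42) };   # the read-only literal 42 is aliased into $_[0]
    print $@;            # "Modification of a read-only value attempted at ..."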
Any optimizer needs to stop if a function's return type is not known. We don't even know whether any value is returned at all, so we have to check @_ at every LEAVESUB, even though the parser already knows the context information. By optionally declaring return types, a type checker and optimizer can kick in, especially for coretypes like int, num, str or void, or a const qualifier.
There exist old and wildly different syntaxes for return types, but they are unused. Use the perl6 syntax, which is C-like.
Q: What about libraries declaring their return values constant? Can I then no longer change them, and have to copy them?
A: No. Return values so far are not const. Only if you declare a function to return a const it will be so.
Calling a function via ENTERSUB and cleaning up at LEAVESUB is by far the slowest part of perl.
We can check our functions for the following situations: exceptions, jumps out, lexicals, locals, function calls, recursive calls.
If none of these occur, the function can be inlined.
We also need to check for tail calls and arguments. (signatures)
If no exceptions or locals occur, the parts of ENTERSUB and LEAVESUB which deal with them can be skipped.
We need to store the context and possible return type in ENTERSUB and LEAVESUB to speed up @_ handling.
We need to separate XS calls from ENTERSUB.
Our current optree resolves op argument types (the compile-time op flags and also the POPed flags) at run-time. For cases where the op itself specifies the behavior, or where the argument type can be inferred at compile-time (lvalue, context, magic, ...), an optimized op version should be used.
Promote type pessimization to all affected ops, and use optimized ops for the non-pessimized ones, similar to i_opt (integer constant folding) when all operands are non-magic IVs.
The biggest blockers are function borders. Without named arguments passed as bind (alias), each function must be optimized from scratch and loses all information.
See "PDD07 Compile-time entersub optimizations". entersub (and leavesub) needs to hold compiler information about the function, which requires waiting for parsing all embedded functions.
Even functions with arguments can be inlined: safe versions with arguments by copy, destructive versions with arguments by bind. They just need a scope block.
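A hand-written sketch of what inlining with copy arguments amounts to; the optimizer would do the equivalent on the optree rather than in source:

    sub myadd { my ($x, $y) = @_; $x + $y }

    my $r1 = myadd(3, 4);                           # normal ENTERSUB/LEAVESUB call
    my $r2 = do { my ($x, $y) = (3, 4); $x + $y };  # the inlined equivalent: a scope block
    print "$r1 $r2\n";                              # 7 7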
We can easily change run-time method calls at compile-time to function calls. What is left is a decision on "PDD02 final classes - const @ISA" and "PDD03 immutable classes - const %class::"
Outlined in how-perl-calls-subs-and-methods and further refined in "Compile-time type optimizations" in perltypes.
This just does method resolution (change to functions) and then does function inlining.
This is trivial as there are already isa change hooks. METHOD_NAMED and METHOD just need to check a global method or object cache.
multi needs types. (As smartmatch needs types to work reliably.)
As for the syntax, multi can be implemented traditionally, where the compiler generates the different per-type methods automatically, or the perl6 way, with a separate keyword. I see no problem with the first approach, and it would need no new keyword.
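A hand-written illustration of that first approach (the shape types and sub names are made up): the compiler would generate one variant per type plus a dispatcher, with no new keyword:

    sub area {                        # generated dispatcher
        my ($shape) = @_;
        return area_Circle($shape) if ref $shape eq 'Circle';
        return area_Square($shape) if ref $shape eq 'Square';
        die "no matching variant for ", ref $shape;
    }
    sub area_Circle { 3.14159265 * $_[0]->{r} ** 2 }   # generated per-type variants
    sub area_Square { $_[0]->{side} ** 2 }

    print area(bless({ r => 1 }, 'Circle')), "\n";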
The current MOP discussion and opinion is mainly about the new class and method keywords, but a MOP has nothing to do with that, nor with Moose or a new object system. A MOP allows the definition of new behaviour for classes, methods, attributes, types, roles, inheritance and so on: how they are initialized, their layout, their behavior. A definition of alternate object systems. It is mainly proposed to overcome a Moose problem with anonymous packages, to separate classes from stashes.
Introducing a MOP is good if the current object system is not good enough. The current object system is not good enough for Moose, and should be improved. There need to be two separate discussions: one about what improvements Moose needs from the traditional stash-based objects (global vs lexical namespaces - anonymous packages), and a second about the MOP itself.
I have no opinion on the MOP. Just this: why bother with a MOP before some basic language features are even decided upon? Moose does not even use types properly yet. This smells of premature hooks. But pmichaud is highly convinced that a p5 MOP is a good thing.
Something like the autovivification pragma needs to be added to the core to improve the optree. As shown in http://blogs.perl.org/users/rurban/2012/10/optimizing-compiler-benchmarks-part-3.html, disabling vivification of arrays and also hashes will lead to compile-time optimizations and dramatic performance improvements, similar to const arrays or hashes, but even better.
The CPAN autovivification pragma is not usable for such optimizations: the cases are hard to detect, there is no API for an optimizer, and the logic needs to be in the compiler to be able to generate better code. Currently the pragma adds checks, but does not improve the optree.
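For reference, the CPAN pragma that exists today (requires the autovivification module to be installed); a core variant would give the compiler the same information at compile-time:

    use strict; use warnings;
    no autovivification;          # lexically scoped

    my %h;
    my $v = $h{users}{alice};     # does not create $h{users} as a side effect
    print exists $h{users} ? "vivified\n" : "not vivified\n";   # prints "not vivified"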
Similar to no vivify or const lexicals, a lexical 'no magic' pragma can lead to compile-time optimizations and dramatic performance improvements.
Slimmer nextstate op variants, which do not reset PL_taint or the stack pointer and do not call FREETMPS, can be selected at compile-time.
As shown in http://blogs.perl.org/users/rurban/2012/10/optimizing-compiler-benchmarks-part-2.html AELEMFAST is about 2 times faster than the generic AELEM, but it needs to know the index at compile-time. This is easy to do for loops.
Unroll loops with lots of AELEM into AELEMFAST accesses.
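Roughly what that unrolling amounts to; small constant indices already compile to AELEMFAST today:

    my @a = (1 .. 4);
    my $sum = 0;

    $sum += $a[$_] for 0 .. 3;              # variable index: generic AELEM each time
    $sum += $a[0] + $a[1] + $a[2] + $a[3];  # constant indices: AELEMFAST
    print "$sum\n";                         # 20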
The worst part of perl is the parser. It is a hack, and it is fast, but changing and especially adding rules in a sane manner is hard, because the parser deviates in too many ways from a clean lexer/tokenizer separation. To add new syntax you usually cannot just add the syntax rules to perly.y.
Second, generating a traditional AST from which a better optree can be built (better optimizable, or able to emit JIT or native code) is worthwhile.
Our VM is a stack machine, which handles the stack on the heap. There are no typed alternatives.
There are integer-optimized ops, but they are rarely used. "use integer" and "my int" can overcome this, but the overflow behaviour needs to be defined: either slow promotion to number, or fast integer wrap, unsigned or signed. With "my int" this behaviour can be changed.
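The existing pragma that already selects those integer ops, runnable today:

    use integer;       # switches arithmetic in this scope to the i_* ops
    my $q = 7 / 2;     # integer division: 3, no promotion to floating point
    print "$q\n";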
The VM is simple and easy to interface via XS, but has major problems. An alternative VM could be based on parrot or vmkit, or simply reuse the existing ops with a different compiler and different stack handling.
A C-stack based compiler could arrange the optree as a natively compiled or JITed C program. Before each op call the op arguments (0-2 SV pointers) are put on the stack, lexicals also as in native closures, and functions are called natively via cdecl or stdcall, depending on whether we need varargs.
By using LLVM even a register based (fastcall) layout can be arranged.
A JIT could resolve at run-time the dynamic cases which are not solvable at compile-time. But the VM should be JIT-friendly. The current VM is quite JIT-friendly, but the ops themselves are too dynamic; there's not enough compile-time information in the ops, only in the operands at run-time. There need to be pre-compiled optimized alternatives for certain ops with known argument types.
To be practical I'm thinking of adding labels with a naming scheme to most ops, which a JIT or LLVM could hook into.
Just some random examples from pp.c, to give you an idea.
PP(pp_pos)
{
    dVAR; dSP; dPOPss;
    if (PL_op->op_flags & OPf_MOD || LVRET) {
      pp_pos_mod:               /* hook label for the lvalue case */
        SV * const ret = sv_2mortal(newSV_type(SVt_PVLV));
        sv_magic(ret, NULL, PERL_MAGIC_pos, NULL, 0);
        LvTYPE(ret) = '.';
        LvTARG(ret) = SvREFCNT_inc_simple(sv);
        PUSHs(ret);             /* no SvSETMAGIC */
        RETURN;
    }
    else {
        if (SvTYPE(sv) >= SVt_PVMG && SvMAGIC(sv)) {
          pp_pos_mg:            /* hook label for the magic case */
            const MAGIC * const mg = mg_find(sv, PERL_MAGIC_regex_global);
            if (mg && mg->mg_len >= 0) {
                dTARGET;
                I32 i = mg->mg_len;
                if (DO_UTF8(sv))
                    sv_pos_b2u(sv, &i);
                PUSHi(i);
                RETURN;
            }
        }
        RETPUSHUNDEF;
    }
}
PP(pp_refgen)
{
    dVAR; dSP; dMARK;
    if (GIMME != G_ARRAY) {
      pp_refgen_gimme_not_array:    /* hook label for non-list context */
        if (++MARK <= SP)
            *MARK = *SP;
        else
            *MARK = &PL_sv_undef;
        *MARK = refto(*MARK);
        SP = MARK;
        RETURN;
    }
  pp_refgen_gimme_array:            /* hook label for list context */
    EXTEND_MORTAL(SP - MARK);
    while (++MARK <= SP)
        *MARK = refto(*MARK);
    RETURN;
}
Footnotes:
- "Ertl and Gregg analyze the performance of the following interpreters: Gforth, OCaml, Scheme48, Yap, Perl, Xlisp. While Gforth, OCaml, Scheme48 and Yap are categorized as efficient interpreters, Perl and Xlisp benchmarks are used for comparison purposes as inefficient interpreters.
While efficient interpreters perform with a slowdown by a factor of 10 when compared to an optimizing native code compiler, inefficient interpreters have a slowdown by a factor of 1000."
M. Anton Ertl and David Gregg. The structure and performance of efficient interpreters. Journal of Instruction-Level Parallelism, 5:1-25, November 2003. https://students.ics.uci.edu/~sbruntha/cgi-bin/download.py?key=thesis