@navyxliu
Created October 19, 2022 21:24
PEA_C2_Example1
// -Xcomp -Xms16M -Xmx16M -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UseOnStackReplacement -XX:CompileOnly='Example1.ivanov' -XX:CompileCommand=dontinline,Example1.blackhole
class Example1 {
    private Object _cache;
    public void foo(boolean cond) {
        Object x = new Object();
        if (cond) {
            _cache = x;
        }
    }
    // Ivanov suggested making this case work first:
    // we don't need to create a JVMState for the cloned Allocate.
    public void ivanov(boolean cond) {
        Object x = new Object();
        if (cond) {
            blackhole(x);
        }
    }
    static void blackhole(Object x) {}
    public void test1(boolean cond) {
        //foo(cond);
        ivanov(cond);
    }
    public static void main(String[] args) {
        Example1 kase = new Example1();
        // Epsilon Test:
        // By capping the maximum heap and using EpsilonGC, we can see how long and for how many iterations the program survives.
        // If PEA manages to reduce the allocation rate, we expect the program to survive longer.
        // Roman raised a reasonable doubt: "or your code slows down the program..."
        // That's why I suggest observing iterations. It turns out to be non-trivial because the inner OOME makes HotSpot terminate, so we never get a chance to execute the final statement...
        long iterations = 0;
        try {
            while (true) {
                kase.test1(0 == (iterations & 0xf));
                iterations++;
            }
        } finally {
            System.err.println("Epsilon Test: " + iterations);
        }
    }
}
@navyxliu
Author

before:

+ ../build/linux-x86_64-server-fastdebug/images/jdk/bin/java -Xcomp -Xms32M -Xmx32M -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UseOnStackReplacement -XX:+UseTLAB -XX:CompileOnly=Example1.ivanov -XX:CompileCommand=dontinline,Example1.blackhole -Xlog:gc -XX:+Verbose -XX:-DoPartialEscapeAnalysis -XX:CompileCommand=IGVPrintLevel,Example1.ivanov,-1 Example1
[0.028s][info][gc] Using Epsilon
CompileCommand: dontinline Example1.blackhole bool dontinline = true
CompileCommand: IGVPrintLevel Example1.ivanov intx IGVPrintLevel = -1
Example1.ivanov
CompileCommand: compileonly Example1.ivanov bool compileonly = true
[0.123s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 1640K (5.00%) used
[0.132s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 3504K (10.70%) used
[0.142s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 5529K (16.88%) used
[0.153s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 7687K (23.46%) used
[0.163s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 9678K (29.54%) used
[0.174s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 11645K (35.54%) used
[0.185s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 13611K (41.54%) used
[0.195s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 15577K (47.54%) used
[0.206s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 17543K (53.54%) used
[0.216s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 19509K (59.54%) used
[0.227s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 21475K (65.54%) used
[0.237s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 23441K (71.54%) used
[0.248s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 25407K (77.54%) used
[0.258s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 27374K (83.54%) used
[0.268s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 29340K (89.54%) used
[0.279s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 31306K (95.54%) used
Terminating due to java.lang.OutOfMemoryError: Java heap space

after:

+ ../build/linux-x86_64-server-fastdebug/images/jdk/bin/java -Xcomp -Xms32M -Xmx32M -XX:+AlwaysPreTouch -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UseOnStackReplacement -XX:+UseTLAB -XX:CompileOnly=Example1.ivanov -XX:CompileCommand=dontinline,Example1.blackhole -Xlog:gc -XX:+Verbose -XX:+DoPartialEscapeAnalysis -XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations -XX:CompileCommand=IGVPrintLevel,Example1.ivanov,-1 Example1
[0.025s][info][gc] Using Epsilon
CompileCommand: dontinline Example1.blackhole bool dontinline = true
CompileCommand: IGVPrintLevel Example1.ivanov intx IGVPrintLevel = -1
Example1.ivanov
CompileCommand: compileonly Example1.ivanov bool compileonly = true
PEA materializes a virtual object:  26  Allocate  === 5 6 7 8 1 (24 22 23 1 1 1 11 1 ) [[ 27 28 29 36 37 38 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Example1::ivanov @ bci:0 (line 15)  Type:{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:rawptr:NotNull} !jvms: Example1::ivanov @ bci:0 (line 15)

======== Connection graph for  Example1::ivanov
invocation #0: 2 iterations and 0.000000 sec to build connection graph with 142 nodes and worklist size 11

JavaObject(3) NoEscape(NoEscape) [ [ 38 ]]    26  Allocate  === 5 6 7 8 1 (24 22 23 1 1 1 11 1 ) [[ 27 28 29 36 37 38 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Example1::ivanov @ bci:0 (line 15)  Type:{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:rawptr:NotNull} !jvms: Example1::ivanov @ bci:0 (line 15)
LocalVar(6) NoEscape(NoEscape) [ 26P [ ]]    38  Proj  === 26  [[ 39 ]] #5  Type:rawptr:NotNull !jvms: Example1::ivanov @ bci:0 (line 15)

JavaObject(4) ArgEscape(ArgEscape) [ [ 96 101 ]]    84  Allocate  === 78 6 7 8 1 (24 22 23 1 1 1 1 1 ) [[ 85 86 87 94 95 96 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Example1::ivanov @ bci:13 (line 18)  Type:{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:rawptr:NotNull} !jvms: Example1::ivanov @ bci:13 (line 18)
LocalVar(7) ArgEscape(ArgEscape) [ 84P [ 101 ]]    96  Proj  === 84  [[ 97 101 ]] #5  Type:rawptr:NotNull !jvms: Example1::ivanov @ bci:13 (line 18)
LocalVar(8) ArgEscape(ArgEscape) [ 96 84P [ ]]   101  CheckCastPP  === 98 96  [[ 102 ]]   Oop:java/lang/Object:NotNull:exact * !jvms: Example1::ivanov @ bci:13 (line 18)

Scalar   26  Allocate  === 5 6 7 8 1 (24 22 23 1 1 1 11 1 ) [[ 27 28 29 36 37 38 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Example1::ivanov @ bci:0 (line 15)  Type:{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:rawptr:NotNull} !jvms: Example1::ivanov @ bci:0 (line 15)
++++ Eliminated: 26 Allocate

======== Connection graph for  Example1::ivanov
invocation #1: 2 iterations and 0.000007 sec to build connection graph with 142 nodes and worklist size 9

JavaObject(3) ArgEscape(ArgEscape) [ [ 96 101 ]]    84  Allocate  === 78 6 7 8 1 (24 22 23 1 1 1 1 1 ) [[ 85 86 87 94 95 96 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Example1::ivanov @ bci:13 (line 18)  Type:{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:rawptr:NotNull} !jvms: Example1::ivanov @ bci:13 (line 18)
LocalVar(5) ArgEscape(ArgEscape) [ 84P [ 101 ]]    96  Proj  === 84  [[ 97 101 ]] #5  Type:rawptr:NotNull !jvms: Example1::ivanov @ bci:13 (line 18)
LocalVar(6) ArgEscape(ArgEscape) [ 96 84P [ ]]   101  CheckCastPP  === 98 96  [[ 102 ]]   Oop:java/lang/Object:NotNull:exact * !jvms: Example1::ivanov @ bci:13 (line 18)

=== No allocations eliminated for  Example1::ivanov since there are no scalar replaceable candidates ===
[0.145s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 1640K (5.00%) used
[0.283s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 3504K (10.70%) used
[0.433s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 5529K (16.88%) used
[0.593s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 7687K (23.46%) used
[0.753s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 9678K (29.54%) used
[0.913s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 11645K (35.54%) used
[1.074s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 13611K (41.54%) used
[1.235s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 15577K (47.54%) used
[1.393s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 17543K (53.54%) used
[1.552s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 19509K (59.54%) used
[1.711s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 21475K (65.54%) used
[1.870s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 23441K (71.54%) used
[2.029s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 25407K (77.54%) used
[2.188s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 27374K (83.54%) used
[2.346s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 29340K (89.54%) used
[2.505s][info][gc] Heap: 32768K reserved, 32768K (100.00%) committed, 31306K (95.54%) used
Terminating due to java.lang.OutOfMemoryError: Java heap space

@navyxliu
Author

The most obvious result is that the program with PEA survives longer. With the 32M heap used in the runs above, it lasts about 2.5s, while the original program lasts only 0.279s.

This change amounts to rewriting the source as:

    public void ivanov(boolean cond) {
        Object x = null;

        if (cond) {
            blackhole(new Object());
        }
    }

    static void blackhole(Object x) {}
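
To make the effect of that rewrite concrete, here is a minimal sketch that runs both forms with the same `cond` pattern as `main` and counts allocations and escapes. The class name `IvanovEquivalence`, the counting helper `newObject()`, and the recording `blackhole` are assumptions added for illustration; the gist's `blackhole` is an empty method, and the real transformation happens inside C2, not in source.

```java
import java.util.ArrayList;
import java.util.List;

public class IvanovEquivalence {
    // Hypothetical recording sink; the gist's blackhole does nothing.
    static final List<Object> seen = new ArrayList<>();
    // Hypothetical counter standing in for "how many objects were allocated".
    static int allocations = 0;

    static Object newObject() {
        allocations++;
        return new Object();
    }

    static void blackhole(Object x) {
        seen.add(x);
    }

    // Original shape: allocate unconditionally, escape only when cond is true.
    static void original(boolean cond) {
        Object x = newObject();
        if (cond) {
            blackhole(x);
        }
    }

    // Shape after the rewrite: allocate only on the escaping path.
    static void transformed(boolean cond) {
        if (cond) {
            blackhole(newObject());
        }
    }

    // Returns {allocations, escapes} for the chosen form.
    static int[] run(boolean useTransformed, int iterations) {
        seen.clear();
        allocations = 0;
        for (int i = 0; i < iterations; i++) {
            boolean cond = 0 == (i & 0xf); // same pattern as Example1.main
            if (useTransformed) {
                transformed(cond);
            } else {
                original(cond);
            }
        }
        return new int[] { allocations, seen.size() };
    }

    public static void main(String[] args) {
        int[] before = run(false, 1600);
        int[] after = run(true, 1600);
        // prints "original:    allocations=1600 escapes=100"
        System.out.println("original:    allocations=" + before[0] + " escapes=" + before[1]);
        // prints "transformed: allocations=100 escapes=100"
        System.out.println("transformed: allocations=" + after[0] + " escapes=" + after[1]);
    }
}
```

Both forms pass the same number of objects to `blackhole`; only the number of allocations differs, which is the saving PEA is after.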

@navyxliu
Author

This is the generated code (-XX:+PrintOptoAssembly) for Example1::ivanov with DoPartialEscapeAnalysis.

Please note that the object allocation (TLAB fast path in B2/B3, slow path in B6) and the
initialization barrier and call in B4 are all guarded by 'testl RDX, RDX' in B1.

============================= C2-compiled nmethod ==============================
#r018 rsi:rsi   : parm 0: Example1:NotNull *
#r016 rdx   : parm 1: int
# -- Old rsp -- Framesize: 32 --
#r591 rsp+28: in_preserve
#r590 rsp+24: return address
#r589 rsp+20: in_preserve
#r588 rsp+16: saved fp register
#r587 rsp+12: pad2, stack alignment
#r586 rsp+ 8: pad2, stack alignment
#r585 rsp+ 4: Fixed slot 1
#r584 rsp+ 0: Fixed slot 0
#
----------------------- MetaData before Compile_id = 2 ------------------------
{method}
 - this oop:          0x00007fbdd9000540
 - method holder:     synchronized 'Example1'
 - constants:         0x00007fbdd9000050 constant pool [70]/operands[5] {0x00007fbdd9000050} for synchronized 'Example1' cache=0x00007fbdd9000870
 - access:            0x81000001  public 
 - name:              'ivanov'
 - signature:         '(Z)V'
 - max stack:         3
 - max locals:        3
 - size of params:    2
 - method size:       14
 - highest level:     3
 - vtable index:      6
 - i2i entry:         0x00007fbde419df00
 - adapters:          AHE@0x00007fbdf40d3240: 0xba i2c: 0x00007fbde42adbe0 c2i: 0x00007fbde42adc9a c2iUV: 0x00007fbde42adc68 c2iNCI: 0x00007fbde42adcd4
 - compiled entry     0x00007fbddcd681e0
 - code size:         17
 - code start:        0x00007fbdd9000528
 - code end (excl):   0x00007fbdd9000539
 - method data:       0x00007fbdd9000ad0
 - checked ex length: 0
 - linenumber start:  0x00007fbdd9000539
   - line 15: 0
   - line 17: 8
   - line 18: 12
   - line 20: 16
 - localvar length:   0
 - compiled code: nmethod    122    1       3       Example1::ivanov (17 bytes)

------------------------ OptoAssembly for Compile_id = 2 -----------------------
#
#  void ( Example1:NotNull *, int )
#
000     N94: #	out( B1 ) <- BLOCK HEAD IS JUNK  Freq: 1 IDom: 0/#1 RegPressure: 0 IHRP Index: 1 FRegPressure: 0 FHRP Index: 1
000     movl    rscratch1, [j_rarg0 + oopDesc::klass_offset_in_bytes()]	# compressed klass
	decode_klass_not_null rscratch1, rscratch1
	cmpq    rax, rscratch1	 # Inline cache check
	jne     SharedRuntime::_ic_miss_stub
	nop	# nops to align entry point

        nop 	# 4 bytes pad for loops and calls

020     B1: #	out( B5 (B12) B2 ) <- BLOCK HEAD IS JUNK  Freq: 1 IDom: 0/#2 RegPressure: 1 IHRP Index: 10 FRegPressure: 0 FHRP Index: 10
020     # stack bang (136 bytes)
	pushq   rbp	# Save rbp
	subq    rsp, #16	# Create frame

03a     testl   RDX, RDX
03c     je,s   B5  P=0.100000 C=-1.000000

03e     B2: #	out( B6 B3 ) <- in( B1 )  Freq: 0.9 IDom: 1/#3 RegPressure: 2 IHRP Index: 8 FRegPressure: 0 FHRP Index: 8
03e     # TLS is in R15
03e     movq    RSI, [R15 + #264 (32-bit)]	# ptr
045     movq    R10, RSI	# spill
048     addq    R10, #16	# ptr
04c     cmpq    R10, [R15 + #280 (32-bit)]	# raw ptr
053     jae,us  B6  P=0.000100 C=-1.000000

055     B3: #	out( B4 ) <- in( B2 )  Freq: 0.89991 IDom: 2/#4 RegPressure: 2 IHRP Index: 7 FRegPressure: 0 FHRP Index: 7
055     movq    [R15 + #264 (32-bit)], R10	# ptr
05c     PREFETCHNTA [R10 + #192 (32-bit)]	# Prefetch allocation to non-temporal cache for write
064     movq    [RSI], #1	# long
06b     movl    [RSI + #8 (8-bit)], narrowklass: precise java/lang/Object: 0x00007fbdb000fd90:Constant:exact *	# compressed klass ptr
072     movl    [RSI + #12 (8-bit)], R12	# int (R12_heapbase==0)

076     B4: #	out( B9 B5 (B11) ) <- in( B7 B3 )  Freq: 0.9 IDom: 2/#4 RegPressure: 12 IHRP Index: 19 FRegPressure: 32 FHRP Index: 12
076     
076     MEMBAR-storestore (empty encoding)
076     # checkcastPP of RSI
        nop 	# 1 bytes pad for loops and calls
077     call,static  Example1::blackhole
        # Example1::ivanov @ bci:13 (line 18) L[0]=_ L[1]=_ L[2]=_
        # OopMap {off=124/0x7c}

07c     B5: #	out( N94 ) <- in( B4 (B11) B1 (B12) )  Freq: 0.999982 IDom: 1/#3 RegPressure: 0 IHRP Index: 4 FRegPressure: 0 FHRP Index: 4
07c     addq    rsp, 16	# Destroy frame
	popq    rbp
	cmpq     rsp, poll_offset[r15_thread] 
	ja       #safepoint_stub	# Safepoint: poll for GC

08e     ret

08f     B6: #	out( B8 B7 ) <- in( B2 )  Freq: 9.00149e-05 IDom: 2/#4 RegPressure: 12 IHRP Index: 10 FRegPressure: 32 FHRP Index: 2
08f     movq    RSI, precise java/lang/Object: 0x00007fbdb000fd90:Constant:exact *	# ptr
        nop 	# 2 bytes pad for loops and calls
09b     call,static  wrapper for: _new_instance_Java
        # Example1::ivanov @ bci:13 (line 18) L[0]=_ L[1]=_ L[2]=_
        # OopMap {off=160/0xa0}

0a0     B7: #	out( B4 ) <- in( B6 )  Freq: 9.00131e-05 IDom: 6/#5 RegPressure: 1 IHRP Index: 3 FRegPressure: 0 FHRP Index: 3
        # Block is sole successor of call
0a0     movq    RSI, RAX	# spill
0a3     jmp,s   B4

0a5     B8: #	out( B10 ) <- in( B6 )  Freq: 9.00149e-10 IDom: 6/#5 RegPressure: 1 IHRP Index: 4 FRegPressure: 0 FHRP Index: 4
0a5     # exception oop is in rax; no code emitted
0a5     movq    RSI, RAX	# spill
0a8     jmp,s   B10

0aa     B9: #	out( B10 ) <- in( B4 )  Freq: 9e-06 IDom: 4/#5 RegPressure: 1 IHRP Index: 4 FRegPressure: 0 FHRP Index: 4
0aa     # exception oop is in rax; no code emitted
0aa     movq    RSI, RAX	# spill

0ad     B10: #	out( N94 ) <- in( B9 B8 )  Freq: 9.0009e-06 IDom: 2/#4 RegPressure: 1 IHRP Index: 7 FRegPressure: 0 FHRP Index: 7
0ad     addq    rsp, 16	# Destroy frame
	popq    rbp

0b2     jmp     rethrow_stub

0b7     B11: #	out( B5 ) <- in( B4 )  Freq: 0.899982 IDom: 4/#5 RegPressure: 0 IHRP Index: 2 FRegPressure: 0 FHRP Index: 2
        # Empty connector block

0b7     B12: #	out( B5 ) <- in( B1 )  Freq: 0.1 IDom: 1/#3 RegPressure: 0 IHRP Index: 2 FRegPressure: 0 FHRP Index: 2
        # Empty connector block

--------------------------------------------------------------------------------
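
The control flow of the listing above can be modeled at the Java level. This is only a sketch under assumptions: `TlabModel` is a hypothetical class, and reading `[R15 + #264]` and `[R15 + #280]` as the thread-local allocation buffer's top and end pointers is my interpretation of the listing, not something the dump states. The model covers the guard in B1, the bump-pointer check in B2, and the fall-through to the slow path in B6.

```java
public class TlabModel {
    private long top;        // assumed to correspond to [R15 + #264]
    private final long end;  // assumed to correspond to [R15 + #280]
    int slowPathCalls;       // how often the B6-style slow path would be taken

    TlabModel(long start, long sizeInBytes) {
        this.top = start;
        this.end = start + sizeInBytes;
    }

    // Returns the address of the new object, or -1 when the slow path is needed.
    long allocate(long sizeInBytes) {
        long newTop = top + sizeInBytes;               // addq R10, #16
        if (Long.compareUnsigned(newTop, end) >= 0) {  // cmpq ...; jae,us B6
            slowPathCalls++;
            return -1;
        }
        long obj = top;
        top = newTop;                                  // movq [R15 + #264], R10
        return obj;
    }

    // Mirrors B1: when cond is false nothing is allocated at all (returns 0).
    static long ivanov(TlabModel tlab, boolean cond) {
        if (!cond) {                                   // testl RDX, RDX; je,s B5
            return 0;
        }
        return tlab.allocate(16);                      // a 16-byte java.lang.Object
    }

    public static void main(String[] args) {
        TlabModel tlab = new TlabModel(0x1000, 64);
        System.out.println(ivanov(tlab, false)); // prints 0
        System.out.println(ivanov(tlab, true));  // prints 4096
        System.out.println(ivanov(tlab, true));  // prints 4112
        System.out.println(ivanov(tlab, true));  // prints 4128
        System.out.println(ivanov(tlab, true));  // prints -1
    }
}
```

The last call takes the slow path because the new top would reach the end of the buffer, matching the `jae` (unsigned greater-or-equal) comparison in B2.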

@navyxliu
Author

navyxliu commented Oct 19, 2022

This is the IR after parsing, without PEA.
Example1_ivanov_after_parse

Here is the IR after parsing with PEA. Please note that we clone the object, which is a cluster of nodes. 101 CheckCastPP is the argument of 102 CallStaticJava/Example1::blackhole. It is hidden by IGV's 'Simplify graph' filter.

Example1_ivanov_PEA_after_parse

Even though it looks like we leave a redundant allocation behind, C2 EA/SR will eliminate the obsolete 26 Allocate! Here is the IR after Iterative EA. It is as if our PEA optimization moved the allocation under IfTrue.
Example1_ivanov_PEA_after_iterEA

@navyxliu
Author

The source code of my experimental PEA: https://github.com/navyxliu/jdk/tree/PEA_parser

@merykitty

I believe JMH has the GC profiler (-prof gc) to measure the allocation rate per operation; maybe you can use it to measure the allocation rate with PEA. Thanks.

@navyxliu
Author

> I believe JMH has the GC profiler (-prof gc) to measure the allocation rate per operation; maybe you can use it to measure the allocation rate with PEA. Thanks.

Yes, I will convert this to a JMH benchmark. Thanks!
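
Until a JMH version exists, allocation per call can also be read directly from the JDK. The following is a rough sketch, not a replacement for JMH: it uses `com.sun.management.ThreadMXBean.getThreadAllocatedBytes`, which is available on HotSpot. The class `AllocProbe`, the static field `sink`, and the method `work` (modeled on `foo` from the gist) are assumptions for illustration. Unlike the Epsilon test, it does not depend on running into an OutOfMemoryError.

```java
import java.lang.management.ManagementFactory;

public class AllocProbe {
    // Hypothetical escape target, playing the role of _cache in the gist.
    static Object sink;

    // Same shape as Example1.foo: allocate always, escape only when cond is true.
    static void work(boolean cond) {
        Object x = new Object();
        if (cond) {
            sink = x;
        }
    }

    // Bytes allocated so far by the current thread (HotSpot-specific MXBean).
    static long allocatedBytes() {
        com.sun.management.ThreadMXBean bean =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        return bean.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    // Average number of bytes allocated per call of work().
    static double bytesPerCall(int calls) {
        long before = allocatedBytes();
        for (int i = 0; i < calls; i++) {
            work(0 == (i & 0xf)); // same cond pattern as Example1.main
        }
        long after = allocatedBytes();
        return (double) (after - before) / calls;
    }

    public static void main(String[] args) {
        bytesPerCall(100_000); // warm-up so the JIT has compiled work()
        System.out.printf("bytes/call: %.2f%n", bytesPerCall(1_000_000));
    }
}
```

Comparing the printed value for a run with -XX:-DoPartialEscapeAnalysis against a run with -XX:+DoPartialEscapeAnalysis (on the experimental build) would show whether the allocation rate itself drops, independent of how long the program runs.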

@navyxliu
Author

This is a developing story.
If you are still interested, please follow up with Example2.

@navyxliu
Author

The following code creates a phi node to merge 2 objects.
I believe RAM of JDK-8289943 can make the NoEscape object be replaced in:

    public Object merge_node(boolean cond) {
        Object x = new Object();

        if (cond) {
            _cache = x;
        }
        return x;
    }
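
As a sketch of where the two objects and the phi come from, here is a hand-written source-level equivalent, assuming PEA materializes the object separately on each path (at the store to `_cache` on one side, before the return on the other). The class `MergeNodeSketch`, the method `mergeNodeMaterialized`, and the accessor `cache()` are hypothetical names; what the experimental PEA actually emits may differ.

```java
public class MergeNodeSketch {
    private Object _cache;

    // Hypothetical accessor, added only so the result can be checked.
    Object cache() {
        return _cache;
    }

    // Original shape from the gist: one allocation, conditionally escaping.
    public Object mergeNode(boolean cond) {
        Object x = new Object();
        if (cond) {
            _cache = x;
        }
        return x;
    }

    // Assumed shape after materializing on each path: two allocation sites
    // whose results meet at the return, i.e. a phi of two objects.
    public Object mergeNodeMaterialized(boolean cond) {
        Object x;
        if (cond) {
            x = new Object(); // materialized because it is stored to _cache
            _cache = x;
        } else {
            x = new Object(); // materialized because it is returned
        }
        return x;             // phi(x from the true path, x from the false path)
    }

    public static void main(String[] args) {
        MergeNodeSketch m = new MergeNodeSketch();
        Object a = m.mergeNodeMaterialized(true);
        System.out.println(a == m.cache()); // prints true
        Object b = m.mergeNodeMaterialized(false);
        System.out.println(b == m.cache()); // prints false
    }
}
```

Both methods allocate exactly one object per call and behave the same to the caller; the difference is only that the second form has two allocation sites feeding the returned value.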
