rednaxelafx/759495 (GitHub Gist), created December 30, 2010 05:21
A code snippet showing how the JVM/HotSpot and Dalvik interpreters compare on the same small example.

Java source code:

    k = i + j;

May compile to Java bytecode:

    iload_0
    iload_1
    iadd
    istore_2

And may turn into Dalvik VM code:

    add-int v2, v1, v0
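
As a rough illustration of what executing those four bytecodes involves, here is a minimal C sketch of a naive stack-machine interpreter: it fetches an opcode, dispatches through a switch, and executes against an operand stack kept entirely in memory. The opcode values are the JVM's; OP_HALT, run_stack_vm and the k_eq_i_plus_j array are hypothetical names made up for the sketch, and this is not how HotSpot's interpreter is built. It only makes visible that the stack form of k = i + j takes four trips through fetch/dispatch.

```c
#include <assert.h>
#include <stdint.h>

/* JVM opcode values for the four bytecodes in the example. */
enum {
    OP_ILOAD_0  = 0x1a,
    OP_ILOAD_1  = 0x1b,
    OP_ISTORE_2 = 0x3d,
    OP_IADD     = 0x60,
    OP_HALT     = 0xff   /* hypothetical stop marker, not a real JVM opcode */
};

/* k = i + j with i, j, k in local slots 0, 1, 2. */
static const uint8_t k_eq_i_plus_j[] = {
    OP_ILOAD_0, OP_ILOAD_1, OP_IADD, OP_ISTORE_2, OP_HALT
};

/* Naive stack machine: every operand lives in the in-memory stack array.
 * Returns the final stack depth (0 if balanced), or -1 on an unknown opcode.
 * *dispatches counts how many real bytecodes went through fetch/dispatch. */
static int run_stack_vm(const uint8_t *pc, int32_t *locals, int *dispatches) {
    int32_t stack[8];
    int sp = 0;
    *dispatches = 0;
    for (;;) {
        uint8_t op = *pc++;                    /* fetch */
        if (op == OP_HALT) return sp;
        ++*dispatches;
        switch (op) {                          /* dispatch, then execute */
        case OP_ILOAD_0:  stack[sp++] = locals[0]; break;
        case OP_ILOAD_1:  stack[sp++] = locals[1]; break;
        case OP_IADD: {
            uint32_t b = (uint32_t)stack[--sp];
            uint32_t a = (uint32_t)stack[--sp];
            stack[sp++] = (int32_t)(a + b);    /* wraps mod 2^32 like iadd (two's-complement targets) */
            break;
        }
        case OP_ISTORE_2: locals[2] = stack[--sp]; break;
        default:          return -1;
        }
    }
}
```

With locals {5, 7, 0}, the run leaves 12 in local slot 2 and reports four dispatches.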

Compare HotSpot Client VM's interpreter in JDK 6u18 with Dalvik's interpreter in Android 2.0, both on x86.
Unrolling each interpreter's fetch-dispatch-execute loop for the program above gives the following code traces:

HotSpot's interpreter (client mode, default config):

    ;;-------------iload_0-------------
    mov eax, dword ptr [edi]
    movzx ebx, byte ptr [esi + 1]
    inc esi
    jmp dword ptr [ebx*4 + 6DB188C8]
    ;;-------------iload_1-------------
    push eax
    mov eax, dword ptr [edi - 4]
    movzx ebx, byte ptr [esi + 1]
    inc esi
    jmp dword ptr [ebx*4 + 6DB188C8]
    ;;--------------iadd---------------
    pop edx
    add eax, edx
    movzx ebx, byte ptr [esi + 1]
    inc esi
    jmp dword ptr [ebx*4 + 6DB188C8]
    ;;------------istore_2-------------
    mov dword ptr [edi - 8], eax
    movzx ebx, byte ptr [esi + 1]
    inc esi
    jmp dword ptr [ebx*4 + 6DB19CC8]

Dalvik's interpreter:

    ;;------------add-int--------------
    movzx eax, byte ptr [edx + 2]
    movzx ecx, byte ptr [edx + 3]
    mov eax, dword ptr [esi + eax*4]
    add eax, dword ptr [esi + ecx*4]
    movzx ecx, bh
    movzx ebx, word ptr [edx + 4]
    lea edx, dword ptr [edx + 4]
    mov dword ptr [esi + ecx*4], eax
    movzx eax, bl            ; GOTO_NEXT "computed next" version
    sal eax, $$$handler_size_bits
    add eax, edi
    jmp eax
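
The two traces also dispatch differently. HotSpot's ends with an indirect jump through a table indexed by the next opcode, while Dalvik's "computed next" version shifts the opcode left by handler_size_bits and adds the handler base held in edi, which works because every handler occupies a fixed-size slot. A small C sketch of the two address computations follows; the function names are hypothetical, and the 64-byte handler size (handler_size_bits = 6) used in the usage example is an assumption, since the trace leaves $$$handler_size_bits symbolic.

```c
#include <assert.h>
#include <stdint.h>

/* HotSpot-style: load the handler address from a 256-entry table indexed by
 * opcode, as in "jmp dword ptr [ebx*4 + table]". Costs one extra memory load. */
static uintptr_t table_dispatch_target(const uintptr_t table[256], uint8_t opcode) {
    return table[opcode];
}

/* Dalvik "computed next"-style: handlers sit back to back, each padded to
 * 1 << handler_size_bits bytes, so the target is pure arithmetic on the opcode,
 * as in "movzx eax, bl / sal eax, bits / add eax, edi / jmp eax". */
static uintptr_t computed_dispatch_target(uintptr_t ibase, uint8_t opcode,
                                          unsigned handler_size_bits) {
    return ibase + ((uintptr_t)opcode << handler_size_bits);
}
```

The trade-off this shows: the table lookup costs a memory load per dispatch but lets handlers be any size, while the computed form avoids the load at the price of padding every handler to the same slot size.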

If we strip off the fetch/dispatch part from the two code traces above, we'll get:

HotSpot:

    ;;-------------iload_0-------------
    mov eax, dword ptr [edi]
    ;;-------------iload_1-------------
    push eax
    mov eax, dword ptr [edi - 4]
    ;;--------------iadd---------------
    pop edx
    add eax, edx
    ;;------------istore_2-------------
    mov dword ptr [edi - 8], eax

Dalvik:

    ;;------------add-int--------------
    movzx eax, byte ptr [edx + 2]
    movzx ecx, byte ptr [edx + 3]
    mov eax, dword ptr [esi + eax*4]
    add eax, dword ptr [esi + ecx*4]
    movzx ecx, bh
    mov dword ptr [esi + ecx*4], eax

So in this example, counting only the instructions that actually carry out the user code's original semantics,
both HotSpot's and Dalvik's interpreters use 6 x86 instructions.
This means HotSpot doesn't lose performance in the "execution" part just because the JVM spec defines a
stack-based instruction set. By caching the top-of-stack value in a register (one-register TOS caching, here in
eax), HotSpot can still make efficient use of machine registers during interpretation, even though it is
emulating a stack-based abstract machine.
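
The one-register top-of-stack caching described above can be sketched in C as follows. The top value is kept in a local variable (tos) that the compiler can hold in a machine register, the way the HotSpot trace keeps it in eax, and only the values below it go to the in-memory stack. The cached flag stands in for HotSpot's TOS state (roughly vtos vs. itos); in HotSpot that state is implicit in which generated code is running rather than a runtime variable. The function name, struct and OP_HALT marker are hypothetical; this is a sketch of the technique, not HotSpot's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    OP_ILOAD_0  = 0x1a,
    OP_ILOAD_1  = 0x1b,
    OP_ISTORE_2 = 0x3d,
    OP_IADD     = 0x60,
    OP_HALT     = 0xff   /* hypothetical stop marker, not a real JVM opcode */
};

static const uint8_t k_eq_i_plus_j[] = {
    OP_ILOAD_0, OP_ILOAD_1, OP_IADD, OP_ISTORE_2, OP_HALT
};

/* Counts traffic to the in-memory part of the operand stack. */
struct stack_traffic { int pushes, pops; };

static int run_tos_cached(const uint8_t *pc, int32_t *locals,
                          struct stack_traffic *t) {
    int32_t stack[8];        /* values below the top of stack */
    int sp = 0;
    int32_t tos = 0;         /* cached top of stack, like eax in the trace */
    bool cached = false;     /* false ~ vtos (nothing cached), true ~ itos */
    t->pushes = t->pops = 0;
    for (;;) {
        uint8_t op = *pc++;
        switch (op) {
        case OP_ILOAD_0:
        case OP_ILOAD_1:
            if (cached) {                          /* "push eax": spill old top */
                stack[sp++] = tos;
                t->pushes++;
            }
            tos = locals[op == OP_ILOAD_0 ? 0 : 1]; /* "mov eax, dword ptr [edi ...]" */
            cached = true;
            break;
        case OP_IADD:
            assert(cached);                        /* verified code has an int on top */
            t->pops++;                             /* "pop edx" */
            tos = (int32_t)((uint32_t)stack[--sp] + (uint32_t)tos); /* "add eax, edx" */
            break;
        case OP_ISTORE_2:
            assert(cached);
            locals[2] = tos;                       /* "mov dword ptr [edi - 8], eax" */
            cached = false;
            break;
        case OP_HALT:
            return sp;
        default:
            return -1;
        }
    }
}
```

For k = i + j this performs one push and one pop on the memory stack, matching the single push eax / pop edx pair in the HotSpot trace.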

Dalvik's interpreter (on x86), on the other hand, keeps all of its "virtual registers" in the stack frame, i.e.
in memory, which is slower to access than HotSpot's TOS (top-of-stack) value held in a machine register. Dalvik
could of course tune its interpreter to squeeze out more performance, but with so few registers available on
x86 that will be pretty hard. It would be easier on a target with more free registers, such as x86-64 or a
RISC processor.
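
For contrast, a Dalvik-style register interpreter can be sketched the same way. The virtual registers are an array in memory (fp, standing in for the frame addressed through esi in the trace), and add-int vAA, vBB, vCC is decoded from two 16-bit code units: the first holds the opcode in its low byte and AA in its high byte, the second holds BB and CC. Opcode 0x90 is Dalvik's add-int; OP_HALT, run_register_vm and the example array are hypothetical. Every operand costs a memory load and the result a store, but the whole statement takes a single dispatch.

```c
#include <assert.h>
#include <stdint.h>

enum {
    OP_ADD_INT = 0x90,   /* Dalvik add-int vAA, vBB, vCC (format 23x) */
    OP_HALT    = 0xff    /* hypothetical stop marker for this sketch */
};

/* add-int v2, v1, v0 as 16-bit code units: [AA|op] [CC|BB], then the stop marker. */
static const uint16_t k_eq_i_plus_j[] = {
    (uint16_t)(2u << 8 | OP_ADD_INT),   /* AA = 2, opcode = add-int */
    (uint16_t)(0u << 8 | 1u),           /* CC = 0, BB = 1 */
    OP_HALT
};

/* Virtual registers live in memory (fp), so each operand is a load and the
 * result is a store. Returns the number of instructions dispatched, or -1. */
static int run_register_vm(const uint16_t *pc, int32_t *fp) {
    int dispatches = 0;
    for (;;) {
        uint16_t inst = pc[0];                 /* fetch first code unit */
        uint8_t op = (uint8_t)(inst & 0xff);
        if (op == OP_HALT) return dispatches;
        dispatches++;
        switch (op) {
        case OP_ADD_INT: {
            uint8_t aa = (uint8_t)(inst >> 8);     /* "movzx ecx, bh" */
            uint8_t bb = (uint8_t)(pc[1] & 0xff);  /* "movzx eax, byte ptr [edx + 2]" */
            uint8_t cc = (uint8_t)(pc[1] >> 8);    /* "movzx ecx, byte ptr [edx + 3]" */
            fp[aa] = (int32_t)((uint32_t)fp[bb] + (uint32_t)fp[cc]);
            pc += 2;                               /* format 23x is two code units */
            break;
        }
        default:
            return -1;
        }
    }
}
```

With v0 = 5 and v1 = 7 this stores 12 in v2 after one dispatch, versus four dispatches for the stack-machine form of the same statement.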

But because the JVM needs more bytecode instructions than Dalvik to do the same work (four versus one in this
example), the fetch-dispatch part makes HotSpot's interpreter pay more interpretation overhead than Dalvik's.

------------------------------------------------------------------------------------------------

It's also interesting to look at Sun JDK 1.1.8's interpreter. Running the example above, and again counting
just the "execution" part, we get:

    ;;-------------iload_0-------------
    mov ebx, dword ptr [ebp]
    ;;-------------iload_1-------------
    mov ecx, dword ptr [ebp + 4]
    ;;--------------iadd---------------
    add ebx, ecx
    ;;------------istore_2-------------
    mov dword ptr [ebp + 8], ebx

That's 2 memory reads and 1 memory write, exactly what you'd get if the example were written in C and compiled
without optimization, which is not bad for an interpreter. This is also the effect of top-of-stack caching,
here a multi-state variant that keeps up to two stack values in registers (ebx and ecx).
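
Multi-state top-of-stack caching generalizes the one-register scheme: the interpreter tracks how many of the top stack values are currently in registers (here 0, 1 or 2, like ebx and ecx in the trace above) and picks the code for each bytecode according to that state. The C sketch below models this with an explicit state variable; an interpreter using the technique would more typically have a separate handler per (opcode, state) pair, so no runtime check is needed. All names are hypothetical, and only the states reached by this example are fully handled.

```c
#include <assert.h>
#include <stdint.h>

enum {
    OP_ILOAD_0  = 0x1a,
    OP_ILOAD_1  = 0x1b,
    OP_ISTORE_2 = 0x3d,
    OP_IADD     = 0x60,
    OP_HALT     = 0xff   /* hypothetical stop marker, not a real JVM opcode */
};

static const uint8_t k_eq_i_plus_j[] = {
    OP_ILOAD_0, OP_ILOAD_1, OP_IADD, OP_ISTORE_2, OP_HALT
};

/* state = number of top stack values cached in r0/r1 (0, 1 or 2).
 * r0 is the deeper cached value (ebx-like), r1 the top when state == 2 (ecx-like).
 * *spills counts values pushed out to the in-memory stack. */
static int run_two_reg_cached(const uint8_t *pc, int32_t *locals, int *spills) {
    int32_t stack[8];
    int sp = 0;
    int32_t r0 = 0, r1 = 0;
    int state = 0;
    *spills = 0;
    for (;;) {
        uint8_t op = *pc++;
        switch (op) {
        case OP_ILOAD_0:
        case OP_ILOAD_1: {
            int32_t v = locals[op == OP_ILOAD_0 ? 0 : 1];
            if (state == 0)      { r0 = v; state = 1; }  /* trace's iload_0: mov ebx, [ebp] */
            else if (state == 1) { r1 = v; state = 2; }  /* trace's iload_1: mov ecx, [ebp + 4] */
            else {                                       /* both full: spill the deepest */
                stack[sp++] = r0;
                (*spills)++;
                r0 = r1;
                r1 = v;
            }
            break;
        }
        case OP_IADD:
            assert(state == 2);   /* only the case reached by the example */
            r0 = (int32_t)((uint32_t)r0 + (uint32_t)r1);  /* add ebx, ecx */
            state = 1;
            break;
        case OP_ISTORE_2:
            assert(state == 1);   /* only the case reached by the example */
            locals[2] = r0;                                /* mov [ebp + 8], ebx */
            state = 0;
            break;
        case OP_HALT:
            return sp;
        default:
            return -1;
        }
    }
}
```

For k = i + j the state goes 0, 1, 2, 1, 0 and nothing ever spills to the memory stack, which is why the JDK 1.1.8 trace touches memory only for the locals.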