Created
May 14, 2015 10:57
Lwan template interpreter
New                                   Old
movsx esi,BYTE PTR [r14+0x8]          movsx esi,BYTE PTR [r13+0x8]
mov rdi,r12                           mov rdi,rbp
add rbx,0x1                           add r13,0x18
call 0x412fd0 <strbuf_append_char>    call 0x413ed0 <strbuf_append_char>
mov rax,QWORD PTR [r15+rbx*8]         mov eax,DWORD PTR [r13+0x0]
jmp QWORD PTR [rax*8+0x416440]        mov rax,QWORD PTR [rax*8+0x4166a0]
                                      jmp rax

C:                                    C:
pc++;                                 chunk++;
goto *dispatch_table[ops[pc]];        goto *dispatch_table[chunk->action];
`struct chunk` is 0x18 bytes. `ops` is an array of all the `action` fields from a `struct chunk` array,
tightly packed together. The two versions are pretty much equivalent, except that in the old version GCC
had to load the next address into `rax` and then jump there; it couldn't fuse the load with the indirect
jump. Why?
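
For anyone who wants to compare the two dispatch styles themselves, here is a minimal sketch of both shapes. It is not the Lwan source. The opcode names, the layout of this `struct chunk`, the separate `chars` array and both function names are assumptions made up for illustration; only the two `goto *dispatch_table[...]` forms come from the snippet above. It relies on GCC's labels-as-values extension, and the instructions GCC emits for it will depend on compiler version and flags.

```c
#include <stddef.h>

/* Hypothetical opcodes; the real interpreter has more actions. */
enum { OP_APPEND_CHAR, OP_END };

/* Stand-in for the real struct chunk; the field layout is an assumption. */
struct chunk {
    int action;
    char ch;
};

/* "Old" shape: walk an array of structs and dispatch on chunk->action. */
static size_t run_chunks(const struct chunk *chunk, char *out)
{
    static void *dispatch_table[] = {
        [OP_APPEND_CHAR] = &&append_char,
        [OP_END] = &&end,
    };
    size_t len = 0;

    goto *dispatch_table[chunk->action];

append_char:
    out[len++] = chunk->ch;
    chunk++;
    goto *dispatch_table[chunk->action];

end:
    out[len] = '\0';
    return len;
}

/* "New" shape: the actions live in their own tightly packed array,
 * indexed by pc; the payload is kept in a parallel array. */
static size_t run_packed(const size_t *ops, const char *chars, char *out)
{
    static void *dispatch_table[] = {
        [OP_APPEND_CHAR] = &&append_char,
        [OP_END] = &&end,
    };
    size_t pc = 0, len = 0;

    goto *dispatch_table[ops[pc]];

append_char:
    out[len++] = chars[pc];
    pc++;
    goto *dispatch_table[ops[pc]];

end:
    out[len] = '\0';
    return len;
}
```

Compiling this with optimization and disassembling both functions (for example with `objdump -d -M intel`) should show whether your GCC fuses the table load into the indirect jump in each case.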
@ricbit: I didn't measure; I've no idea if there's any difference, but I was puzzled anyway. The version on the left is 8 bytes shorter than the version on the right. Architecture is x86-64; my machine is a Sandy Bridge Core i7.
The C code for this snippet is here.