Sometimes more code in a source language, can mean less code in a generated language.
NOTE: below we used
gcc 5.3with-std=c++14 -O3flags.
On the example of C programming language, and its translation to x64 assembly, let's take a look at the following example:
// 7 times `x`.
int foo(int x) {
return x + x + x + x + x + x + x;
}
int main() {
return foo(1);
}A generated code (applying an optimizing compiler) would look something like this:
foo(int):
leal (%rdi,%rdi,4), %eax
leal (%rax,%rdi,2), %eax
ret
main:
movl $7, %eax
retWhile if we add even more code in the source language, say 8 times for x:
// 8 times `x`.
int foo(int x) {
return x + x + x + x + x + x + x + x;
}
int main() {
return foo(1);
}We can get less generated code, taking just one instruction for the actual calculation:
foo(int):
leal 0(,%rdi,8), %eax
ret
main:
movl $8, %eax
retHow this happens and why, is related to the format of the effective address calculation arithmetics. And in fact, the lea instruction semantically is more related to exactly address manipulations (like array elements access, fields of objects, etc).
However, compilers often reuse lea to do simple arithmetic evaluations, although only those that fit the format of the lea (we covered such optimizations in this gist). You can also get additional info how this generic memory access format looks like here.
In case when the capabilities of lea are exceeded, compiler should fall-back to e.g. add instruction since the scale can only be 1, 2, 4 or 8. Having the code for 6 times x, we again get two instructions, but now already with less optimal addl instruction:
// 6 times `x`.
int foo(int x) {
return x + x + x + x + x + x;
}
int main() {
return foo(1);
}In assembly they are leal and addl:
foo(int):
leal (%rdi,%rdi,4), %eax
addl %edi, %eax
ret
main:
movl $6, %eax
retIn any case though, an optimizing compiler evaluated the whole resulting expression at static time, and the result from main function is a direct constant, instead of a runtime function call. Also notice, the optimizing compiler of course doesn't do a bunch of add instructions (6, 7 or 8 times), but instead as we said uses an "unrelated" lea instruction for this.
NOTE: try compiling without full optimization, i.e. excluding
-O3flag.
Have fun with compilation ;)
P.S.: And of course depends on a compiler version, different translation strategies can be chosen. E.g. that example with 7 times x instead of two leas can still be compiled with one, though less optimal instruction -- imul (actual multiplication). This is what we get in gcc 6:
foo(int):
imull $7, %edi, %eax
ret
main:
movl $7, %eax
retWritten by Dmitry Soshnikov.
Cool, thanks for the write up, I hadn't looked at the compiler's assembly output for ages.