Skip to content

Instantly share code, notes, and snippets.

@GoBigorGoHome
Last active November 5, 2024 12:52
Show Gist options
  • Select an option

  • Save GoBigorGoHome/4d0383d7f7f9e6c5cf6b6639a6495b8a to your computer and use it in GitHub Desktop.

Select an option

Save GoBigorGoHome/4d0383d7f7f9e6c5cf6b6639a6495b8a to your computer and use it in GitHub Desktop.
CS:APP3e notes
   _____   _____              _____   _____  
  / ____| / ____|_     /\    |  __ \ |  __ \ 
 | |     | (___ (_)   /  \   | |__) || |__) |
 | |      \___ \     / /\ \  |  ___/ |  ___/ 
 | |____  ____) |_  / ____ \ | |     | |     
  \_____||_____/(_)/_/    \_\|_|     |_|     
                                             
   _____   _____              _____   _____  
  / ____| / ____|_     /\    |  __ \ |  __ \ 
 | |     | (___ (_)   /  \   | |__) || |__) |
 | |      \___ \     / /\ \  |  ___/ |  ___/ 
 | |____  ____) |_  / ____ \ | |     | |     
  \_____||_____/(_)/_/    \_\|_|     |_|     
                                             

understand byte ordering

Byte ordering not bit ordering. Order of bits within a byte is same whether in big-endian or small endian.

0x200b43 in 00 20 0b 43 in big-endian, 43 0b 20 00 in little-endian. It seems how the bits are ordered within a bit is not specified.

two forms of right shift

  • logical right shift: fill the left end with zeros
  • arithmetic right shift: fill the left end with repetions of the most significant bit

why is “two's complement” so named

Note it wiritten as "two's complement" not "twos' complement"

The term arises from the fact that for nonnegative x we compute a w-bit representation of -x as 2^w - x.

Explanation

Let x be arbitrary nonnegative integer representable with w bit, i.e. x is in the range [0, 2^{w-1}-1].

Suppose -x has a w-bit representation as 1......., and suppose the trailing w-1 bits are the unsigned presentation of ineteger P.

Immediately, we have -x = -2^{w-1} + P, then x = 2^{w-1} - P. View the bit pattern of -x as unsigned, the value is 2^{w-1} + P, obviously, 2^{w-1} + P = 2^w - (2^{w-1} - P), that is T2U(-x) = 2^w - T2U(x), since x is nonnegative, T2U(x) is same as x.

effect of expanding or shrinking an encoded integer to fit a represent'n with a different length

Expanding is always possible. Zero expansion for two's-complement number.

Zero extension for unsigned number (this case is trivial).

relative order of conversion from one data size to another and between unsigned and signed

When converting short to an unsigned, the C standard requires that first short is converted to int (change the size) and then from int to unsigned.

How machine code does or does not distinguish between signed and unsigned values

Machine code does not associate a data type with each program value. It mostly uses the same instructions for the two cases, because many arithmetic operations have the same bit level behavior for unsigned and two's complement arithmetic. Some circumstances require different instructions to handle signed and unsigned operations, such as using different versions of right shifts, division and multiplication instructions, and different combinations of condition codes.

   _____   _____              _____   _____  
  / ____| / ____|_     /\    |  __ \ |  __ \ 
 | |     | (___ (_)   /  \   | |__) || |__) |
 | |      \___ \     / /\ \  |  ___/ |  ___/ 
 | |____  ____) |_  / ____ \ | |     | |     
  \_____||_____/(_)/_/    \_\|_|     |_|     
                                             

Ways to incorporate assembly code into C programs:

  1. Write entire functions in assembly code and combine them with C functions during the linking stage.

  2. Use gcc's support for embedding assembly code directly within C programs.

registers in an x86-64 CPU

different registers server different roles in typical programs

The most unique register is %rsp (stack pointer) indicate the end position in the run-time stack. The other 15 registers have more flexibility in their uses.

data movement instructions

movq vs movabsq

the movq instruction can only have immediate source operands that can be represented as 32-bits two's-complement numbers. This value is then sign extended to produce the 64-bit value for the destination. The movabsq instruction can have an arbirary 64-bit immediate value as its source operand and can only have a register as a destination.

movz and movs classes of instructions

Both classes of instructions can only have a register or memory location as the source and a register as the destination.

There is no movzlq because movl is eactaly what movzlq would have been.

why cltq is so named?

cltq means Convert Long To Quad.

The effect of cltq is to sign-extend %eax to %rax. It has the exact same effect as the instruction movslq %eax, %rax, but it has a more compact encoding.

Practice Problem 3.3

This problem asks one to explain what is wrong with each line of assembly code given.

(I include this into the note to show how I expressed the answers compared to how the authors (native speakers) had said the same thing (in italic style). This indicates how I can improve my English.)

code I say they say
movb $0xF, (%ebx) address must be stored in a 64-bit register Cannot use %ebx as address register
movl %rax, (%rsp) movl doesn't math %rax in length Mismatch between instruction suffix and register ID
movw (%rax), 4(%rsp) cannot move from one memory location to another Cannot have both source and destination be memory reference
movb %al, %sl I didn't find what's wrong:( No register named %sl
movl %eax, $0x123 an immediate cannot be the destination Cannot have immediate as destination
movl %eax, %dx the length of two registers not match Destination operand incorrect size
movl %si, 8(%rbp) movb and %si do not match in size Mismatch between instruction suffix and register ID

names of registers and their low-order byte in the original 8086

%ax %al %bx %bl %cx %cl %dx %dl

%si %sil %di %dil %bp %bpl %sp %spl

So it's strange that the low-order byte of %si is not named %sl since there is no register named %sx.

size of long in C

On page 177 (first pagraph of section 3.3), it is said

With x86-64, data type long is implemented with 64 bits, allowing a very wide range of values.

However, on my machine (MSYS2 port of GCC 8.3.0, 64 bit Windows), size of long is 4 bytes.

The code on CS:APP3e was built with GCC 4.8.1 (Ubuntu 4.8.1-2ubuntu~12.04)

So there is some difference between the assembly code given in the book and those given by my compiler.

comment in assembly code

use ";" to start a single line comment

Notes (and an issue) on practice problem 3.4

convert unsigned to long.

Solution from the book is

movzbl (%rdi), %eax
movq %rax, (%rsi)

However, logically the first instruction should be movzbq (%rdi), %rax. I am confused here. For this "issue", Bryant, coauthor of the book, made a clarification (not an erratum!) on the book's errata page and refer to the notes the note on Figure 3.5 (p. 184). From the notes:

... copy data from a source, which can be either a register or stored in memory, to a register destination. ... fill out the remaining bytes of the destination with zeros. ...the second specifying the destination size.

....the property that an instruction (e.g. movl) generating a 4 byte-value with a register as the destination will fill the upper 4 bytes with zeros.

From the author's clarification, it seems that we can infer that movzbl is also an instruction that generates a 4 byte-value. However, if this is true, there is no need to have movzbq and movzwq, as movzbl and movzwl do the same thing.

After all, it's only a guess, the authors didn't make the point.

But, a few more search guides me to a lecture handout made at 2005 by Bryant. From the handout:

... Perhaps unexpectedly, instructions that move or generate 32-bit register values also set the upper 32 bits of the register to zero. Consequently there is no need for an instruction movzlq. Similarly, the instruction movzbq has the exact same behavior as movzbl when the destination is a register—both set the upper 56 bits of the destination register to zero. This is in contrast to instructions that generate 8 or 16-bit values, such as movb; these instructions do not alter the other bits in the register.

In the book, however, the authors say nothing about the exact same behavior of movzbl and movzbq explicitly, leaving the reader guessing.

P.S. I find another evidence of the same effect of movzbq and movzbl in the book! On page 203, the forth line of the assembly code listing goes

movzbl %al, %eax  ; Clear rest of %eax (and rest of %rax)

note for this code listing says

Recall also, as discussed in Section 2.4.2, that the movzbl instruction (line 4) clears not just the high-order 3 bytes of %eax, but the upper 4 bytes of the entire register, %rax, as well.


I got wrong on two entries.

  • assign a char to an unsigned.

Remember that unsigned is short for unsigned int, so the answer should be

movsbl (%rdi), %eax
movl %eax, (%rsi)
  • assign a char to a short

The answer is

movsbw (%rdi), %ax
movw %ax, (%rsi)

arithmetic and logical operations

leaq: (calculate and) load effective address, the destination operand must be a register.

notes on leaq

From the book:

leaq has no other size variants.

in first paragraph of section 3.5.

Compling the function

long scale(long x, long y, long z) {
   long t = x * 4 * y + 12 * z;
   return t;
}

with command gcc -Og -S filename.c, gcc generates the following assembly code

leal	(%rcx,%rdx,4), %eax
leal	(%r8,%r8,2), %ecx
leal	0(,%rcx,4), %edx
addl	%edx, %eax
ret

Remember that on my laptop, sizeof(long) is 4. Obviouly, leal is a vriant of leaq. So the author might be wrong here?

The same code when compiled on Compiler Explorer with x86-64 gcc (trunk) leads to the exact same assembly as given on the book.

So I'll try more with Compiler Explorer.

Condition code

In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation. These registers can then be tested to perform conditional branches.

An issue with the testing of unsigned comparisons

On page 204 the second paragraph, it says

in performing the computation t = a - b, the carry flag will be set by the CMP instruction when a - b < 0, ...

But I think the carry flag will be set by the CMP instruction when a - b >= 0. See https://stackoverflow.com/a/44623872/6052725 for a supporting evidence.

I am going to make an error report to the author.

是我搞错了。关于 SUB 指令是如何影响进位标志(carry flag, CF)的,书上并未说明。不过我在 StackOverflow 上找到一个提问,正是我想问的东西。 [https://stackoverflow.com/questions/45631132/setting-cf-while-cmp]

简言之,a - b 和 a + (-b) 结果相同,但是二者对 carry flag 的作用可能不同。

The result of a-b is the same as the result of a+(-b). However the flags set by the operation a-b are NOT (at least not always) the same flags as the flags set by the operation a+(-b)!

On the x86 the "carry flag" is defined as "borrow out" for subtractions.

Understanding how jump targets are encoded will become important when study linking.

interpret the output of a disassembler

program counter

why PC relative so named?

PC is short for Program Counter.

From the book:

That is, they (PC relative encodings) encode the difference between the address of the target instruction and the address of the instruction immediately following the jump.

..., the value of the program counter when performing PC-relative addressing is the address of the instruction following the jump, not that of the jump itself. This convention dates back to early implementations, when the processor would update the program counters as its first step in executing an instruction.

From Wikipedia:

The PC-relative addressing mode can be used to load a register with a value stored in program memory a short distance away from the current instruction. It can be seen as a special case of the "base plus offset" addressing mode, one that selects the program counter (PC) as the "base register".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment