Q1 (100 points): The average CPI (cycles per instruction) for different kinds of instructions for a computer is given in the table below, together with the frequency of each type of instruction for a given program P.
Operation | Frequency | CPI |
---|---|---|
Fixed-point ALU | 40% | 2 |
Floating point | 20% | 15 |
Load | 20% | 5 |
Store | 10% | 4 |
Branch | 10% | 3 |
Q1.a: Compute how much faster (as a percentage) the program would run if both the average CPI for load instructions were reduced to 4 and the average CPI for store instructions were reduced to 3. (Note: Compute a single answer for the combined reduction in CPI for both loads and stores; this part has only one case, not two separate cases.)
Q1.a (answer): To juxtapose the speed of the program execution (determined by complete number of cycles) we have to first calculate the total number of cycles used by the program P using the unaltered values. We do not have the total number of cycles used by P hence we solve,
If the load instruction has an average CPI of 4 and the store instruction has an average CPI of 3 we get that the total number of cycles performed by P is given by
P(2(0.4) + 15(0.2) + 4(0.2) + 3(0.1) + 3(0.1)) = P(24/10 + 152/10 + 4*2/10 + 3/10 + 3/10)
= P(8 + 30 + 8 + 3 + 3)/10 = 5.2P
Hence, the program would run at 5.2P/5.5P ~ 0.94 => a speed increase of ~6%
Q1.b (50 points): It is desired to make the program run at least 12% faster. Alternate implementations of the processor, with lower CPI for floating-point instructions, but the same CPI for all other instructions, are available at additional cost. The lower the CPI for floating-point instructions, the higher the cost, so to achieve a solution at minimum cost it is desired to find the highest (integer) CPI which is nevertheless low enough to achieve the desired increase in performance. Compute the maximum integer CPI for such an alternate processor which will guarantee the 12% speedup. In other words, compute the largest positive integer by which 15 may be replaced in the above table, while still achieving a 12% speedup. If no such positive integer exists, explain clearly why this is the case.
Q1.b (answer): To yield a 12% speed-up by using the highest integer CPI (for floating-point operations) that is nevertheless low enough to achieve the desired performance-increase for floating-point instructions we solve for x in the following equation:
P(2(0.4) + x(0.2) + 5(0.2) + 4(0.1) + 3(0.1)) = 5.5*(0.88)P
P(8 + 2x + 10 + 4 + 3)/10 = 5.5*(0.88)P
(2x + 25)/10 = 5.5*(0.88)
2x/10 + 2.5 = 5.5*(0.88)
2x/10 = 5.5*(0.88) - 2.5
x = 5(5.5*(0.88) - 2.5)
which yields that x = 11.7, hence the lowest integer which satisfies the performance requirement is 11.
Answer: x = 11.