duplicates = multiple editions
A Classical Introduction to Modern Number Theory, Kenneth Ireland Michael Rosen
A Classical Introduction to Modern Number Theory, Kenneth Ireland Michael Rosen
Interfaces naturally emerge as software gets broken down into parts communicating with one another. The larger and more deliberate structures emerge from a conscious attempt to organize the development process itself. [fn:Liskov2008] Structures often emerge directly from division of labor: as teams take on independent tasks, interfaces are established between the domains they become responsible for. (Conway’s Law)
Software developers are responsible for systems built out of very small atoms while ultimately performing tasks of a much greater magnitude for their users. Dijkstra showed this by computing the ratio between grains of time at the smallest and largest scales of the system (from, say, a CPU instruction to a human interaction with the system). The span was already quite large in Dijkstra’s time, about 10^9. Today this ratio would be at least 10^12 (see grain ratios).
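The grain ratio arithmetic can be sketched in a few lines of C. The concrete durations here are illustrative assumptions, not figures taken from Dijkstra or from this note: roughly 1 µs per instruction on an older machine, roughly 1 ns per instruction today, and a human interaction lasting about 1000 s. The function names `grain_ratio` and `order_of_magnitude` are hypothetical helpers introduced only for this sketch.

```c
#include <stdint.h>

/* Ratio between the largest and the smallest time grain.
   Both arguments are in nanoseconds so the arithmetic stays exact. */
static uint64_t grain_ratio(uint64_t smallest_ns, uint64_t largest_ns) {
    return largest_ns / smallest_ns;
}

/* Power of ten of a ratio, rounded down: 10^9 gives 9, 10^12 gives 12. */
static int order_of_magnitude(uint64_t ratio) {
    int exponent = 0;
    while (ratio >= 10) {
        ratio /= 10;
        exponent++;
    }
    return exponent;
}
```

Under these assumptions, `order_of_magnitude(grain_ratio(1000, 1000ULL * 1000000000ULL))` yields 9 for the older machine, and replacing the smallest grain with 1 ns yields 12.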
This large span has to be manage
| // Xeno | |
| enum Xeno_Kind { | |
| XENO_POINTER, | |
| XENO_AGGREGATE, | |
| XENO_FIRST_PRIMITIVE_TYPE, | |
| XENO_UINT8 = XENO_FIRST_PRIMITIVE_TYPE, | |
| XENO_UINT16, | |
| XENO_UINT32, | |
| XENO_UINT64, |
| Why do compilers even bother with exploiting the undefinedness of signed overflow? And what are those | |
| mysterious cases where it helps? | |
| A lot of people (myself included) are against transforms that aggressively exploit undefined behavior, but | |
| I think it's useful to know what compiler writers are accomplishing by this. | |
| TL;DR: C doesn't work very well if int!=register width, but (for backwards compat) int is 32-bit on all | |
| major 64-bit targets, and this causes quite hairy problems for code generation and optimization in some | |
| fairly common cases. The signed overflow UB exploitation is an attempt to work around this. |
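One commonly cited case of the kind described above is array indexing with a 32-bit index on a 64-bit target. The sketch below is an illustration under that assumption, not code from the original post; the function names are hypothetical. Both functions return the same result for in-range inputs. The difference lies only in what the compiler may assume about `i + offset`.

```c
/* Signed 32-bit index: overflow of i + offset is undefined, so the
   compiler may assume it never wraps and do the index arithmetic
   directly in a 64-bit register. */
static long sum_with_int_index(const int *data, int offset, int count) {
    long total = 0;
    for (int i = 0; i < count; i++)
        total += data[i + offset];
    return total;
}

/* Unsigned 32-bit index: i + offset is defined to wrap modulo 2^32,
   so unless the compiler can prove it does not wrap, it has to keep
   the 32-bit wraparound before widening the index to 64 bits. */
static long sum_with_unsigned_index(const int *data, unsigned offset,
                                    unsigned count) {
    long total = 0;
    for (unsigned i = 0; i < count; i++)
        total += data[i + offset];
    return total;
}
```

Whether a given compiler actually generates different code for the two loops depends on the target and optimization level, so comparing the assembly output is the way to check.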
| nate@haswell:~/src$ likwid-perfctr -m -g UOPS_ISSUED_ANY:PMC0,UOPS_EXECUTED_CORE:PMC1,UOPS_RETIRED_ALL:PMC2,BR_INST_RETIRED_NEAR_TAKEN:PMC3 -C 1 fusion | |
| ------------------------------------------------------------- | |
| ------------------------------------------------------------- | |
| CPU type: Intel Core Haswell processor | |
| CPU clock: 3.39 GHz | |
| ------------------------------------------------------------- | |
| fusion | |
| two_micro_two_macro: sum1=10000000, sum2=9999999 | |
| one_micro_two_macro: sum1=10000000, sum2=9999999 | |
| one_micro_one_macro: sum1=10000000, sum2=9999999 |
| Not all of these events have been tested and they may be broken | |
| USE AT YOUR OWN RISK! | |
| CBO (Last Level Cache Slice) CACHE Events | |
| CBO.LLC_LOOKUP Cache Lookups | |
| CBO.LLC_LOOKUP.ANY Cache Lookups | |
| CBO.LLC_LOOKUP.DATA_READ Cache Lookups | |
| CBO.LLC_LOOKUP.NID Cache Lookups | |
| CBO.LLC_LOOKUP.READ Cache Lookups |
| #include <stdio.h> | |
| // #define CLANG_EXTENSION | |
| // Clang compile with -O3 | |
| #define VS_EXTENSION | |
| // https://godbolt.org/z/sVWrF4 | |
| // Clang compile with -O3 -fms-compatibility | |
| // VS2017 compile with /O3 |
FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note, before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall the exact origin of the source, nor was I able to find the author's name, so I can't provide the appropriate credits.
| Latency Comparison Numbers | |
| -------------------------- | |
| L1 cache reference/hit 1.5 ns 4 cycles | |
| Floating-point add/mult/FMA operation 1.5 ns 4 cycles | |
| L2 cache reference/hit 5 ns 12 ~ 17 cycles | |
| Branch mispredict 6 ns 15 ~ 20 cycles | |
| L3 cache hit (unshared cache line) 16 ns 42 cycles | |
| L3 cache hit (shared line in another core) 25 ns 65 cycles | |
| Mutex lock/unlock 25 ns | |
| L3 cache hit (modified in another core) 29 ns 75 cycles |