Registers
Caller-saved: RAX RCX RDX RSI RDI R8 R9 R10 R11
Callee-saved: RBX RBP RSP R12 R13 R14 R15
Args: RDI, RSI, RDX, RCX, R8, R9, XMM0–7
Return: RAX
Simple Compile
yasm -f macho64 foo.asm && gcc foo.c foo.o -Wall -Wextra -g -O1
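As a rough illustration of the argument order listed above, here is a hypothetical six-argument C function (the name `add6` is not from the original). The comments describe where each value would be expected on entry under the System V AMD64 convention. An optimizing compiler may inline such a call, so treat the register notes as what a hand-written assembly routine with the same signature would see.

```c
#include <stdint.h>

/* Hypothetical example: six integer arguments fill the six integer
   argument registers in order; the result comes back in RAX. */
int64_t add6(int64_t a,  /* RDI */
             int64_t b,  /* RSI */
             int64_t c,  /* RDX */
             int64_t d,  /* RCX */
             int64_t e,  /* R8  */
             int64_t f)  /* R9  */
{
    return a + b + c + d + e + f;  /* returned in RAX */
}
```

A seventh integer argument would go on the stack. With the `macho64` output format, C symbols carry a leading underscore, so a routine that C calls as `foo` is defined in the assembly file as `_foo`.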
Why do compilers even bother with exploiting the undefinedness of signed overflow? And what are those
mysterious cases where it helps?
A lot of people (myself included) are against transforms that aggressively exploit undefined behavior, but
I think it's useful to know what compiler writers are accomplishing by this.
TL;DR: C doesn't work very well if int != register width, but (for backwards compat) int is 32-bit on all
major 64-bit targets, and this causes quite hairy problems for code generation and optimization in some
fairly common cases. The signed overflow UB exploitation is an attempt to work around this.
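One commonly cited case is array indexing with a 32-bit index on a 64-bit target. The sketch below is only an illustration; the function names are made up for this example, and what a particular compiler emits will vary. In the signed version the compiler may assume `base + i` never overflows, so it is free to track a single 64-bit pointer or index. In the unsigned version `base + i` is defined to wrap modulo `UINT_MAX + 1` (2^32 on the targets discussed here), and the generated code has to preserve that wrap.

```c
#include <assert.h>

/* Hypothetical helper: sums n elements of a[] starting at index base.
   i and base are signed, so the compiler may assume base + i never
   overflows and can keep one 64-bit pointer or index for the loop. */
long sum_window_signed(const int *a, int base, int n)
{
    assert(n >= 0);
    long total = 0;
    for (int i = 0; i < n; i++)
        total += a[base + i];
    return total;
}

/* Same loop with unsigned arithmetic: base + i wraps on overflow by
   definition, so the compiler must keep that behaviour (for example
   by zero-extending the 32-bit sum) before using it as an index. */
long sum_window_unsigned(const int *a, unsigned base, unsigned n)
{
    long total = 0;
    for (unsigned i = 0; i < n; i++)
        total += a[base + i];
    return total;
}
```

Both functions return the same result for in-range inputs. Comparing the output of `gcc -O2 -S` for the two is one way to see whether your compiler treats them differently.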
L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns
Compress 1K bytes with Zippy ............. 3,000 ns = 3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns = 20 µs
SSD random read ........................ 150,000 ns = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs
#include <stdio.h>
#define STR2(x) #x
#define STR(x) STR2(x)
#define INCBIN(name, file) \
    __asm__(".section .rodata\n" \
            ".global incbin_" STR(name) "_start\n" \
            ".type incbin_" STR(name) "_start, @object\n" \
            ".balign 16\n" \
#include <stdint.h>
/*
Fast 64-bit integer log10
WARNING: calling ilog10c(0) yields undefined behaviour!
On x64 this compiles down to:
pushq %rbp
In a project I'm working on, I ran into the requirement of having some sort of persistent FIFO buffer or pipe in Linux, i.e. something file-like that could accept writes from one process and persist them to disk until a second process reads (and acknowledges) them. The persistence should hold across both process restarts and OS restarts.
AFAICT unfortunately in the Linux world such a primitive does not exist (named pipes/FIFOs do not persist
The default Go implementation of sync.RWMutex does not scale well
to multiple cores, as all readers contend on the same memory location
when they all try to atomically increment it. This gist explores an
n-way RWMutex, also known as a "big reader" lock, which gives each
CPU core its own RWMutex. Readers take only a read lock local to their
core, whereas writers must take all locks in order.
# Create Bitbucket branch
## Create local branch
$ git checkout -b sync
Switched to a new branch 'sync'
$ git branch
master
* sync