The program threads-memory.c (included below) starts 100 threads, allocates 1 MB of memory in each, and then pauses. How much memory is it using? Let's find out by running it:
$ gcc -pthread threads-memory.c -o threads-memory
$ ./threads-memory
starting threads... done
press enter to exit
While it's still running, check the RSS (resident set size) using ps. On my Linux system, the result is:
$ ps -o rss -C threads-memory
RSS
81084
It's only using 81 MB! How could it possibly be using less than 100 MB?
Starting in Linux 2.6.34, the value reported by ps is an approximation:
"For making accounting scalable, RSS related information are handled in asynchronous manner and the vaule [sic] may not be very precise. To see a precise snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table. It's slow but very precise."
Let's try to understand this change.
In Linux, threads are just processes that happen to share the same address space (memory). The struct task_struct represents a process, and the struct mm_struct represents an address space. mm_struct contains a counter tracking the RSS. This is the value used by ps.
Having every thread access the same mm_struct every time memory is allocated would be inefficient. The optimization adds a per-thread cache for the counter in task_struct. Each cache is flushed to the associated mm_struct once every 64 page faults in a thread. Assuming a 4 KB page size, this means that up to 256 KB (64 * 4 KB) per thread may be unaccounted for. Probably not a big deal, unless you're running a lot of threads! With 100 threads, that adds up to as much as 25,600 KB (about 25 MB), which is enough to explain the 81 MB reported by ps.
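The batching described above can be modeled in a few lines of C. This is only a simplified sketch of the idea, not the kernel's code: the struct and function names are hypothetical, the threads are simulated one after another in a loop, and the flush threshold of 64 is taken from the description above.

```c
/* Simplified model of per-thread RSS batching. All names here are
 * hypothetical stand-ins, not the kernel's real definitions. */

#define FLUSH_THRESHOLD 64  /* page faults between flushes, per the text */

struct shared_counter {     /* stands in for the RSS counter in mm_struct */
    long rss_pages;
};

struct thread_cache {       /* stands in for the per-thread cache in task_struct */
    long pending_pages;     /* pages counted locally but not yet flushed */
    int events;             /* page faults since the last flush */
};

/* Record one page fault in the thread's cache, flushing the cached
 * count to the shared counter on every 64th fault. */
void account_page(struct shared_counter *mm, struct thread_cache *tc)
{
    tc->pending_pages++;
    tc->events++;
    if (tc->events >= FLUSH_THRESHOLD) {
        mm->rss_pages += tc->pending_pages;
        tc->pending_pages = 0;
        tc->events = 0;
    }
}

/* Simulate nthreads threads that each fault in pages_per_thread pages,
 * and return the page count visible in the shared counter afterwards. */
long simulate_reported_pages(int nthreads, long pages_per_thread)
{
    struct shared_counter mm = { 0 };

    for (int t = 0; t < nthreads; t++) {
        struct thread_cache tc = { 0, 0 };
        for (long p = 0; p < pages_per_thread; p++)
            account_page(&mm, &tc);
        /* Whatever is left in tc.pending_pages is never flushed here,
         * which is exactly the part a reader of the counter misses. */
    }
    return mm.rss_pages;
}
```

In this model, simulate_reported_pages(100, 300) returns 25600 even though 30000 pages were touched: each simulated thread flushes four batches of 64 pages and leaves the remaining 44 in its cache.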
To get a precise RSS value, you can use the pmap command, which scans the page table instead of using the RSS counter:
$ pmap -x $(pidof threads-memory) | grep -E "Address|total"
Address           Kbytes     RSS   Dirty Mode  Mapping
total kB         2960712  105424  103888