@thestinger
Last active November 26, 2022 11:27
Comparing ASLR between mainline Linux, grsecurity and linux-hardened

These results are with glibc malloc on x86_64. The last public PaX and grsecurity patches don't support arm64, which is one of the two architectures (x86_64 kernels including x32/x86_32 and arm64 kernels including armv7 userspace) focused on by linux-hardened. That leaves x86_64 as the only architecture that can be compared across all 3 kernels, although linux-hardened has the same end result for both x86_64 and arm64 (with slightly different starting points) and there are few mainline differences. The linux-hardened implementation of ASLR is a very minimal modification of the mainline implementation to fix the weaknesses compared to grsecurity. The intention is to upstream all of these changes, although care needs to be taken to properly justify them to avoid getting anything rejected unnecessarily.

Explanation of differences between kernels:

  • Mainline and linux-hardened tie the randomization entropy for the mmap base and executable to the vm.mmap_rnd_bits sysctl for 64-bit and vm.mmap_rnd_compat_bits for 32-bit, while it's hard-wired in grsecurity. Mainline uses the minimum supported value by default while linux-hardened uses the maximum. The maximum values for the sysctl on x86 match the entropy values used by grsecurity (16 bits on 32-bit, 32 bits on 64-bit).
  • The optional grsecurity / PaX UDEREF feature (a software equivalent to SMAP for pre-Broadwell CPUs; SMAP is known as PAN in the ARM world) reduces the size of the userspace address space on x86_64 by 5 bits, from 47 bits to 42 bits, which results in weaker ASLR. It doesn't have this drawback in the 32-bit x86 and 32-bit arm implementations. The relevant results for fair comparisons are with UDEREF disabled since it would have the same impact with other ASLR implementations.
  • grsecurity offsets the mmap base dynamically from the chosen random base for the stack rather than statically reserving space for the maximum amount. That results in one extra bit of measured entropy for the mmap base and one less bit of entropy for the gap between the stack and the mmap base, with increased code complexity.
  • Mainline has a subtly incorrect calculation for choosing the mmap base based on the maximum random stack offset, which is fixed in linux-hardened. That incorrect calculation is a blocker for improving stack mapping randomization there.
  • Mainline hard-wires the stack mapping entropy to 22 bits on x86_64 (11 bits for 32-bit) while linux-hardened also uses the vm.mmap_rnd_bits / vm.mmap_rnd_compat_bits sysctl as appropriate for the stack mapping, matching the entropy provided by grsecurity by default.
  • The argument block and stack are both within the stack mapping. Mainline kernels don't randomize the lower bits of the argument block, only the stack, so that shows the entropy of the stack mapping itself. The reason for the 4 extra bits for grsecurity and linux-hardened for the argument block is the lower alignment requirement (1) compared to the stack (16). The one difference between PaX and grsecurity is that PaX uses 16 byte alignment for the argument block rather than 1. The lower bits for the argument block and stack are randomized entirely separately, so there's also randomization between them even with mainline since it randomizes one but not the other.
  • Mainline kernels use a strange randomization mechanism for the vdso, placing it at a very low entropy offset from the stack randomization base. grsecurity and linux-hardened place it in the mmap region so it ends up with the dynamic linker and other shared objects. The vdso is essentially a shared object comparable to a subset of libc, so it makes sense for that to be where it's mapped rather than it being poorly randomized near the stack region, which should have no executable code. Compactness is the apparent justification for the mainline kernel decision, but using the mmap region is actually more compact. The vdso issue is specific to x86. On arm64, mainline already does the right thing.
  • The brk heap mapping is offset from the executable, so the mapping randomization only shows up in the tests for non-PIE but is still present for PIE. Mainline uses 13 bit mapping randomization while grsecurity uses 14 bit along with randomizing the lower bits. The lower bits are also randomized in linux-hardened via a cleaner mechanism not forcing the mapping to be present when unused, and it increases the mapping randomization on x86_64 to match arm64 by using 1G on 64-bit rather than 32M on both 32-bit and 64-bit.
  • grsecurity and linux-hardened guarantee that the brk gap is at least one page.
  • grsecurity reduces address space fragmentation via a lower base for the executable. A similar approach is used in linux-hardened but it doesn't currently differentiate between PIE and other ET_DYN and it uses a higher 4G as the base on 64-bit (note the address space is 128TiB on x86_64 so it doesn't matter) to avoid interfering with users of the low 4G like the Android Runtime. The brk heap grows up from the executable so this provides much more room to grow, although every sane allocator knows how to fall back to mmap if growth fails and modern allocators tend to completely avoid the legacy brk API.
  • grsecurity uses a weaker pax_get_random_long(...) random number generator based on prandom, dating back to before get_random_long(...) existed. It likely doesn't make much difference in practice, but it's significantly worse as a CSPRNG particularly since the move to SipHash-based get_random_int / get_random_long (as an optimization on top of the underlying ChaCha20-based CSPRNG) in very recent kernels.
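
The bit counts in the list above can be turned into address ranges with simple arithmetic. What follows is a minimal sketch, assuming 4096 byte pages and that each bit of mapping entropy moves a page-aligned base by one more power-of-two multiple of the page size. The helper names are made up for illustration and are not kernel or paxtest APIs.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
TIB = 1 << 40


def mapping_span_bytes(entropy_bits, page_size=PAGE_SIZE):
    """Size of the range a page-aligned base can land in with this much entropy."""
    return (1 << entropy_bits) * page_size


def extra_bits_from_alignment(coarse_alignment, fine_alignment):
    """Bits gained by randomizing at a finer power-of-two alignment."""
    ratio = coarse_alignment // fine_alignment
    if ratio <= 0 or ratio & (ratio - 1):
        raise ValueError("alignments must differ by a power of two")
    return ratio.bit_length() - 1


# 28 and 32 bits of mmap base entropy, expressed as a span in TiB
print(mapping_span_bytes(28) // TIB)  # 1
print(mapping_span_bytes(32) // TIB)  # 16

# UDEREF: userspace address space shrinks from 47 bits to 42 bits (in TiB)
print((1 << 47) // TIB, (1 << 42) // TIB)  # 128 4

# argument block alignment of 1 byte vs stack alignment of 16 bytes
print(extra_bits_from_alignment(16, 1))  # 4
```

With 32 bits of entropy the mmap base can land anywhere in a 16 TiB window of the 128 TiB x86_64 address space, which is why losing 5 bits of address space to UDEREF has a direct cost.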

Related changes, which are all in-scope for linux-hardened:

  • grsecurity also ignores mmap hints unless MAP_FIXED is passed, with the intention of stopping code from shooting itself in the foot with fixed addresses. However, many uses of mmap hints are relative and don't bypass ASLR, and ignoring the hints breaks code in practice. This is the reason that PaX RANDMMAP exceptions are needed, which opt out of the entire PaX ASLR implementation even though there's never really a compatibility issue caused by anything but mmap hints being ignored. This feature might be implemented in linux-hardened, but it won't be tied to the rest of the ASLR improvements.
  • grsecurity has a related userspace exploit brute force protection feature to go along with the kernel equivalent (which is basically a friendlier panic_on_oops=1) which can make ASLR significantly harder to brute force.
  • grsecurity has some features (under GRKERNSEC_PROC_MEMMAP) closing local address leaks and fixing setuid binary ASLR bypasses which help to make ASLR less bad as a local mitigation when combined with the brute force protection.

The heap randomisation results can vary significantly based on the malloc implementation. Some examples:

  • Other dlmalloc-style allocators (like musl) based on brk, with mmap only as a fallback and for very large allocations, are comparable to glibc.

  • For jemalloc (modern Android), mmap is used by default rather than brk. It uses 2M aligned regions as the low-level building block so it loses 9 bits of entropy with 4096 byte pages. For example, with 32-bit entropy for anonymous mappings, heap randomization entropy with jemalloc will be 23 bits. It will usually impact other mmap allocations too, although the impact is subtle since the kernel knows how to fill the randomly sized gaps between the 2M aligned jemalloc regions and whatever was above them. If the immediately following allocation can't fit there, it ends up aligned to 2M.

  • For OpenBSD malloc (CopperheadOS), mmap is used exclusively. However, rather than reducing entropy, it adds entropy via a few forms of fine-grained randomization inside the allocator. It has a region-based design similar to jemalloc, but each region is a single page dedicated to a size class with out-of-line metadata, instead of the much more complex 2M aligned regions with metadata within them. The free slot randomization within slabs is picked up by paxtest as lower bit randomization within the alignment constraints of slab allocation. The malloc(100) call used by paxtest is rounded to 128 bytes. Slab allocations are naturally aligned, so that means 128 byte alignment, which is 5 extra bits of randomization compared to the anonymous mapping base, so 37 bits with 32 bit anonymous mapping entropy. It's not a meaningful interpretation of the fine-grained randomization feature since it really isn't changing base randomization but rather adds different forms of randomization with related benefits. The free slot randomization is also likely the least useful of the 4 randomization features, but it makes sense to do it since the bitmap-based slot metadata makes it easy / cheap while doing a better job than CONFIG_SLAB_FREELIST_RANDOM in the kernel's own slab allocators (SLAB and SLUB are actually far more similar to OpenBSD malloc than other userspace allocators, with the linear mapping struct page translation vs. OpenBSD malloc page span hash table, and freelists within a single page rather than an out-of-line bitmap per page).
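
The jemalloc and OpenBSD malloc figures above follow from the ratio between the allocator's alignment and the page size. Here is a rough sketch of that arithmetic, assuming 4096 byte pages and power-of-two size classes (the rounding helper is an assumption that happens to reproduce the 100 to 128 byte case, not a description of either allocator's real size class table). All function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def log2_exact(n):
    """Base-2 logarithm of a power of two, rejecting anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError("expected a power of two")
    return n.bit_length() - 1


def bits_with_coarse_regions(mapping_bits, region_alignment, page_size=PAGE_SIZE):
    """Entropy left when mappings are forced to a coarser alignment than a page."""
    return mapping_bits - log2_exact(region_alignment // page_size)


def bits_with_slot_randomization(mapping_bits, slot_size, page_size=PAGE_SIZE):
    """Measured entropy when the slot chosen within a page is also random."""
    return mapping_bits + log2_exact(page_size // slot_size)


def round_up_pow2(size):
    """Assumed size class rounding: next power of two."""
    return 1 << (size - 1).bit_length()


# jemalloc-style 2M aligned regions with 32 bit anonymous mapping entropy
print(bits_with_coarse_regions(32, 2 * 1024 * 1024))  # 23

# OpenBSD malloc-style slot randomization for malloc(100)
slot = round_up_pow2(100)
print(slot)                                    # 128
print(bits_with_slot_randomization(32, slot))  # 37
```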

paxtest results matching the grsecurity description above, with UDEREF disabled:

Anonymous mapping randomisation test : 33 quality bits (guessed)
Heap randomisation test (ET_EXEC) : 22 quality bits (guessed)
Heap randomisation test (PIE) : 40 quality bits (guessed)
Main executable randomisation (ET_EXEC) : No randomization
Main executable randomisation (PIE) : 32 quality bits (guessed)
Shared library randomisation test : 33 quality bits (guessed)
VDSO randomisation test : 33 quality bits (guessed)
Stack randomisation test (SEGMEXEC) : 40 quality bits (guessed)
Stack randomisation test (PAGEEXEC) : 40 quality bits (guessed)
Arg/env randomisation test (SEGMEXEC) : 44 quality bits (guessed)
Arg/env randomisation test (PAGEEXEC) : 44 quality bits (guessed)
Offset to library randomisation (ET_EXEC): 33 quality bits (guessed)
Offset to library randomisation (ET_DYN) : 33 quality bits (guessed)
Randomization under memory exhaustion @~0: 33 bits (guessed)
Randomization under memory exhaustion @0 : 33 bits (guessed)

paxtest results matching the grsecurity description above, with UDEREF enabled (5 fewer bits):

Anonymous mapping randomization test : 28 quality bits (guessed)
Heap randomization test (ET_EXEC) : 22 quality bits (guessed)
Heap randomization test (PIE) : 35 quality bits (guessed)
Main executable randomization (ET_EXEC) : No randomization
Main executable randomization (PIE) : 28 quality bits (guessed)
Shared library randomization test : 28 quality bits (guessed)
VDSO randomization test : 28 quality bits (guessed)
Stack randomization test (SEGMEXEC) : 35 quality bits (guessed)
Stack randomization test (PAGEEXEC) : 35 quality bits (guessed)
Arg/env randomization test (SEGMEXEC) : 39 quality bits (guessed)
Arg/env randomization test (PAGEEXEC) : 39 quality bits (guessed)
Offset to library randomisation (ET_EXEC): 28 quality bits (guessed)
Offset to library randomisation (ET_DYN) : 28 quality bits (guessed)
Randomization under memory exhaustion @~0: 28 bits (guessed)
Randomization under memory exhaustion @0 : 28 bits (guessed)

paxtest results matching the linux-hardened description above:

Anonymous mapping randomization test : 32 quality bits (guessed)
Heap randomization test (ET_EXEC) : 26 quality bits (guessed)
Heap randomization test (PIE) : 40 quality bits (guessed)
Main executable randomization (ET_EXEC) : No randomization
Main executable randomization (PIE) : 32 quality bits (guessed)
Shared library randomization test : 32 quality bits (guessed)
VDSO randomization test : 32 quality bits (guessed)
Stack randomization test (SEGMEXEC) : 40 quality bits (guessed)
Stack randomization test (PAGEEXEC) : 40 quality bits (guessed)
Arg/env randomization test (SEGMEXEC) : 44 quality bits (guessed)
Arg/env randomization test (PAGEEXEC) : 44 quality bits (guessed)
Offset to library randomisation (ET_EXEC): 32 quality bits (guessed)
Offset to library randomisation (ET_DYN) : 34 quality bits (guessed)
Randomization under memory exhaustion @~0: 32 bits (guessed)
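Randomization under memory exhaustion @0 : 32 bits (guessed)

paxtest results matching the mainline description above, with vm.mmap_rnd_bits at its default (minimum) value:
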
Anonymous mapping randomization test : 28 quality bits (guessed)
Heap randomization test (ET_EXEC) : 13 quality bits (guessed)
Heap randomization test (PIE) : 28 quality bits (guessed)
Main executable randomization (ET_EXEC) : No randomization
Main executable randomization (PIE) : 28 quality bits (guessed)
Shared library randomization test : 28 quality bits (guessed)
VDSO randomization test : 20 quality bits (guessed)
Stack randomization test (SEGMEXEC) : 30 quality bits (guessed)
Stack randomization test (PAGEEXEC) : 30 quality bits (guessed)
Arg/env randomization test (SEGMEXEC) : 22 quality bits (guessed)
Arg/env randomization test (PAGEEXEC) : 22 quality bits (guessed)
Offset to library randomisation (ET_EXEC): 28 quality bits (guessed)
Offset to library randomisation (ET_DYN) : 28 quality bits (guessed)
Randomization under memory exhaustion @~0: 28 bits (guessed)
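Randomization under memory exhaustion @0 : 28 bits (guessed)

paxtest results matching the mainline description above, with vm.mmap_rnd_bits raised to the maximum:
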
Anonymous mapping randomization test : 32 quality bits (guessed)
Heap randomization test (ET_EXEC) : 13 quality bits (guessed)
Heap randomization test (PIE) : 32 quality bits (guessed)
Main executable randomization (ET_EXEC) : No randomization
Main executable randomization (PIE) : 32 quality bits (guessed)
Shared library randomization test : 32 quality bits (guessed)
VDSO randomization test : 20 quality bits (guessed)
Stack randomization test (SEGMEXEC) : 30 quality bits (guessed)
Stack randomization test (PAGEEXEC) : 30 quality bits (guessed)
Arg/env randomization test (SEGMEXEC) : 22 quality bits (guessed)
Arg/env randomization test (PAGEEXEC) : 22 quality bits (guessed)
Offset to library randomisation (ET_EXEC): 32 quality bits (guessed)
Offset to library randomisation (ET_DYN) : 32 quality bits (guessed)
Randomization under memory exhaustion @~0: 32 bits (guessed)
Randomization under memory exhaustion @0 : 32 bits (guessed)
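
Comparing result blocks like these by eye is error-prone, so a small parser can help. This is a minimal sketch that assumes the line format shown above ("name : N quality bits (guessed)" or "name: N bits (guessed)"); it is not part of paxtest, and the function names are made up. The two excerpts are copied from the result blocks above.

```python
import re

# "<test name> : <N> [quality] bits ..."; lines without a bit count are skipped
LINE_RE = re.compile(r"^(?P<test>.+?)\s*:\s*(?P<bits>\d+)\s+(?:quality\s+)?bits\b")


def parse_paxtest(text):
    """Map each test name to its reported bits of entropy."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # The output mixes both spellings, so normalize to one
            name = match.group("test").strip().replace("randomisation", "randomization")
            results[name] = int(match.group("bits"))
    return results


def entropy_delta(before, after):
    """Tests whose entropy differs, mapped to the change in bits."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }


block_a = """\
Anonymous mapping randomization test : 28 quality bits (guessed)
Heap randomization test (ET_EXEC) : 13 quality bits (guessed)
Main executable randomization (ET_EXEC) : No randomization
VDSO randomization test : 20 quality bits (guessed)
"""

block_b = """\
Anonymous mapping randomization test : 32 quality bits (guessed)
Heap randomization test (ET_EXEC) : 26 quality bits (guessed)
Main executable randomization (ET_EXEC) : No randomization
VDSO randomization test : 32 quality bits (guessed)
"""

for name, delta in entropy_delta(parse_paxtest(block_a), parse_paxtest(block_b)).items():
    print(f"{name}: {delta:+d} bits")
```

Lines reporting "No randomization" carry no bit count and are left out of the comparison rather than being treated as zero.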