The Linux kernel provides a cryptographically secure random number generator through the getrandom() system call and the device file /dev/urandom.
However, making a getrandom() syscall every time we need some random bytes hurts performance.
Recently, Linux introduced an alternative way to get random bytes through the vDSO (Virtual Dynamic Shared Object) mechanism.
Essentially, the kernel maps a pseudorandom generator into the address space of every process.
Instead of making a getrandom() syscall, we can call this generator function, which periodically reseeds itself via getrandom() but otherwise runs entirely in userspace.
In this blog post, I document how to find the entry point of the userspace getrandom() function and how to call it.
When the kernel launches an executable, it passes some environment information to the executable through the stack.
The most well-known fields of the initial stack are argc (number of arguments) and argv (array of argument strings).
However, there are several more fields.
The argv array always ends with a NULL pointer (8 zero bytes on a 64-bit platform).
After the argv array comes the envp array, which is the list of environment variables.
Unlike the argv array, there is no field that indicates the number of entries in the envp array.
The envp array also ends with a NULL pointer.
After the envp array comes the auxv array, which stands for "auxiliary vector."
Each entry of auxv contains two 8-byte fields, which are "type" and "value."
On AArch64, a typical auxv array appears as follows:
0x7ffffff5d8: 0x21 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff5e0: 0x00 0xf0 0xff 0xf7 0x7f 0x00 0x00 0x00
0x7ffffff5e8: 0x33 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff5f0: 0x70 0x12 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff5f8: 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff600: 0xff 0x08 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff608: 0x06 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff610: 0x00 0x10 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff618: 0x11 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff620: 0x64 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff628: 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff630: 0x40 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff638: 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff640: 0x38 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff648: 0x05 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff650: 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff658: 0x07 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff660: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff668: 0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff670: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff678: 0x09 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff680: 0xe8 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff688: 0x0b 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff690: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff698: 0x0c 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6a0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6a8: 0x0d 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6b0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6b8: 0x0e 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6c0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6c8: 0x17 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6d0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6d8: 0x19 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6e0: 0x28 0xf7 0xff 0xff 0x7f 0x00 0x00 0x00
0x7ffffff6e8: 0x1a 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6f0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6f8: 0x1f 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff700: 0xd9 0xff 0xff 0xff 0x7f 0x00 0x00 0x00
0x7ffffff708: 0x0f 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff710: 0x38 0xf7 0xff 0xff 0x7f 0x00 0x00 0x00
0x7ffffff718: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff720: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
The auxv array ends with an AT_NULL entry, whose type and value are both 0 (16 zero bytes).
We now document the purpose of each entry in this auxv array.
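The layout above can also be walked in code. The following C sketch assumes a 64-bit Linux target and that envp is the original array the kernel placed on the initial stack (for example, the third parameter of main, before anything calls setenv()); the helper names find_auxv and auxv_lookup are hypothetical. It skips the NULL pointer that terminates envp, then scans 16-byte entries until the AT_NULL terminator. On glibc, getauxval(3) performs the same lookup and is handy for cross-checking.

```c
#include <elf.h>      /* Elf64_auxv_t and the AT_* constants */
#include <stddef.h>   /* NULL */
#include <stdint.h>   /* uint64_t */
#include <sys/auxv.h> /* getauxval(), useful for cross-checking */

/* Hypothetical helper: skip envp (terminated by a NULL pointer); the auxv
 * array starts immediately after that terminator. Assumes envp is the
 * original array the kernel placed on the initial stack. */
static Elf64_auxv_t *find_auxv(char **envp)
{
    while (*envp != NULL)
        envp++;
    return (Elf64_auxv_t *)(envp + 1);
}

/* Hypothetical helper: return the value of the first entry of the given
 * type, or 0 if the AT_NULL terminator (type 0, value 0) comes first. */
static uint64_t auxv_lookup(const Elf64_auxv_t *auxv, uint64_t type)
{
    for (; auxv->a_type != AT_NULL; auxv++) {
        if (auxv->a_type == type)
            return auxv->a_un.a_val;
    }
    return 0;
}
```

For example, auxv_lookup(find_auxv(envp), AT_SYSINFO_EHDR) should return the same vDSO base address as getauxval(AT_SYSINFO_EHDR).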
0x7ffffff5d8: 0x21 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff5e0: 0x00 0xf0 0xff 0xf7 0x7f 0x00 0x00 0x00
Type 0x21 is AT_SYSINFO_EHDR. This entry is present on every platform that supports the vDSO.
Its value, 0x7ff7fff000, is the start address of the vDSO image.
We can verify this by dumping the first bytes at 0x7ff7fff000, which should be the ELF magic number 0x7f 0x45 0x4c 0x46 ("\x7fELF").
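The same check can be done in code. This sketch fetches AT_SYSINFO_EHDR with glibc's getauxval(3) instead of walking the stack, and compares the first four bytes against ELFMAG from <elf.h>; the function name vdso_ehdr is hypothetical.

```c
#include <elf.h>      /* Elf64_Ehdr, ELFMAG, SELFMAG */
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* memcmp */
#include <sys/auxv.h> /* getauxval */

/* Return the vDSO's ELF header, or NULL if the kernel supplied no
 * AT_SYSINFO_EHDR entry or the mapping does not begin with the ELF magic
 * 0x7f 'E' 'L' 'F'. */
static const Elf64_Ehdr *vdso_ehdr(void)
{
    const Elf64_Ehdr *ehdr =
        (const Elf64_Ehdr *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);

    if (ehdr == NULL)
        return NULL;
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0) /* ELFMAG is "\177ELF" */
        return NULL;
    return ehdr;
}
```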
0x7ffffff5e8: 0x33 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff5f0: 0x70 0x12 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x33 is AT_MINSIGSTKSZ, the minimum stack size the kernel needs in order to deliver a signal to userspace (here 0x1270, i.e. 4720 bytes).
0x7ffffff5f8: 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff600: 0xff 0x08 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x10 is AT_HWCAP, a bitmask indicating which CPU features are available.
See https://docs.kernel.org/arch/arm64/elf_hwcaps.html for documentation.
0x7ffffff608: 0x06 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff610: 0x00 0x10 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x06 is AT_PAGESZ, the size of a page in bytes.
0x7ffffff618: 0x11 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff620: 0x64 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x11 is AT_CLKTCK, the number of clock ticks in one second.
This entry is relevant to the times() syscall.
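0x7ffffff628: 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff630: 0x40 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff638: 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff640: 0x38 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff648: 0x05 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff650: 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00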
Type 0x03 is AT_PHDR, the address where the program header table is mapped.
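Normally, the program header table comes right after the ELF header, which is 0x40 (64) bytes long on 64-bit platforms, so the table starts at file offset 0x40.
Here AT_PHDR is also 0x40, which suggests that this executable is mapped starting at address 0.
Type 0x04 is AT_PHENT, the size of each program header entry: 0x38 (56) bytes, i.e. sizeof(Elf64_Phdr).
Type 0x05 is AT_PHNUM, the number of program header entries, here 3.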
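To illustrate how AT_PHDR, AT_PHENT and AT_PHNUM work together, the sketch below walks the running program's own program header table and counts its PT_LOAD segments. It assumes a 64-bit target and uses glibc's getauxval(3); count_load_segments is a hypothetical name. It advances by the entry size the kernel reports in AT_PHENT rather than hard-coding sizeof(Elf64_Phdr).

```c
#include <elf.h>      /* Elf64_Phdr, PT_LOAD, AT_PHDR, ... */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h> /* getauxval */

/* Count the PT_LOAD segments of the running executable using the
 * AT_PHDR (table address), AT_PHENT (entry size) and AT_PHNUM (entry
 * count) auxv entries. Assumes a 64-bit target. */
static size_t count_load_segments(void)
{
    const unsigned char *table =
        (const unsigned char *)(uintptr_t)getauxval(AT_PHDR);
    size_t phent = getauxval(AT_PHENT); /* 0x38 == sizeof(Elf64_Phdr) */
    size_t phnum = getauxval(AT_PHNUM);
    size_t loads = 0;

    if (table == NULL)
        return 0;
    for (size_t i = 0; i < phnum; i++) {
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(table + i * phent);
        if (ph->p_type == PT_LOAD)
            loads++;
    }
    return loads;
}
```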
0x7ffffff658: 0x07 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff660: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x07 is AT_BASE, the base address of the ELF interpreter.
This entry is irrelevant if the program is statically linked and does not need an interpreter.
As shown here, its value is 0.
0x7ffffff668: 0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff670: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x08 is AT_FLAGS.
The only flag currently defined seems to be AT_FLAGS_PRESERVE_ARGV0.
For programs launched through a binfmt_misc interpreter, this flag tells the interpreter that argv[0] is the same as what the parent process passed in. Here the value is 0.
0x7ffffff678: 0x09 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff680: 0xe8 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x09 is AT_ENTRY, the entry point of the executable.
0x7ffffff688: 0x0b 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff690: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff698: 0x0c 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6a0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6a8: 0x0d 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6b0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6b8: 0x0e 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6c0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x0b is AT_UID, the real UID of the process.
Type 0x0c is AT_EUID, the effective UID of the process.
Type 0x0d is AT_GID, the real GID of the process.
Type 0x0e is AT_EGID, the effective GID of the process.
0x7ffffff6c8: 0x17 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6d0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
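Type 0x17 is AT_SECURE, a flag that indicates the process should run in "secure mode."
The kernel sets it when the program is set-user-ID or set-group-ID, gains file capabilities, or when a Linux Security Module (LSM) requests it; the dynamic linker and libc then restrict features controlled by environment variables, such as LD_PRELOAD. Here it is 0.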
0x7ffffff6d8: 0x19 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6e0: 0x28 0xf7 0xff 0xff 0x7f 0x00 0x00 0x00
Type 0x19 is AT_RANDOM, which contains a pointer to 16 bytes of random data.
This is provided so that the executable can seed its own pseudorandom generator; glibc, for instance, derives its stack-protector canary from these bytes.
0x7ffffff6e8: 0x1a 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff6f0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Type 0x1a is AT_HWCAP2. This is an extension to the AT_HWCAP entry.
0x7ffffff6f8: 0x1f 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff700: 0xd9 0xff 0xff 0xff 0x7f 0x00 0x00 0x00
Type 0x1f is AT_EXECFN. The value is the address of a string holding the pathname that was passed to execve(); this path is not necessarily absolute.
0x7ffffff708: 0x0f 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ffffff710: 0x38 0xf7 0xff 0xff 0x7f 0x00 0x00 0x00
Type 0x0f is AT_PLATFORM, indicating the CPU architecture.
The address 0x7ffffff738 contains a string "aarch64".
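Unlike most entries, AT_RANDOM, AT_EXECFN and AT_PLATFORM hold pointers rather than plain numbers. The following sketch uses glibc's getauxval(3) and hypothetical helper names; it copies the 16 AT_RANDOM bytes and returns the strings that the other two entries point to.

```c
#include <elf.h>      /* AT_RANDOM, AT_EXECFN, AT_PLATFORM */
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* memcpy */
#include <sys/auxv.h> /* getauxval */

/* Hypothetical helper: copy the 16 random bytes that AT_RANDOM points to.
 * Returns 0 on success, or -1 if the entry is missing. */
static int read_at_random(unsigned char out[16])
{
    const unsigned char *src =
        (const unsigned char *)(uintptr_t)getauxval(AT_RANDOM);

    if (src == NULL)
        return -1;
    memcpy(out, src, 16);
    return 0;
}

/* Hypothetical helper: return the string that a pointer-valued entry such
 * as AT_EXECFN or AT_PLATFORM refers to, or NULL if the entry is absent. */
static const char *auxv_string(unsigned long type)
{
    return (const char *)(uintptr_t)getauxval(type);
}
```

On the system from the dump, auxv_string(AT_PLATFORM) would return "aarch64".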
From the auxv array we can find the location of the vDSO, which in this case is 0x7ff7fff000.
Now let's interpret the ELF header.
0x7ff7fff000: 0x7f 0x45 0x4c 0x46 0x02 0x01 0x01 0x00
0x7ff7fff008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff010: 0x03 0x00 0xb7 0x00 0x01 0x00 0x00 0x00
0x7ff7fff018: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff020: 0x40 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff028: 0xf0 0x0b 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff030: 0x00 0x00 0x00 0x00 0x40 0x00 0x38 0x00
0x7ff7fff038: 0x03 0x00 0x40 0x00 0x0d 0x00 0x0c 0x00
The ELF header format is:
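/* Elf64_Half is 2 bytes, Elf64_Word is 4 bytes; Elf64_Xword, Elf64_Addr and Elf64_Off are 8 bytes */
typedef struct {
    unsigned char e_ident[16];
    Elf64_Half    e_type;
    Elf64_Half    e_machine;
    Elf64_Word    e_version;
    Elf64_Addr    e_entry;
    Elf64_Off     e_phoff;
    Elf64_Off     e_shoff;
    Elf64_Word    e_flags;
    Elf64_Half    e_ehsize;
    Elf64_Half    e_phentsize;
    Elf64_Half    e_phnum;
    Elf64_Half    e_shentsize;
    Elf64_Half    e_shnum;
    Elf64_Half    e_shstrndx;
} Elf64_Ehdr;
Reading the dump against this layout: after the magic number, e_ident holds 0x02 (ELFCLASS64), 0x01 (little-endian) and 0x01 (ELF version 1). e_type is 3 (ET_DYN, a shared object), e_machine is 0xb7 (EM_AARCH64), e_entry is 0, e_phoff is 0x40, e_shoff is 0xbf0, e_ehsize is 0x40, e_phentsize is 0x38, e_phnum is 3, e_shentsize is 0x40, e_shnum is 0x0d (13) and e_shstrndx is 0x0c (12).
Thus there are three program headers, each of size 0x38, starting at offset 0x40. We now look at these headers: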
0x7ff7fff040: 0x01 0x00 0x00 0x00 0x05 0x00 0x00 0x00
0x7ff7fff048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff058: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff060: 0x68 0x0b 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff068: 0x68 0x0b 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff070: 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff078: 0x02 0x00 0x00 0x00 0x04 0x00 0x00 0x00
0x7ff7fff080: 0x48 0x0a 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff088: 0x48 0x0a 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff090: 0x48 0x0a 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff098: 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff0a0: 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff0a8: 0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff0b0: 0x04 0x00 0x00 0x00 0x04 0x00 0x00 0x00
0x7ff7fff0b8: 0x90 0x02 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff0c0: 0x90 0x02 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff0c8: 0x90 0x02 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff0d0: 0x54 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff0d8: 0x54 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff0e0: 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Each program header has the following layout:
typedef struct {
    Elf64_Word  p_type;
    Elf64_Word  p_flags;
    Elf64_Off   p_offset;
    Elf64_Addr  p_vaddr;
    Elf64_Addr  p_paddr;
    Elf64_Xword p_filesz;
    Elf64_Xword p_memsz;
    Elf64_Xword p_align;
} Elf64_Phdr;
The first segment has type PT_LOAD (1) and flags 5 (readable and executable). It starts at offset 0 and virtual address 0, and its size is 0x0b68, so it spans 0x7ff7fff000 to 0x7ff7fffb68.
The second segment has type PT_DYNAMIC (2). It holds the dynamic section, which tells a dynamic linker where to find the symbol table, the string table, the hash table and, if present, the relocation tables.
It is located at 0x7ff7fffa48 (offset 0xa48), and its length is 0x100 (256 bytes).
The third segment has type PT_NOTE (4) and can be ignored for now.
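The program-header walk can be sketched in C as follows. It assumes a 64-bit target and takes the vDSO base from glibc's getauxval(3); vdso_dynamic is a hypothetical name. The load bias is computed as base + p_offset - p_vaddr of the PT_LOAD segment, which is just the base address here because both p_offset and p_vaddr are 0. Locating PT_DYNAMIC at base + p_offset relies on the vDSO being mapped as one contiguous image.

```c
#include <elf.h>      /* Elf64_Ehdr, Elf64_Phdr, Elf64_Dyn */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h> /* getauxval */

/* Walk the vDSO's program headers. The first PT_LOAD segment yields the
 * bias to add to link-time addresses; PT_DYNAMIC yields the dynamic
 * section. Returns NULL if either segment is missing. */
static const Elf64_Dyn *vdso_dynamic(uintptr_t *bias)
{
    uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    const Elf64_Phdr *ph;
    const Elf64_Dyn *dyn = NULL;
    int have_load = 0;

    if (base == 0)
        return NULL;
    ph = (const Elf64_Phdr *)(base + eh->e_phoff); /* e_phoff = 0x40 */
    for (size_t i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type == PT_LOAD && !have_load) {
            /* equals base in the dump above, where both are 0 */
            *bias = base + ph[i].p_offset - ph[i].p_vaddr;
            have_load = 1;
        } else if (ph[i].p_type == PT_DYNAMIC) {
            dyn = (const Elf64_Dyn *)(base + ph[i].p_offset);
        }
    }
    return have_load ? dyn : NULL;
}
```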
The dynamic segment contains an array of Elf64_Dyn entries:
typedef struct {
    Elf64_Sxword d_tag;
    union {
        Elf64_Xword d_val;
        Elf64_Addr  d_ptr;
    } d_un;
} Elf64_Dyn;
Its contents are:
0x7ff7fffa48: 0x0e 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa50: 0x6d 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa58: 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa60: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa68: 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa70: 0xe8 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa78: 0x05 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa80: 0xc0 0x01 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa88: 0x06 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa90: 0x18 0x01 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa98: 0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffaa0: 0x8a 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffaa8: 0x0b 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffab0: 0x18 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffab8: 0xfc 0xff 0xff 0x6f 0x00 0x00 0x00 0x00
0x7ff7fffac0: 0x58 0x02 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffac8: 0xfd 0xff 0xff 0x6f 0x00 0x00 0x00 0x00
0x7ff7fffad0: 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffad8: 0xf0 0xff 0xff 0x6f 0x00 0x00 0x00 0x00
0x7ff7fffae0: 0x4a 0x02 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffae8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffaf0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffaf8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb00: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb08: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb10: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb18: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb20: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb28: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb30: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb38: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffb40: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
The four important entries are:
0x7ff7fffa78: 0x05 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa80: 0xc0 0x01 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa88: 0x06 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa90: 0x18 0x01 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffa98: 0x0a 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffaa0: 0x8a 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffaa8: 0x0b 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffab0: 0x18 0x00 0x00 0x00 0x00 0x00 0x00 0x00
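Tag 0x05 (DT_STRTAB) gives the location of the string table, 0x1c0.
Tag 0x06 (DT_SYMTAB) gives the location of the symbol table, 0x118.
Tag 0x0a (DT_STRSZ) gives the size of the string table, 0x8a (138) bytes.
Tag 0x0b (DT_SYMENT) gives the size of a symbol table entry, 0x18 (24) bytes.
The locations are virtual addresses; since the PT_LOAD segment has virtual address 0, they are offsets from the vDSO base, which puts the string table at 0x7ff7fff1c0 and the symbol table at 0x7ff7fff118.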
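The other entries are 0x0e (DT_SONAME, the soname of the library, at string table offset 0x6d), 0x10 (DT_SYMBOLIC, which asks the dynamic linker to resolve the library's own symbol references within the library first; irrelevant here),
0x04 (DT_HASH, the symbol hash table), 0x6ffffffc (DT_VERDEF, the version definition table), 0x6ffffffd (DT_VERDEFNUM, the number of version definitions) and 0x6ffffff0 (DT_VERSYM, the address of the .gnu.version section). The array ends with a DT_NULL entry (tag 0).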
The dynamic section contains no DT_REL or DT_RELA entries, so the library needs no relocation processing.
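Extracting the four important entries is a simple scan up to the DT_NULL terminator. In this sketch, struct dyn_info and parse_dynamic are hypothetical names; the caller passes in the dynamic array and the load bias obtained from the program headers, which also lets the function run on a hand-built array.

```c
#include <elf.h>    /* Elf64_Dyn, DT_* tags */
#include <stddef.h>
#include <stdint.h>

/* The entries needed for symbol lookup. Addresses are already adjusted
 * by the load bias. (Hypothetical struct.) */
struct dyn_info {
    uintptr_t strtab;  /* DT_STRTAB (0x05) */
    uintptr_t symtab;  /* DT_SYMTAB (0x06) */
    uint64_t  strsz;   /* DT_STRSZ  (0x0a) */
    uint64_t  syment;  /* DT_SYMENT (0x0b) */
    int       has_rel; /* set if DT_REL or DT_RELA is present */
};

/* Scan the dynamic array up to its DT_NULL terminator. d_ptr values are
 * link-time addresses, so bias is added to turn them into pointers. */
static struct dyn_info parse_dynamic(const Elf64_Dyn *dyn, uintptr_t bias)
{
    struct dyn_info info = {0};

    for (; dyn->d_tag != DT_NULL; dyn++) {
        switch (dyn->d_tag) {
        case DT_STRTAB: info.strtab = bias + dyn->d_un.d_ptr; break;
        case DT_SYMTAB: info.symtab = bias + dyn->d_un.d_ptr; break;
        case DT_STRSZ:  info.strsz = dyn->d_un.d_val; break;
        case DT_SYMENT: info.syment = dyn->d_un.d_val; break;
        case DT_REL:
        case DT_RELA:   info.has_rel = 1; break;
        default:        break; /* DT_SONAME, DT_HASH, DT_VERSYM, ... */
        }
    }
    return info;
}
```

Fed the entries from the dump with a bias of 0x7ff7fff000, it yields strtab = 0x7ff7fff1c0, symtab = 0x7ff7fff118, strsz = 0x8a and syment = 0x18, with no relocation entries.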
Let's first look at the string table.
0x7ff7fff1c0: 0 '\000' 95 '_' 95 '_' 107 'k' 101 'e' 114 'r' 110 'n' 101 'e'
0x7ff7fff1c8: 108 'l' 95 '_' 99 'c' 108 'l' 111 'o' 99 'c' 107 'k' 95 '_'
0x7ff7fff1d0: 103 'g' 101 'e' 116 't' 116 't' 105 'i' 109 'm' 101 'e' 0 '\000'
0x7ff7fff1d8: 95 '_' 95 '_' 107 'k' 101 'e' 114 'r' 110 'n' 101 'e' 108 'l'
0x7ff7fff1e0: 95 '_' 103 'g' 101 'e' 116 't' 116 't' 105 'i' 109 'm' 101 'e'
0x7ff7fff1e8: 111 'o' 102 'f' 100 'd' 97 'a' 121 'y' 0 '\000' 95 '_' 95 '_'
0x7ff7fff1f0: 107 'k' 101 'e' 114 'r' 110 'n' 101 'e' 108 'l' 95 '_' 99 'c'
0x7ff7fff1f8: 108 'l' 111 'o' 99 'c' 107 'k' 95 '_' 103 'g' 101 'e' 116 't'
0x7ff7fff200: 114 'r' 101 'e' 115 's' 0 '\000' 95 '_' 95 '_' 107 'k' 101 'e'
0x7ff7fff208: 114 'r' 110 'n' 101 'e' 108 'l' 95 '_' 114 'r' 116 't' 95 '_'
0x7ff7fff210: 115 's' 105 'i' 103 'g' 114 'r' 101 'e' 116 't' 117 'u' 114 'r'
0x7ff7fff218: 110 'n' 0 '\000' 95 '_' 95 '_' 107 'k' 101 'e' 114 'r' 110 'n'
0x7ff7fff220: 101 'e' 108 'l' 95 '_' 103 'g' 101 'e' 116 't' 114 'r' 97 'a'
0x7ff7fff228: 110 'n' 100 'd' 111 'o' 109 'm' 0 '\000' 108 'l' 105 'i' 110 'n'
0x7ff7fff230: 117 'u' 120 'x' 45 '-' 118 'v' 100 'd' 115 's' 111 'o' 46 '.'
0x7ff7fff238: 115 's' 111 'o' 46 '.' 49 '1' 0 '\000' 76 'L' 73 'I' 78 'N'
0x7ff7fff240: 85 'U' 88 'X' 95 '_' 50 '2' 46 '.' 54 '6' 46 '.' 51 '3'
0x7ff7fff248: 57 '9' 0 '\000'
Notice that there is a string "__kernel_getrandom" at address 0x7ff7fff21a.
This is a good indication that the kernel supports the userspace getrandom().
The symbol table is an array of Elf64_Sym entries:
typedef struct {
    Elf64_Word    st_name;
    unsigned char st_info;
    unsigned char st_other;
    Elf64_Half    st_shndx;
    Elf64_Addr    st_value;
    Elf64_Xword   st_size;
} Elf64_Sym;
How do we get the number of entries in the symbol table? The dynamic section gives the symbol table's location and entry size, but not its length. We need to read the size of the symbol table section and divide it by the size of each entry (24 bytes).
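To find information about the symbol table section, we read the section headers, which start at e_shoff (0xbf0), and look for a section of type SHT_SYMTAB or SHT_DYNSYM. The section headers lie past the end of the PT_LOAD segment, but the kernel maps the entire vDSO image, so they can still be read from memory.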
Each section is described by the following structure:
typedef struct {
    Elf64_Word  sh_name;
    Elf64_Word  sh_type;
    Elf64_Xword sh_flags;
    Elf64_Addr  sh_addr;
    Elf64_Off   sh_offset;
    Elf64_Xword sh_size;
    Elf64_Word  sh_link;
    Elf64_Word  sh_info;
    Elf64_Xword sh_addralign;
    Elf64_Xword sh_entsize;
} Elf64_Shdr;
The ELF header says the vDSO object has 13 sections.
Within the section header array, we notice the following entry of type 0x0b (SHT_DYNSYM):
0x7ff7fffc70: 0x11 0x00 0x00 0x00 0x0b 0x00 0x00 0x00
0x7ff7fffc78: 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffc80: 0x18 0x01 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffc88: 0x18 0x01 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffc90: 0xa8 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffc98: 0x03 0x00 0x00 0x00 0x01 0x00 0x00 0x00
0x7ff7fffca0: 0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fffca8: 0x18 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Its sh_addr and sh_offset, both 0x118, match the symbol table location given by DT_SYMTAB, and its sh_entsize is 0x18.
Its sh_size is 0xa8 (168 bytes). Since each symbol table entry takes 24 bytes, the symbol table has 168 / 24 = 7 entries.
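Counting symbols this way takes a short loop over the section headers. This sketch assumes a 64-bit target, takes the vDSO base from glibc's getauxval(3), and relies on the section headers being mapped, which holds for the vDSO as the dump shows; vdso_dynsym_count is a hypothetical name. For the vDSO above it would compute 0xa8 / 0x18 = 7.

```c
#include <elf.h>      /* Elf64_Ehdr, Elf64_Shdr, SHT_DYNSYM */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h> /* getauxval */

/* Return the number of dynamic symbols in the vDSO by finding the
 * SHT_DYNSYM section header and dividing sh_size by sh_entsize.
 * Returns 0 if no such section is found. */
static size_t vdso_dynsym_count(void)
{
    uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    const Elf64_Shdr *sh;

    if (base == 0 || eh->e_shoff == 0)
        return 0;
    sh = (const Elf64_Shdr *)(base + eh->e_shoff); /* e_shoff = 0xbf0 */
    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type == SHT_DYNSYM && sh[i].sh_entsize != 0)
            return sh[i].sh_size / sh[i].sh_entsize;
    }
    return 0;
}
```

As an aside, the nchain field of the DT_HASH table also equals the number of symbol table entries, which gives a way to count symbols without touching the section headers.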
The content of the symbol table is:
0x7ff7fff118: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff120: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff128: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff130: 0x7d 0x00 0x00 0x00 0x11 0x00 0xf1 0xff
0x7ff7fff138: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff140: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff148: 0x2e 0x00 0x00 0x00 0x12 0x00 0x07 0x00
0x7ff7fff150: 0x10 0x05 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff158: 0x58 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff160: 0x44 0x00 0x00 0x00 0x10 0x00 0x07 0x00
0x7ff7fff168: 0x7c 0x05 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff170: 0x08 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff178: 0x18 0x00 0x00 0x00 0x12 0x00 0x07 0x00
0x7ff7fff180: 0x20 0x04 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff188: 0xe4 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff190: 0x5a 0x00 0x00 0x00 0x12 0x00 0x07 0x00
0x7ff7fff198: 0x90 0x05 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff1a0: 0x60 0x03 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff1a8: 0x01 0x00 0x00 0x00 0x12 0x00 0x07 0x00
0x7ff7fff1b0: 0xf0 0x02 0x00 0x00 0x00 0x00 0x00 0x00
0x7ff7fff1b8: 0x28 0x01 0x00 0x00 0x00 0x00 0x00 0x00
Each symbol has a type (lowest 4 bits of st_info), a binding attribute (highest 4 bits of st_info), and a visibility attribute (lowest 2 bits of st_other).
The two most important types are STT_OBJECT (variable) and STT_FUNC (function).
If the binding attribute is STB_LOCAL, the symbol is not exported.
If the binding attribute is STB_GLOBAL, the symbol is exported.
There is also STB_WEAK for weak symbols, which is irrelevant here.
So we are looking for a symbol of type STT_FUNC with binding STB_GLOBAL whose name is __kernel_getrandom.
The offset of the string "__kernel_getrandom" from the beginning of the string table is 0x5a, hence st_name should have the value 0x5a.
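The entry at address 0x7ff7fff190 satisfies our criteria: its st_name is 0x5a, and its st_info is 0x12, i.e. binding 1 (STB_GLOBAL) in the high nibble and type 2 (STT_FUNC) in the low nibble.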
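The selection criteria translate directly into the ELF64_ST_TYPE and ELF64_ST_BIND macros from <elf.h>, which extract the low and high nibbles of st_info. The predicate name below is hypothetical, and strtab is assumed to point at the start of the dynamic string table.

```c
#include <elf.h>    /* Elf64_Sym, ELF64_ST_TYPE, ELF64_ST_BIND */
#include <string.h> /* strcmp */

/* Hypothetical predicate: is sym an exported (STB_GLOBAL) function
 * (STT_FUNC) whose name, found at strtab + st_name, equals want? */
static int is_global_func_named(const Elf64_Sym *sym, const char *strtab,
                                const char *want)
{
    return ELF64_ST_TYPE(sym->st_info) == STT_FUNC &&
           ELF64_ST_BIND(sym->st_info) == STB_GLOBAL &&
           strcmp(strtab + sym->st_name, want) == 0;
}
```

Applied to the dump, the entry with st_info 0x12 and st_name 0x5a matches "__kernel_getrandom", while __kernel_rt_sigreturn (st_info 0x10, type STT_NOTYPE) would be rejected.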
The st_value field of the entry is 0x0590.
According to the ELF specification, this value is a virtual address.
Shared objects can be loaded at any base address, and the vDSO's PT_LOAD segment has virtual address 0, so st_value is simply the offset of the function from the start of the mapping.
Hence the address of the __kernel_getrandom function is 0x7ff7fff000 + 0x590 = 0x7ff7fff590.
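Putting these steps together, a complete lookup might look like the sketch below. It assumes a 64-bit target, takes the vDSO base from glibc's getauxval(3), takes the symbol count from the SHT_DYNSYM section header as described above, and ignores symbol versioning; vdso_lookup is a hypothetical name, not a libc API.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the runtime address of an exported function in the vDSO, or
 * NULL. Steps: AT_SYSINFO_EHDR -> program headers -> dynamic section ->
 * string/symbol tables -> SHT_DYNSYM section header for the count. */
static void *vdso_lookup(const char *name)
{
    uintptr_t base = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return NULL;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + eh->e_phoff);
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
    const Elf64_Dyn *dyn = NULL;
    uintptr_t bias = 0;
    int have_load = 0;

    /* Load bias from the first PT_LOAD, dynamic section from PT_DYNAMIC. */
    for (size_t i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type == PT_LOAD && !have_load) {
            bias = base + ph[i].p_offset - ph[i].p_vaddr;
            have_load = 1;
        } else if (ph[i].p_type == PT_DYNAMIC) {
            dyn = (const Elf64_Dyn *)(base + ph[i].p_offset);
        }
    }
    if (!have_load || dyn == NULL)
        return NULL;

    const char *strtab = NULL;
    const Elf64_Sym *symtab = NULL;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_STRTAB)
            strtab = (const char *)(bias + dyn->d_un.d_ptr);
        else if (dyn->d_tag == DT_SYMTAB)
            symtab = (const Elf64_Sym *)(bias + dyn->d_un.d_ptr);
    }
    if (strtab == NULL || symtab == NULL)
        return NULL;

    /* Symbol count = sh_size / sh_entsize of the SHT_DYNSYM section. */
    size_t nsyms = 0;
    for (size_t i = 0; eh->e_shoff != 0 && i < eh->e_shnum; i++) {
        if (sh[i].sh_type == SHT_DYNSYM && sh[i].sh_entsize != 0) {
            nsyms = sh[i].sh_size / sh[i].sh_entsize;
            break;
        }
    }

    /* Exported function with a matching name: return bias + st_value. */
    for (size_t i = 0; i < nsyms; i++) {
        const Elf64_Sym *s = &symtab[i];
        if (ELF64_ST_TYPE(s->st_info) == STT_FUNC &&
            ELF64_ST_BIND(s->st_info) == STB_GLOBAL &&
            s->st_shndx != SHN_UNDEF &&
            strcmp(strtab + s->st_name, name) == 0)
            return (void *)(bias + s->st_value);
    }
    return NULL;
}
```

On the system from the dump, vdso_lookup("__kernel_getrandom") would return 0x7ff7fff590. The symbol name is architecture-specific; on x86-64, for instance, the kernel exports the same function as __vdso_getrandom.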
The signature (and definition) of the vDSO __kernel_getrandom() function can be found in the kernel source code at arch/arm64/kernel/vdso/vgetrandom.c:
ssize_t __kernel_getrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len)
{
    if (alternative_has_cap_likely(ARM64_HAS_FPSIMD))
        return __cvdso_getrandom(buffer, len, flags, opaque_state, opaque_len);

    if (unlikely(opaque_len == ~0UL && !buffer && !len && !flags))
        return -ENOSYS;

    return getrandom_syscall(buffer, len, flags);
}
Notice that besides the usual arguments of getrandom(), there are two more arguments, opaque_state and opaque_len.
The code also shows that on CPUs without FPSIMD, a query call (opaque_len set to ~0UL and all other arguments zero) returns -ENOSYS, and other calls fall back to the getrandom() syscall.
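In C, the signature above maps to a function-pointer type. Because the function returns negative errno values (such as -ENOSYS) directly instead of setting errno, a caller will usually want a small wrapper. The sketch below is one such wrapper; vgetrandom_fn and my_getrandom are hypothetical names, and it falls back to the getrandom() wrapper from <sys/random.h> when no vDSO function was found.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h> /* getrandom() */
#include <sys/types.h>  /* ssize_t */

/* Function-pointer type matching the __kernel_getrandom signature. */
typedef ssize_t (*vgetrandom_fn)(void *buffer, size_t len, unsigned int flags,
                                 void *opaque_state, size_t opaque_len);

/* Hypothetical wrapper: call the vDSO function if one was found, else use
 * the getrandom(2) syscall wrapper. A negative return from the vDSO is a
 * negated errno value, converted here to the usual -1/errno convention. */
static ssize_t my_getrandom(vgetrandom_fn fn, void *buf, size_t len,
                            unsigned int flags, void *state, size_t state_len)
{
    if (fn == NULL)
        return getrandom(buf, len, flags);

    ssize_t ret = fn(buf, len, flags, state, state_len);
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```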
To understand what these arguments do, we need to read the original patchset that introduced the vDSO function.
The following text is quoted from https://lwn.net/Articles/980272/:
API-wise, the vDSO gains this function:
ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state);
The return value and the first 3 arguments are the same as ordinary
getrandom(), while the last argument is a pointer to some state
allocated with vgetrandom_alloc(), explained below. Were all four
arguments passed to the getrandom syscall, nothing different would
happen, and the functions would have the exact same behavior.
Then, we introduce a new syscall:
void *vgetrandom_alloc(unsigned int *num, unsigned int *size_per_each,
unsigned long addr, unsigned int flags);
This takes a hinted number of opaque states in `num`, and returns a
pointer to an array of opaque states, the number actually allocated back
in `num`, and the size in bytes of each one in `size_per_each`, enabling
a libc to slice up the returned array into a state per each thread. (The
`flags` and `addr` arguments, as well as the `*size_per_each` input
value, are reserved for the future and are forced to be zero for now.)
Libc is expected to allocate a chunk of these on first use, and then
dole them out to threads as they're created, allocating more when
needed. The returned address of the first state may be passed to
munmap(2) with a length of `num * size_per_each`, in order to deallocate
the memory.
A later version of the patchset removed the vgetrandom_alloc() syscall, relying instead on mmap() to do the allocations.
See https://lkml.org/lkml/2024/7/9/768:
Changes v21->v22:
- Only add MAP_DROPPABLE, not the other MAP_*s, but make it imply the other
relevant flags.
- Ensure that mlock() and madvise() can't undo MAP_DROPPABLE implications.
- Since MAP_DROPPABLE is generally useful, remove conditional Kconfig
scafolding around it.
- Follow mm/ standards on comment style.
- Base atop latest selftest PR, to avoid merge conflicts in 6.11.
- Update glibc patches.
Changes v20->v21:
- After extensive conversation with Linus, we're nixing the entire
vgetrandom_alloc() syscall, in favor of just exposing the functionality
needed through mmap() and having the kernel communicate to the caller what
arguments/sizes it should pass to mmap(). This simplifies the series
considerably. It also means that the first commit adds some new MAP_*
constants for mmap().
- Separate vDSO selftests out into separate commit.
The commit messages of the individual patches contain further details, for example https://lkml.org/lkml/2024/7/9/769:
This way, allocations used by vDSO getrandom() can use:
VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE
And there will be no problem with using memory when not in use, not
wiping on fork(), coredumps, or writing out to swap.
In order to let vDSO getrandom() use this, expose these via mmap(2) as
MAP_DROPPABLE.
How, then, does the kernel "communicate to the caller what arguments/sizes it should pass to mmap()"? This is described in https://lwn.net/Articles/983186/:
If this function is called with the buffer, len, and flags parameters all set to zero and opaque_len set to ~0UL then, rather than generating random data, vgetrandom() will fill the memory pointed to by opaque_state with this structure:
struct vgetrandom_opaque_params {
__u32 size_of_opaque_state;
__u32 mmap_prot;
__u32 mmap_flags;
__u32 reserved[13];
};
The caller can then allocate the needed memory to store the per-thread state for as many threads as might be needed, passing the provided mmap_prot and mmap_flags values directly to mmap(). It is the caller's responsibility to ensure that the allocated memory does not cross a page boundary.
So the overall procedure for calling __kernel_getrandom() is as follows.
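First, call __kernel_getrandom(NULL, 0, 0, &params, ~0UL), where params is a struct vgetrandom_opaque_params, to obtain the size of each opaque state and the protection and flags to pass to mmap().
Then, call mmap() with the returned mmap_prot and mmap_flags to allocate room for num_threads states, one per thread. Because no state may cross a page boundary, each page holds page_size / size_of_opaque_state whole states, and the number of pages is rounded up accordingly.
Each thread gets exclusive access to a single opaque state.
After initialization, whenever random bytes are needed, pass the usual getrandom() arguments together with the thread's opaque_state address and opaque_len (equal to size_of_opaque_state) to the function.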
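The procedure can be sketched as follows. The struct opaque_params below is a local mirror of the struct vgetrandom_opaque_params layout quoted above, defined here because older system headers may lack it; vgetrandom_fn, alloc_states and state_for are hypothetical names, and fn is assumed to be the __kernel_getrandom pointer found through the symbol lookup described earlier. States are packed whole into pages so that none crosses a page boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>  /* mmap, MAP_FAILED */
#include <sys/types.h> /* ssize_t */
#include <unistd.h>    /* sysconf */

typedef ssize_t (*vgetrandom_fn)(void *buffer, size_t len, unsigned int flags,
                                 void *opaque_state, size_t opaque_len);

/* Local mirror of struct vgetrandom_opaque_params as quoted above. */
struct opaque_params {
    uint32_t size_of_opaque_state;
    uint32_t mmap_prot;
    uint32_t mmap_flags;
    uint32_t reserved[13];
};

/* Address of thread i's state. Each page holds page / state_size whole
 * states. Assumes 0 < state_size <= page. */
static void *state_for(void *mem, size_t i, size_t state_size, size_t page)
{
    size_t per_page = page / state_size;
    return (char *)mem + (i / per_page) * page + (i % per_page) * state_size;
}

/* Hypothetical helper: query the parameters, then mmap() enough pages for
 * nthreads states using the kernel-provided prot and flags. Returns NULL
 * if fn is NULL, the query fails (e.g. -ENOSYS) or mmap() fails. */
static void *alloc_states(vgetrandom_fn fn, size_t nthreads,
                          size_t *state_size, size_t *map_len)
{
    struct opaque_params p = {0};
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (fn == NULL || fn(NULL, 0, 0, &p, ~0UL) != 0)
        return NULL;
    if (p.size_of_opaque_state == 0 || p.size_of_opaque_state > page)
        return NULL;

    size_t per_page = page / p.size_of_opaque_state;
    size_t pages = (nthreads + per_page - 1) / per_page;
    void *mem = mmap(NULL, pages * page, (int)p.mmap_prot,
                     (int)p.mmap_flags, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    *state_size = p.size_of_opaque_state;
    *map_len = pages * page;
    return mem;
}
```

Thread i would then call fn(buf, len, 0, state_for(mem, i, state_size, page), state_size), where page is the system page size, and the whole area can be released with munmap(mem, map_len).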