@roosemberth
Last active October 31, 2019 12:06

I was too lazy to figure out a title; if I ever publish this somewhere else, I'll probably try to figure one out.

Let's look at the program below:

#include <stdio.h>
#define DEFAULT_MSG "Default message"

static void say(const char *msg) {
    if (msg == NULL) {
        msg = DEFAULT_MSG;
    }
    printf("(0x%lx) %s\n", (size_t)msg, msg);
}

int main(int argc, char **argv) {
    if (argc > 1)
        say(argv[1]);
    else
        say(NULL);
    return 0;
}

Let's compile it using a bunch of unnecessary flags:

gcc test.c -o test -W -Wall -Wextra -pedantic -Wcast-align -Wcast-qual -Wconversion -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wformat=2 -Winit-self -Wuninitialized -Wshadow -Wstrict-prototypes -Wmissing-declarations -Wmissing-prototypes -Wno-unused-parameter -Wbad-function-cast -Wunreachable-code -O0 -g

And run it:

~$ ./test
(0x402004) Default message
~$ ./test hello!
(0x7ffd713d125a) hello!

The C programming language represents strings as NUL-terminated character arrays. It can also only pass a string to a function by reference. Although it may seem a little unintuitive, this is the same as passing a pointer to the string by value.

In the example above, the function say takes a pointer to a constant string, which means that we should not modify the referenced string through that pointer. This is not to be confused with a constant pointer to a string, which would mean that the pointer itself cannot be modified but may still be used to modify the referenced string.

For instance, with a constant pointer to a constant string, the assignment to msg would not compile:

static void say(const char * const msg) {
    if (msg == NULL) {
        msg = DEFAULT_MSG;
    }
    printf("(0x%lx) %s\n", (size_t)msg, msg);
}

test.c: In function ‘say’:
test.c:6:13: error: assignment of read-only parameter ‘msg’
         msg = DEFAULT_MSG;
             ^

Exercise for the reader: what could go wrong when using a constant pointer to a string to modify argv[1]?

Now, let's rewind a little and see how strings are stored in our program.

First, we should distinguish two types of strings: “constant strings” (const char *) and “dynamic strings” (char *).

One important remark is that we can always get a const char * from a char *, since the only difference lies in promising the compiler not to use that particular pointer to modify the string.

Now, the preprocessor macro DEFAULT_MSG will be replaced by its value during preprocessing, before compilation proper, so the compiler will see the function say like so:

static void say(const char *msg) {
    if (msg == NULL) {
        msg = "Default message";
    }
    printf("(0x%lx) %s\n", (size_t)msg, msg);
}

Constant strings are stored in the compiled object file and later relocated by the linker within the produced binary. We can verify this by compiling the intermediate object, linking it, and peeking inside both files.

~$ gcc -c test.c -o test.o
~$ xxd test.o | grep -C 2 Default
00000090: f048 83c0 0848 8b00 4889 c7e8 a0ff ffff  .H...H..H.......
000000a0: eb0a bf00 0000 00e8 94ff ffff b800 0000  ................
000000b0: 00c9 c344 6566 6175 6c74 206d 6573 7361  ...Default messa
000000c0: 6765 0028 3078 256c 7829 2025 730a 0000  ge.(0x%lx) %s...
000000d0: 4743 433a 2028 474e 5529 2038 2e33 2e30  GCC: (GNU) 8.3.0
~$ gcc test.o -o test
~$ xxd test | grep -C 2 Default
00001fe0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001ff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00002000: 0100 0200 4465 6661 756c 7420 6d65 7373  ....Default mess
00002010: 6167 6500 2830 7825 6c78 2920 2573 0a00  age.(0x%lx) %s..
00002020: 011b 033b 3c00 0000 0600 0000 00f0 ffff  ...;<...........
~$ readelf -S test
There are 35 section headers, starting at offset 0x4f38:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  <--snip-->
  [15] .rodata           PROGBITS         0000000000402000  00002000
       0000000000000020  0000000000000000   A       0     0     4
  <--snip-->
  [30] .debug_str        PROGBITS         0000000000000000  00003e11
       0000000000000657  0000000000000001  MS       0     0     1
  <--snip-->
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),

By looking at the hex dump we can see our string located at offset 0x2004 in the binary. Using readelf, we can study each of its sections, in particular the .rodata section, whose contents at file offset 0x2000 will be mapped at address 0x402000 when the program is loaded. We can also see that the entire section is aligned on a 4-byte boundary. By looking at the flags we can see that the section is only allocated (A): it is neither writable nor executable.

One may be inclined to think that the S (strings) flag should be set here as well. Exercise for the reader: why is it not set?

Looking at the .rodata section, we see our message and an extra string. After looking back at the code, we notice that it corresponds to the format string we passed to printf! (Exercise for the reader: where is the ^J coming from?)

~$ readelf -p .rodata test

String dump of section '.rodata':
  [     4]  Default message
  [    14]  (0x%lx) %s^J

Now, our default message is located 4 bytes into the .rodata section, which is mapped at 0x402000. This means that it will be available at runtime at address 0x402004, which matches what we see when we print its address.

~$ ./test
(0x402004) Default message