@leptos-null
Last active July 31, 2024 20:36
Print range with printf without format specifiers

I was reviewing my GitHub gists and saw this:

#include <stdio.h>
#include <stdint.h> /* uint32_t */

#pragma clang diagnostic ignored "-Wformat-security"

int main() {
    /* 4 bytes (chars)
     * little endian: NUL NUL SPC NUM
     * big    endian: NUM SPC NUL NUL
     */
#if __LITTLE_ENDIAN__
    for (uint32_t i = (' ' << 8 | '1'); i < (' ' << 8 | '5'); i++) {
#else
    for (uint32_t i = ('1' << 24 | ' ' << 16); i < ('5' << 24 | ' ' << 16); i += (1 << 24)) {
#endif
        printf((char *)&i);
    }
}

I shared the snippet with someone who had been looking for examples of bitwise operators, but then I realized there's a lot more going on here. Let's run this code, find out what it does, and then break it down.

$ clang main.c -o main
$ ./main
1 2 3 4 

This looks simple enough. The program prints the numbers 1 through 4, each followed by a space. We might write this program a little more conventionally as:

#include <stdio.h>

int main() {
    for (int i = 1; i < 5; i++) {
        printf("%d ", i);
    }
}

If you're familiar with format specifiers, this is fairly simple: %d is replaced with a decimal representation of the i variable.

ASCII

Since i is always 1 digit for the values we're using, the conversion from integer to string is straightforward. ASCII makes this easy for us. We don't even need to know what '0' is encoded to, just that the decimal digits have consecutive encodings. This means that '0' + 1 == '1' and '1' + 1 == '2'.

We can see what this actually means:

#include <stdio.h>

int main() {
    for (char c = '0'; c <= '9'; c++) {
        printf("'%c' -> %hhd\n", c, c);
    }
}
'0' -> 48
'1' -> 49
'2' -> 50
'3' -> 51
'4' -> 52
'5' -> 53
'6' -> 54
'7' -> 55
'8' -> 56
'9' -> 57

From this output, we learn that '0' is encoded as 48. This output also confirms that the ASCII representations of the decimal digits are in order, so we can do arithmetic on the ASCII values in this range.

For more information on string format specifiers, I suggest referencing https://pubs.opengroup.org/onlinepubs/009695399/functions/printf.html

Let's write our program again using %c instead of %d now:

#include <stdio.h>

int main() {
    for (char c = '1'; c < '5'; c++) {
        printf("%c ", c);
    }
}

This doesn't look much different from our %d version, but it's a little more efficient: %c is very easy to evaluate (just copy a byte into the result string), whereas %d has to convert the integer into one or more digit characters.

If %c is just copying a byte into a result string, can we do it ourselves? We can. Let's think about what this result string will look like.

Remember that C strings are null-terminated. This means that after we have all of the bytes we want to print out, we need to add a null byte at the end so that a given function knows where the string ends. When we type a string literal in C, this is done for us. For example, "str" appears to be 3 characters, but we need 4 bytes to represent this string since there's a null byte afterwards. We could write "str" using an explicit byte array: char str[4] = { 's', 't', 'r', 0 };

Back to evaluating %c in our for-loop: We want some strings that look like "1 " which is { '1', ' ', 0 } (ASCII 1, ASCII space, null byte). We can just replace the first byte with the char from our loop:

#include <stdio.h>

int main() {
    char buf[3] = { 0, ' ', 0 };
    
    for (char c = '1'; c < '5'; c++) {
        buf[0] = c;
        printf(buf);
    }
}

Pragma diagnostic

When we compile this with a modern C compiler, we might see a warning

main.c:8:16: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
        printf(buf);
               ^~~

This warning is telling us that the first parameter to printf should always be a string literal, because any % in a non-literal string would be interpreted as a format specifier. The warning is correct, but we've passed a variable intentionally and we're just doing this for a demo. For now, we'll turn off the warning. Usually we would turn off this warning by passing -Wno-format-security to clang: clang -Wno-format-security main.c -o main. This compiles with no warnings now. But everyone who uses this code snippet will get the same warning, so let's put this information in the file:

#include <stdio.h>

#pragma clang diagnostic ignored "-Wformat-security"

int main() {
    char buf[3] = { 0, ' ', 0 };
    
    for (char c = '1'; c < '5'; c++) {
        buf[0] = c;
        printf(buf);
    }
}

We've now ignored this specific warning for this whole file. We could also ignore the warning for just a segment of the file, but generally you shouldn't turn off warnings at all, so we'll skip this. Since we shouldn't turn off warnings, the way to write this code without warnings or security issues would be:

#include <unistd.h>

int main() {
    char buf[3] = { 0, ' ', 0 };
    
    for (char c = '1'; c < '5'; c++) {
        buf[0] = c;
        write(STDOUT_FILENO, buf, sizeof(buf) - 1); /* omit the trailing null byte */
    }
}

Understanding this snippet is left as an exercise for the reader.

Endianness

We're still not quite at the code snippet that opened this article. We have an array of 3 bytes. This is pretty small: long is 8 bytes on many platforms, and int is 4 bytes on almost all platforms. Can we simply represent our byte array as an int? Let's expand our array to be 4 bytes and find out. We can put any byte we want after the null terminating byte since printf will stop reading once it sees the null byte. I'll just use another null byte here.

#include <stdio.h>

#pragma clang diagnostic ignored "-Wformat-security"

int main() {
    char buf[4] = { 0, ' ', 0, 0 };
    
    for (char c = '1'; c < '5'; c++) {
        buf[0] = c;
        printf(buf);
    }
}

This still compiles and the output is still the same.

Let's try to print these 4 bytes out as an integer:

#include <stdio.h>

int main() {
    char buf[4] = { 0, ' ', 0, 0 };
    
    for (char c = '1'; c < '5'; c++) {
        buf[0] = c;
        printf("\"%s\" -> %d\n", buf, *(int *)buf);
    }
}
"1 " -> 8241
"2 " -> 8242
"3 " -> 8243
"4 " -> 8244

In this snippet, I've printed out the string representation of the buffer and the integer representation of the same byte sequence. It seems right: as we increase c in buf, the integer also increases by 1.

But wait, does this make sense? If we just think of normal decimal numbers, if I start with 100 and swap out the first digit for a 2, the number is now 200. Why does it seem to be the opposite here?

The reason is endianness. Specifically, the computer I'm running these examples on is little endian. Most computers these days are little endian. This means that the bytes representing an integer are ordered in memory such that the least significant byte comes first and the most significant byte is at the end. The most significant byte is the byte that carries the most value; in other words, a change in the most significant byte changes the value of the integer the most. For example, in decimal, if we have the number 4567, the leading digit (4 in this case) is most significant because changing it to 8 gives 8567, which is a change of 4000, whereas changing the trailing digit (7 in the original number) to 3 gives 4563, which is a change of only 4.

The output makes more sense now: as we increase the first byte in our byte array, the integer only increases by 1 since the first byte is the least significant byte.

Let's print out the same buffer in some more formats:

#include <stdio.h>

int main() {
    char buf[4] = { 0, ' ', 0, 0 };
    
    for (char c = '1'; c < '5'; c++) {
        buf[0] = c;
        printf("\"%s\" -> %02hhx %02hhx %02hhx %02hhx -> %08x\n", buf, buf[0], buf[1], buf[2], buf[3], *(int *)buf);
    }
}
"1 " -> 31 20 00 00 -> 00002031
"2 " -> 32 20 00 00 -> 00002032
"3 " -> 33 20 00 00 -> 00002033
"4 " -> 34 20 00 00 -> 00002034

This is similar to our last program, but we're printing out in hexadecimal format. First we print out the string like usual, then print out each byte in hexadecimal, and lastly print out the integer in hexadecimal (hex, for short).

The byte order is clearer here. We can see that the integer is in the opposite order from the byte array.

This was helpful to understand endianness, but we were actually already ready to write this loop using just an int with our last example:

#include <stdio.h>

#pragma clang diagnostic ignored "-Wformat-security"

int main() {
    for (int i = 8241; i < 8245; i++) {
        printf((char *)&i);
    }
}

It works! (If you're running on a little endian machine.) But it's very difficult to understand what this program does, or how we might be able to update it if needed in the future.

Bitwise operators

Let's look back at the hexadecimal printout to give us some ideas.

"1 " -> 31 20 00 00 -> 00002031
"2 " -> 32 20 00 00 -> 00002032
"3 " -> 33 20 00 00 -> 00002033
"4 " -> 34 20 00 00 -> 00002034

(copy of previous output for reference)

We can gather that 31 is the hex representation of '1' and that 20 is hex for ' ' (space).

We want to create an integer that's a space and then the numeric digit. This is where we'll get to bitwise operators. Bitwise operations make sense in the context of a binary representation of a number. Binary is base 2. The common representation that humans use is decimal, which is base 10. We've also used hex in this article, which is base 16.

In decimal, if you want to represent 2 million, you might write 2,000,000 or 2e6 in scientific notation. This is similar to the shift bitwise operator: 1 << 6 means shift 1 to the left by 6 bits, which is 1000000 (binary). We won't use it in this article, but we can shift the other way too: 1000000 >> 4 means shift 1000000 to the right by 4 bits, which is 100 (binary).

We like hex in computer science because log base 2 of 16 (log_2(16)) is exactly 4, so each hex digit corresponds to exactly 4 binary digits.

We'll learn 1 more bitwise operator: | which is "bitwise or". This is easiest to explain with an example.

  1001
| 1100
-------
  1101

Here we're taking the bitwise or of 1001 (binary) and 1100 (binary). For each bit, the result is 1 if either of the inputs has a 1 in that column. Otherwise both inputs have a 0 in that column, and 0 is the result. This operation is very fast to compute because there are no "carry" bits like there are in addition, for example. Each column can be computed independently.

There's also a "bitwise and" operator, which works in the opposite way: the result for the column is 1 if both of the inputs have a 1 in that column. Otherwise at least one of the inputs has a 0 in that column, and 0 is the result. This operation is similarly fast to compute.

Let's try using bitwise operators with ASCII values to get 00002031 (hex). We know that 20 (hex) is ' ' and 31 (hex) is '1'. Ideally we would write something like ' ''1' but that doesn't compile. ' 1' actually does compile with clang, but we get a warning: multi-character character constant [-Wmultichar]. We could turn the warning off, but we know we shouldn't disable or ignore warnings. We can avoid the warning by using a full 4 bytes: '\0\0 1', and this does exactly what we want. (The value of a multi-character constant is implementation-defined, so this relies on how clang packs the characters.) Each \0 here represents a 0 byte inside a character literal in C.

#include <stdio.h>

#pragma clang diagnostic ignored "-Wformat-security"

int main() {
    for (int i = '\0\0 1'; i < '\0\0 5'; i++) {
        printf((char *)&i);
    }
}

This compiles into the exact same program as we have above with 8241...8245 and it's a little more understandable.

We were supposed to do this with bitwise operators though. Let's start with just the digit part and print this out using one of our programs from above:

#include <stdio.h>

int main() {
    for (int i = '1'; i < '5'; i++) {
        printf("%08x\n", i);
    }
}
00000031
00000032
00000033
00000034

Okay, looks like a start. Our digit is in the place we want it to be. Now we just need to put 20 in the 2 digits before.

We're saying 2 digits, but we're talking about hex here. And remember from a little earlier that 1 hex digit is 4 binary digits. Which means that 2 hex digits is 8 binary digits. This also makes sense, since 1 byte is 8 bits, or (in other words), 8 binary digits.

In C, we can represent a hex value by prefixing the value with 0x, so 0x20 is 20 in hex. Let's just try shifting 0x20 left by 8 bits:

#include <stdio.h>

int main() {
    printf("%08x\n", 0x20 << 8);
}
00002000

Hey, it worked! But this 20 isn't immediately meaningful to someone reading this code; does it work with ' '?

#include <stdio.h>

int main() {
    printf("%08x\n", ' ' << 8);
}

It does! The output is the same.

Now we just need to combine these. We can combine these values literally using the | operator. Let's see it:

#include <stdio.h>

int main() {
    for (int i = '1'; i < '5'; i++) {
        printf("%08x\n", (' ' << 8) | i);
    }
}
00002031
00002032
00002033
00002034

The output matches what we want. So we can just remove the format specifier and use the printf code from earlier:

#include <stdio.h>

#pragma clang diagnostic ignored "-Wformat-security"

int main() {
    for (int i = '1'; i < '5'; i++) {
        printf(&((' ' << 8) | i));
    }
}
main.c:7:16: error: cannot take the address of an rvalue of type 'int'
        printf(&((' ' << 8) | i));
               ^ ~~~~~~~~~~~~~~

Uh oh. We got an error. We're using the "address of" operator (&), which is only valid on something that has a memory location, such as a variable, not on the result of an arbitrary expression. This is easily fixed:

#include <stdio.h>

#pragma clang diagnostic ignored "-Wformat-security"

int main() {
    for (int i = '1'; i < '5'; i++) {
        int variable = (' ' << 8) | i;
        printf((char *)&variable);
    }
}

And it works! But this isn't fun: we had to make another variable. We can work around this by moving the bitwise operators into the for-loop:

#include <stdio.h>

#pragma clang diagnostic ignored "-Wformat-security"

int main() {
    for (int i = ((' ' << 8) | '1'); i < ((' ' << 8) | '5'); i++) {
        printf((char *)&i);
    }
}

Alright, the output is what we're looking for, and the source looks pretty similar to what we have in the initial program. But all of this code only works on little endian machines.

Using the code in the first program and what we've learned about endianness and bitwise operators, adjusting this program to support big endian machines is left as an exercise for the reader.

@leptos-null
Author

Using the write version, we can also write this more simply as

#include <unistd.h>

int main() {
#if __LITTLE_ENDIAN__
    for (short i = ((' ' << 8) | '1'); i < ((' ' << 8) | '5'); i++) {
#else
    for (short i = (('1' << 8) | ' '); i < (('5' << 8) | ' '); i += (1 << 8)) {
#endif
        write(STDOUT_FILENO, &i, sizeof(i));
    }
}

write takes a byte array and array length. Since this isn't a string, it doesn't need to be null terminated, and we only need 2 bytes, which fits in a short on many platforms.
