@fay59
Last active September 4, 2024 23:07
Quirks of C

Here's a list of mildly interesting things about the C language that I learned mostly by consuming Clang's ASTs. Although surprises are getting sparser, I might continue to update this document over time.

There are many more mildly interesting features of C++, but the language is literally known for being weird, whereas C is usually considered smaller and simpler, so this is (almost) only about C.

1. Combined type and variable/field declaration, inside a struct scope [https://godbolt.org/g/Rh94Go]

struct foo {
   struct bar {
       int x;
   } baz;
};

void frob() {
   struct bar b; // <-- defined in body of `struct foo`
}

2. Compound literals are lvalues [https://godbolt.org/g/Zup5ZB]

struct foo {
    int bar;
};

void baz() {
    // compound literal:
    // https://en.cppreference.com/w/c/language/compound_literal
    (struct foo){};

    // these are actually lvalues
    ((struct foo){}).bar = 4;
    &(struct foo){};
}

3. Switch cases anywhere [https://godbolt.org/g/fSeL18]

void foo(int p, char* complicated) {
    switch (p) {
    case 0:
        if (complicated[0] == 'a') {
            if (complicated[1] == 'b') {
    case 1:
                complicated[2] = 'c';
            }
        }
        break;
    }
}

(also see: Duff's Device)
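
For reference, here is a minimal sketch of Duff's device, which relies on the same freedom to place case labels inside nested statements. The name duff_copy and the 4x unrolling factor are illustrative choices, not part of the original gist. Unlike Tom Duff's original, which wrote to a single memory-mapped register, this version advances the destination pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Copies count bytes from src to dst, unrolled 4x.
   The switch jumps into the middle of the do-while body to handle
   the count % 4 leftover bytes on the first pass. */
void duff_copy(char *dst, const char *src, size_t count) {
    assert(dst != NULL && src != NULL);
    if (count == 0)
        return;
    size_t passes = (count + 3) / 4; /* number of trips through the loop */
    switch (count % 4) {
    case 0: do { *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

The fallthrough between the case labels is intentional; compilers may warn about it under -Wextra.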

4. Flexible array members [https://godbolt.org/g/HCjfzX]

struct flex {
    int count;
    int elems[]; // <-- flexible array member
};

// this lays out the object exactly as expected
struct flex f = {
    .count = 3,
    .elems = {32, 31, 30}
};

_Static_assert(sizeof(struct flex) == sizeof(int), "");
// sizeof(f) does not include the size of statically-declared elements
_Static_assert(sizeof(f) == sizeof(struct flex), "");

// this only builds because .elems is not initialized:
struct flex g[2];

5. {0} as a universal initializer [https://godbolt.org/g/MPKkXv]

typedef int empty_array_t[0];
typedef struct {} empty_struct_t;
typedef int array_t[10];
typedef struct { int f; } struct_t;
typedef float vector_t __attribute__((ext_vector_type(4)));

// {} can initialize structs and arrays and vectors, but not scalars:
empty_array_t ea = {};
empty_struct_t es = {};
array_t a = {};
struct_t s = {};
vector_t v = {};
void* p = {}; // <-- error
int i = {}; // <-- error

// {0} can initialize any data type, including empty arrays/structs.
empty_array_t eaa = {0};
empty_struct_t ess = {0};
array_t aa = {0};
struct_t bb = {0};
vector_t cc = {0};
void* dd = {0}; // <-- happy!
int ee = {0}; // <-- happy!

6. Function typedefs [https://godbolt.org/g/5ctrLv]

typedef void (*function_pointer_t)(int); // <-- this creates a function pointer type
typedef void function_t(int); // <-- this creates a function type
// function_pointer_t == function_t*

function_t my_func; // <-- this declares "void my_func(int)"

void bar() {
    my_func(42);
}

7. Array pointers [https://godbolt.org/g/N85dvv]

typedef int array_t[10]; // array typedef
typedef array_t* array_ptr_t; // array pointer typedef
// same as:
// typedef int (*array_ptr_t)[10];

void foo(array_ptr_t array_ptr) {
    int x = (*array_ptr)[1];
}

void bar() {
    int arr_10[10];
    foo(&arr_10); // <-- yep
    
    int arr_11[11];
    foo(&arr_11); // <-- nope
}

8. Modifiers to array sizes in parameter definitions [https://godbolt.org/z/FnwYUs]

void foo(int arr[static const restrict volatile 10]) {
    // static: the array contains at least 10 elements
    // const, volatile and restrict qualify the pointer that the array
    // parameter is adjusted to, as in `int *const restrict volatile arr`.
}

(corrected by Reddit user /u/romv1)
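
One way to observe where the qualifiers end up is to ask _Generic for the type of the parameter's address. This is a sketch only: the macro PARAM_KIND and the two functions are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* 1 if p has type `int *const`, 2 if it has type `int *`, 0 otherwise.
   The operand of _Generic is not evaluated. */
#define PARAM_KIND(p) _Generic(&(p), int *const *: 1, int **: 2, default: 0)

int qualified_param(int arr[static const 10]) {
    /* the const inside the brackets lands on the pointer itself */
    static_assert(PARAM_KIND(arr) == 1, "arr is adjusted to int *const");
    return PARAM_KIND(arr);
}

int plain_param(int arr[static 10]) {
    static_assert(PARAM_KIND(arr) == 2, "arr is adjusted to int *");
    return PARAM_KIND(arr);
}
```

In both cases the pointed-to int stays unqualified, so the callee can still write through arr.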

9. Flat initializer lists [https://godbolt.org/g/RmwnoG]

struct foo {
    int x, y;
};

struct lots_of_inits {
    struct foo z[2];
    int w[3];
};

// this is probably more typical
struct lots_of_inits init = {
    {{1, 2}, {3, 4}}, {5, 6, 7}
};

// but braces for inner elements are optional
struct lots_of_inits flat_init = {
    1, 2, 3, 4, 5, 6, 7
};

10. What’s an lvalue, anyway [https://godbolt.org/g/5echfM]

struct bitfield {
    unsigned x: 3;
};

void foo() {
    int a[2];
    int i;
    const int j;
    struct bitfield bf;

    // these are all lvalues
    a; // DeclRefExpr <col:5> 'int [2]' lvalue Var 0x556800650150 'a' 'int [2]'
    i; // DeclRefExpr <col:5> 'int' lvalue Var 0x56289851bf20 'i' 'int'
    j; // DeclRefExpr <col:5> 'const int' lvalue Var 0x555fc6694ff0 'j' 'const int'
    bf.x; // MemberExpr <col:5, col:8> 'unsigned int' lvalue bitfield .x 0x55dab002de28

    // this is not an lvalue
    foo; // DeclRefExpr <col:6> 'void ()' Function 0x563cb79da098 'foo' 'void ()'

    // ... but you can't assign to all of them
    // a = (int [2]){1, 2};
    i = 4;
    // j = 4;
    bf.x = 4;

    // ... and you can't take all of their addresses
    &a;
    &i;
    &j;
    // &bf.x;
    &foo; // but you can take the address of a function, which is not an lvalue

    // so, an lvalue is a value that:
    // - can have its address taken...
    //  - unless it is a bitfield (still an lvalue)
    //  - unless it is a function (not an lvalue)
    // - can be assigned to...
    //  - unless it is an array (still an lvalue)
    //  - unless it is a constant (still an lvalue)
}

11. Void globals [https://godbolt.org/z/C52Wn2]

// You can declare extern globals of incomplete types,
// including `void`.
extern void foo;

12. Alignment implications of bitfields [https://godbolt.org/z/KmB4CB]

struct foo {
    char a;
    long b: 16;
    char c;
};

// `struct foo` has the alignment of its most-aligned member:
// `long b` has an alignment of 8 (on typical 64-bit Linux targets)...
int alignof_foo = _Alignof(struct foo);

// ...but `long b: 16` is a bitfield, and is aligned on a char
// boundary.
int offsetof_c = __builtin_offsetof(struct foo, c);

13. static variables are scope-local [https://godbolt.org/z/hdcLYW]

int foo() {
    int* a;
    int* b;
    {
        static int foo;
        a = &foo;
    }
    {
        static int foo;
        b = &foo;
    }
    // this always returns 0 (false): two static variables with the same name
    // but declared in different scopes refer to different storage.
    return a == b;
}

14. Typedef goes anywhere [https://godbolt.org/z/vZmgha]

short typedef signed s16;
unsigned int typedef u32;
struct foo { int bar; } const typedef baz;

s16 a;
u32 b;
baz c;

15. Indexing into an integer [https://godbolt.org/z/IBA5Gr]

int foo(int* ptr, int index) {
    // When indexing, the pointer and integer parts
    // of the subscript expression are interchangeable.
    return ptr[index] + index[ptr];
    // It works this way, according to the standard (§6.5.2.1:2),
    // because A[B] is the same as *(A + B), and addition
    // is commutative.
}

16. The type of enums vs. the type of enumerators [https://godbolt.org/z/Mhsn1n7nd]

In C, enumerators (values declared in enums) have integer type rather than the type of their enclosing enum. For instance:

enum foo { bar, baz, frob };

enum foo is a valid type that can store the values of bar, baz and frob. However, bar, baz and frob themselves do not have type enum foo: in standard C (before C23) enumerators have type int, while enum foo is compatible with an implementation-defined integer type. On many implementations, bar has type int and enum foo has the underlying type unsigned. This means that a check as simple as this one:

enum foo f = bar;
f < baz;

involves a comparison of integers with different signedness.

Further, the type of each enumerator is not guaranteed to be the same. In this example:

enum foo { bar, baz = 0x80000000 };

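The type of bar can be int and the type of baz can be unsigned. (With a 32-bit int, 0x80000000 is not representable as an int, so before C23 this enum is only accepted as a compiler extension.)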
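
The type of an enumerator can be checked at compile time with _Generic. The sketch below uses the first enum foo from this section; IS_INT and the two functions are hypothetical helpers added for illustration. Only the enumerator check is portable. The result for an object of type enum foo depends on the integer type the implementation picks.

```c
#include <assert.h>

enum foo { bar, baz, frob };

/* Evaluates to 1 when the expression has type int, 0 otherwise. */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

/* Enumerators whose values fit in an int have type int. */
static_assert(IS_INT(bar), "bar has type int");

int enumerator_is_int(void) { return IS_INT(bar); }

/* Implementation-defined: depends on the type chosen for enum foo. */
int enum_object_is_int(void) {
    enum foo f = bar;
    return IS_INT(f);
}
```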

Special mentions

1. The power of UB [https://godbolt.org/g/H6mBFT]

extern void this_is_not_directly_called_by_main();

static void (*side_effects)() = 0;

void bar() {
    side_effects = this_is_not_directly_called_by_main;
}

int main() {
    side_effects();
}

compiles to:

bar:                                    # @bar
        ret
main:                                   # @main
        push    rax
        xor     eax, eax
        call    this_is_not_directly_called_by_main
        xor     eax, eax
        pop     rcx
        ret

main directly calls this_is_not_directly_called_by_main in this implementation. This happens because:

  1. LLVM sees that side_effects has only two possible values: NULL (the initial value) or this_is_not_directly_called_by_main (if bar is called)
  2. LLVM sees that side_effects is called, and it is UB to call a null pointer
  3. UB is impossible, so LLVM assumes that bar will have executed by the time main runs rather than face the consequences
  4. Under this assumption, side_effects is always this_is_not_directly_called_by_main.
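
A defined variant checks the pointer before calling through it, which removes the assumption the optimizer relied on. This is a sketch that assumes silently skipping an unset callback is acceptable. The names handler, install and run_side_effects are hypothetical stand-ins for the functions above.

```c
#include <assert.h>
#include <stddef.h>

static int call_count = 0;

/* Hypothetical stand-in for this_is_not_directly_called_by_main. */
static void handler(void) { call_count++; }

static void (*side_effects)(void) = NULL;

/* Plays the role of bar(): installs the callback. */
void install(void) { side_effects = handler; }

/* Returns 1 if the callback ran, 0 if the pointer was still null. */
int run_side_effects(void) {
    if (side_effects == NULL)
        return 0; /* never call through a null pointer */
    side_effects();
    return 1;
}

/* Usage: nothing is called until install() has run. */
void demo(void) {
    assert(run_side_effects() == 0);
    install();
    assert(run_side_effects() == 1);
    assert(call_count == 1);
}
```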

2. A constant-expression macro that tells you if an expression is an integer constant [https://godbolt.org/g/a41gmx]

#define ICE_P(x) (sizeof(int) == sizeof(*(1 ? ((void*)((x) * 0l)) : (int*)1)))

int is_a_constant = ICE_P(4);
int is_not_a_constant = ICE_P(is_a_constant);

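From Martin Uecker, on the Linux kernel ML. __builtin_constant_p serves a similar purpose on Clang and GCC, although it can also return true for expressions that are not integer constant expressions but that the optimizer manages to fold.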

3. Labels inside expression statements in really weird places [https://godbolt.org/g/k9wDRf]

You can make some pretty weird stuff in C, but for a real disaster, you need C++.

class foo {
    int x;

public:
    foo();
};

foo::foo() : x(({ a: 4; })) {
    goto a;
}

Needless to say, statement expressions are not standard C++ (or standard C), but if your compiler has them, chances are that you can use them in really interesting ways.

@Muffindrake

Muffindrake commented Sep 10, 2018

Please avoid posting C program code that hasn't gone through thorough review (and had all warnings, including -Wall -Wextra -Wpedantic, fixed, as well as being compiled according to a C standard -std=c11). Even the pedantic warnings are there for good reasons. Many really speak for themselves.

Cherish the warnings that you are actually getting. The more subtle cases of UB (like a null pointer not being required to be a pattern of 0, which is why memset(a, 0, sz) is not strictly correct/technically UB) you will not hear about, and the compiler isn't required to warn you about other cases either.


2:

hello2.c:8:17: warning: ISO C forbids empty initializer braces [-Wpedantic]
     (struct foo){};
                 ^
hello2.c:11:18: warning: ISO C forbids empty initializer braces [-Wpedantic]
     ((struct foo){}).bar = 4;
                  ^
hello2.c:12:18: warning: ISO C forbids empty initializer braces [-Wpedantic]
     &(struct foo){};
                  ^

Empty initializer braces may be part of C++, but they're not allowed in C according to the standard.

The more interesting use of these compound literals is that you can pass them into functions, either by value or by pointer, without having to put them somewhere nearby in automatic storage, which is admittedly not that useful or unique. More importantly, they allow you to reset a struct to 0 while correctly setting the pointers it contains to null, which memset will not strictly do.

struct t {
        int b;
        char *ptr;
};

int
main(int argc, char **argv)
{
        struct t data = { .b = argc, .ptr = argv[0] };
        data = (struct t) {0};
        /* memset to 0 is not required to set pointers to null; this assignment must */
}

4:

hello4.c:9:14: warning: initialization of a flexible array member [-Wpedantic]
     .elems = {32, 31, 30}
hello4.c:9:14: note: (near initialization for ‘f.elems’)
hello4.c:17:13: warning: invalid use of structure with flexible array member [-Wpedantic]
 struct flex g[2];

5:

hello5.c:1:13: warning: ISO C forbids zero-size array ‘empty_array_t’ [-Wpedantic]
 typedef int empty_array_t[0];
             ^~~~~~~~~~~~~
hello5.c:2:9: warning: struct has no members [-Wpedantic]
 typedef struct {} empty_struct_t;
         ^~~~~~
hello5.c:5:1: warning: ‘ext_vector_type’ attribute directive ignored [-Wattributes]
 typedef float vector_t __attribute__((ext_vector_type(4)));
 ^~~~~~~
hello5.c:8:20: warning: ISO C forbids empty initializer braces [-Wpedantic]
 empty_array_t ea = {};
                    ^
hello5.c:9:21: warning: ISO C forbids empty initializer braces [-Wpedantic]
 empty_struct_t es = {};
                     ^
hello5.c:10:13: warning: ISO C forbids empty initializer braces [-Wpedantic]
 array_t a = {};
             ^
hello5.c:11:14: warning: ISO C forbids empty initializer braces [-Wpedantic]
 struct_t s = {};
              ^
hello5.c:12:14: warning: ISO C forbids empty initializer braces [-Wpedantic]
 vector_t v = {};

hello5.c:12:14: error: empty scalar initializer
hello5.c:12:14: note: (near initialization for ‘v’)
hello5.c:13:11: warning: ISO C forbids empty initializer braces [-Wpedantic]
 void* p = {}; // <-- error
           ^
hello5.c:13:11: error: empty scalar initializer
hello5.c:13:11: note: (near initialization for ‘p’)
hello5.c:14:9: warning: ISO C forbids empty initializer braces [-Wpedantic]
 int i = {}; // <-- error
         ^
hello5.c:14:9: error: empty scalar initializer
hello5.c:14:9: note: (near initialization for ‘i’)
hello5.c:17:22: warning: excess elements in array initializer
 empty_array_t eaa = {0};
                      ^
hello5.c:17:22: note: (near initialization for ‘eaa’)
hello5.c:18:23: warning: excess elements in struct initializer
 empty_struct_t ess = {0};
                       ^
hello5.c:18:23: note: (near initialization for ‘ess’)

This is so wrong that gcc gives you a warning and an error for the same thing. Compiler extensions are strictly not part of the C language.


9:

hello9.c:16:34: warning: missing braces around initializer [-Wmissing-braces]
 struct lots_of_inits flat_init = {
                                  ^
     1, 2, 3, 4, 5, 6, 7
     {{  } {   }}{
 };
 }

While not strictly required, a warning is still printed, even giving you the correct braces, because it's so easy to introduce bugs otherwise.


12:
Nearly everything about bitfields is horrifyingly implementation-dependent, so results will vary from compiler to compiler; as such, your paste is completely pointless and devoid of useful information.

To force a bitfield to be aligned "as you would expect", which is "overlaid over the basic integer type", one would use:

struct foo {
    char a;
    long:0;
    long b: 16;
    long:0;
    char c;
};

which is then laid out the same way as this struct (which is 24 bytes in size, given an 8-byte long with 8-byte alignment):

struct foo {
    char a;
    long b;
    char c;
};

(this is obviously still very implementation-dependent)


"special mentions 1":

This is not "interesting UB", this is just UB which is to be avoided at all times. Never ever write code this way.

@AbigailBuccaneer

"Special mentions 3" fails to compile with -Werror=pedantic too, as the braced statement expression is a GNU extension.

@samliddicott

It would be very nice if the compiler would be a help in avoiding undefined behaviour instead of effectively writing a different program behind your back.

@JohnDoneth

@samliddicott I agree. Have you heard of Rust? One of my favorite features is how the compiler won't let you shoot yourself in the foot, even with threading.

@eighthjouster

eighthjouster commented Sep 10, 2018

@samliddicott, I see your point in the sense that the compiler should probably emit a message indicating "whoa, this is undefined behavior!" if it's as clear as this example.

Having said that, eh, the C standard states that under undefined behavior, all bets are off. And as such, the compiler can do whatever. And that's exactly what happened here.

@jason-s

jason-s commented Sep 10, 2018

the compiler should probably emit a message indicating "whoa, this is undefined behavior!"

That's not possible in general. Undefined behavior is often undefined at runtime and cannot be determined to be undefined using static analysis.
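
As an illustration of that point: whether a / b is undefined depends entirely on values that may only be known at run time, so the check has to happen at run time too. The helper checked_div below is a hypothetical sketch, not something from the gist.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stores a / b in *out and returns 1, or returns 0 (leaving *out
   untouched) when evaluating a / b would be undefined behavior. */
int checked_div(int a, int b, int *out) {
    assert(out != NULL);
    if (b == 0)
        return 0; /* division by zero */
    if (a == INT_MIN && b == -1)
        return 0; /* quotient not representable in int */
    *out = a / b;
    return 1;
}
```

A compiler looking only at the expression a / b cannot tell which of these cases will occur.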

@BatmanAoD

Having said that, eh, the C standard states that under undefined behavior, all bets are off.

That's precisely the problem, along with the fact that the C standard so freely declares so many parts of the language to be undefined.

Even with good tooling, determining which parts of a C program may cause undefined behavior is extremely nontrivial.

@Noxitu

Noxitu commented Sep 11, 2018

@BatmanAoD

That's precisely the problem, along with the fact that the C standard so freely declares so many parts of the language to be undefined.

This is not a problem. Undefined behavior has simple reason: performance.

What should happen when you read outside of allocated memory? Should compiler always check for bounds?

How should integers overflow? Should it be standardized? Should it be defined as "whatever cpu does"?

The last one is really interesting, since it can be extended to question: Does this code operate on a contiguous chunk of memory?

int *array = ???; for(int index = start; index != end; ++index) { array[index]; }

Answering "yes" to previous allows for really nice optimizations. And invoke undefined behavior if overflow happens.

@andermoran

andermoran commented Nov 6, 2019

1 ? ((void*)((x) * 0l)) : (int*)1
Can someone explain this ternary operator on special mention #2? It seems like it would always choose the first argument ((void*)((x) * 0l)) since 1 evaluates to true. This is confusing.

@fay59
Author

fay59 commented Nov 8, 2019

@andermoran, the condition isn't important: the magic comes from C's loose interpretation of what constitutes a constant. When x is a constant (like 4), (void*)(4 * 0l) is understood by the C compiler to be the same as (void*)0, which is the null pointer constant special case. The type of 1 ? NULL : (int*)1 is inferred to be int* because of the special nature of NULL. When x is not a constant (like y), (void*)(y * 0l) is interpreted as a regular int-to-pointer cast to void*, and in that case the type of the expression is coerced to void*. The macro then compares sizeof(int) with the size of the pointed-to type, which is sizeof(int) in the first case and sizeof(void) (1, as a GNU extension) in the second.
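
The two result types can be made visible with _Generic, without relying on sizeof(void). This is a sketch; ARM_TYPE and the two functions are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* 1 if the conditional expression has type int *, 2 if it has type void *.
   The operand of _Generic is not evaluated. */
#define ARM_TYPE(x) \
    _Generic(1 ? (void *)((x) * 0l) : (int *)1, int *: 1, void *: 2)

/* 4 * 0l is an integer constant expression with value 0, so the cast
   yields a null pointer constant and the result type is int *. */
static_assert(ARM_TYPE(4) == 1, "constant operand selects int *");

int arm_type_for_constant(void) { return ARM_TYPE(4); }

/* y * 0l is not a constant expression, so the second operand is an
   ordinary void * and the result type is void *. */
int arm_type_for_variable(int y) { return ARM_TYPE(y); }
```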

@moon-chilled

so, an lvalue is a value that:

  • can have its address taken...
    • unless it is a bitfield (still an lvalue)
    • unless it is a function (not an lvalue)

Also can't take the address of something that's 'register'-qualified

@ztane

ztane commented Nov 20, 2022

As the C standard says, "an lvalue is an expression (with an object type other than void) that potentially designates an object". That's it. Why potentially? Because *p is an lvalue expression but if the value of p does not point to an object, then *p does not designate an object (its use has undefined behaviour).

@Jake-Jensen

Y'all are far too worried about what the spec says and not the literal value of the topic. This is stuff you can do, not should or will.

@casual-engineer

casual-engineer commented Jul 24, 2023

We can also create a main function of type void and forsake the ugly looking return 0 at the end of the code :)
