@MangaD
Last active January 4, 2026 01:13
Deep Dive into Unions in C and C++

CC0

Disclaimer: Grok generated document.

Unions are a fundamental feature in both C and C++ for defining data types that can store different types of data in the same memory location. They are particularly useful for memory optimization, type punning (with caveats), and implementing variant-like structures. This guide covers everything from basics to advanced topics, including syntax, semantics, differences between C and C++, pitfalls, use cases, and modern alternatives. I'll use code examples, tables for comparisons, and explanations grounded in language standards (up to C23 and C++23 as of 2025).

1. Introduction to Unions

A union is a user-defined type whose members all share the same memory address. Unlike a struct (which allocates separate space for each member), a union overlays its members, so its size is at least that of its largest member. Only one member holds a meaningful value at a time. In C++, reading any other member is undefined behavior (UB) apart from a few narrow exceptions; in C (since C99 TC3), the read reinterprets the stored bytes as the other type, which is non-portable and may yield a trap representation.

  • Purpose: Save memory when mutually exclusive data is needed (e.g., a value that could be an int or a float).
  • Key Rule: The "active member" is the last one written to. Treat reading any other member as a bug: it is UB in C++ and a non-portable reinterpretation in C.
  • Standards Origin: Introduced in early C (K&R C), formalized in C89. C++ inherits from C but adds object-oriented complexities.

Unions are not type-safe by design, making them error-prone but powerful for low-level programming.
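As a rough illustration of the overlay described above, the sketch below checks that every member starts at the union's own address and that the union is large enough for each member. The names `Sample` and `members_share_address` are made up for this example.

```cpp
// Hypothetical union used only for this illustration.
union Sample {
    int   i;
    float f;
    char  c[4];
};

// The union must be able to hold each of its members.
static_assert(sizeof(Sample) >= sizeof(int), "must hold an int");
static_assert(sizeof(Sample) >= sizeof(float), "must hold a float");
static_assert(sizeof(Sample) >= sizeof(char[4]), "must hold the array");

// Returns true if all members of `s` start at the address of `s` itself.
// Taking a member's address does not read the member, so this is fine
// regardless of which member is currently active.
inline bool members_share_address(const Sample& s) {
    const void* base = &s;
    return static_cast<const void*>(&s.i) == base &&
           static_cast<const void*>(&s.f) == base &&
           static_cast<const void*>(&s.c) == base;
}
```

Calling `members_share_address` on any `Sample` object should return true on a conforming implementation.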

2. Syntax and Declaration

Basic Declaration

In both C and C++:

union MyUnion {
    int i;
    float f;
    char c[4];  // Array member (size 4 bytes)
};
  • Size: sizeof(MyUnion) is at least the size of the largest member (e.g., 4 bytes if int/float are 4 bytes).
  • Instantiation: union MyUnion u; in C. In C++ the union keyword is optional, so MyUnion u; also works.

Anonymous Unions

  • In C++: Anonymous unions (no name, no declarator) have been standard since C++98; they are typically embedded in structs to overlay members.
  • In C: Supported in C11 onward (anonymous unions in structs).

Example (C++):

struct Point {
    union {
        int x;
        float fx;  // Overlays x
    };
    int y;
};
Point p;
p.x = 42;  // x is now the active member; reading p.fx here would be UB in C++

In C (C11+):

struct Point {
    union {
        int x;
        float fx;
    };
};

Typedef for Convenience

Common in both:

typedef union {
    int i;
    float f;
} Variant;
Variant v;

3. Memory Layout and Alignment

  • Memory Overlap: All members start at offset 0.
  • Padding/Alignment: Unions follow the alignment of the most strictly aligned member. Padding may be added to reach that.
  • Size Calculation: sizeof(union) = max(sizeof(each member)), rounded up for alignment.

Example Table of Layout (assuming 4-byte int/float, 1-byte char):

Member  Type     Size  Alignment  Offset
i       int      4     4          0
f       float    4     4          0
c       char[4]  4     1          0
  • Bit-Fields in Unions: Allowed in both C and C++. Useful for packing bits.
    union Flags {
        unsigned int value : 32;
        struct {
            unsigned int flagA : 1;
            unsigned int flagB : 1;
            // ... up to 32 bits
        };
    };
    • In C: Bit-fields can span bytes; unnamed fields for padding.
    • In C++: Similar, but classes can have bit-fields too.

4. Usage in C

C unions are straightforward, mainly for primitives/arrays/pointers.

Initialization and Access

union MyUnion u = { .i = 42 };  // C99+ designated initializer
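u.f = 3.14f;  // Now f is active; i no longer holds a meaningful int
printf("%f\n", u.f);  // Prints 3.140000
// printf("%d\n", u.i);  // Reinterprets f's bytes as an int: not 3, and not portable
  • Type Punning: Reading as a different type (e.g., int as float) through the union is common. C (since C99 TC3) specifies that the bytes are reinterpreted as the new type, possibly yielding a trap representation; C++ makes it UB. Accessing the same storage through a pointer to an unrelated type violates strict aliasing in both languages. memcpy is the portable choice.
  • Anonymous Unions in C11+: As above, for structs.
  • Restrictions: C has no constructors or destructors, so union members are always plain data: scalars, arrays, pointers, structs, or other unions.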

Common C Use Cases

  • Endian Detection:
    union {
        unsigned int i;
        unsigned char c[4];
    } endian = { .i = 0x01020304 };
    if (endian.c[0] == 0x04) { /* Little-endian */ }
  • Variant Storage: Manually track type with an enum.

5. Usage in C++

C++ builds on C but introduces challenges with classes/structs.

Unions with Trivial Types

Same as C for POD (Plain Old Data) types.

Unions with Non-Trivial Types (Classes/Structs)

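  • Allowed Since: C++11 ("unrestricted unions"). C++98/03 prohibited members with non-trivial constructors, destructors, or copy assignment operators.
  • Constructors/Destructors: A union never constructs or destroys its members automatically. The one exception is that, since C++11, at most one member may carry a default member initializer.
  • Restrictions (C++11+):
    • If a member has a non-trivial special member function (e.g., std::string), the union's corresponding special member functions (default constructor, copy/move operations, destructor) are implicitly deleted unless you provide them.
    • Only one member can be active; switching requires manual destruction and placement new.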

Example:

#include <string>
#include <new>  // For placement new

union Data {
    int i;
    std::string str;
    
    Data() {}  // Must provide if str is non-trivial
    ~Data() {} // Must provide
    
    void setString(const std::string& s) {
        new (&str) std::string(s);  // Construct in place
    }
    
    void destroyString() {
        str.~basic_string();  // Explicit dtor; "str.~std::string()" does not compile
    }
};
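  • C++11 Improvements: Besides non-trivial members, named unions can have static data members. Unions can also have member functions, including constructors and destructors, but never virtual functions, base classes, or reference members.
  • C++17/20: std::variant (C++17) wraps the tagged-union pattern safely (see below), and C++20 allows changing the active member inside constant expressions.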

Anonymous Unions in C++

  • Can be used at namespace scope (where they must be declared static or placed in an unnamed namespace), inside classes, or inside functions.
  • Useful for aliasing members.

6. Tagged/Discriminated Unions

Not built-in; implement manually for safety.

In C:

enum Type { INT, FLOAT };
struct Variant {
    enum Type type;
    union {
        int i;
        float f;
    } data;
};

In C++: Same, or use std::variant (C++17+).

7. Differences Between Unions and Structs

Aspect           Union                            Struct
Memory           Overlaid (shared)                Sequential (separate)
Size             Largest member                   Sum of members + padding
Access           One active member                All members always accessible
Use Case         Alternatives                     Composites
Initialization   One member at a time             All members
C++ Complexity   Manual ctors/dtors for objects   Automatic

8. Differences Between C and C++

Feature                 C                                  C++
Anonymous Unions        C11+ (in structs)                  Yes (since C++98)
Non-Trivial Members     No (no classes)                    Yes (C++11+), with manual management
Static Members          No                                 Yes (C++11+, named unions only)
Virtual Functions       No                                 No
Type Punning            Reinterprets bytes; non-portable   UB; use std::bit_cast (C++20) or memcpy
Standard Alternatives   None                               std::variant (C++17)
  • C23 Additions: Bit-precise integer types (_BitInt(N)) can be union members like any other object type.
  • C++23: Minor refinements, like better constexpr support.

9. Pitfalls and Undefined Behavior

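  • Accessing Inactive Member: UB in C++ (outside the common-initial-sequence exception). In C the bytes are reinterpreted, which may produce a trap representation. Either way the result is not portable.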
  • Strict Aliasing Violation: Punning types (e.g., int as float) breaks rules unless via char* or memcpy.
  • Alignment Issues: Misaligned access crashes on some architectures.
  • Lifetime Management in C++: Forgetting to call dtor on objects leads to leaks/UB.
  • Initializer Rules: In C99+, designated; in C++, aggregate init for POD unions.
  • Portability: Endianness, padding vary by platform.

Avoid unions for type punning; use std::bit_cast (C++20) or memcpy.

10. Common Use Cases

  • Memory Optimization: In embedded systems (e.g., overlaying registers).
  • Variant Types: Before std::variant, unions + enum.
  • Networking/Parsing: Overlay headers on packets.
  • Graphics: Color unions (e.g., RGBA as uint32 or separate bytes).
  • Debugging: Inspect memory representations.

11. Modern Alternatives

  • C++17 std::variant: Safe, automatic management.

    #include <iostream>
    #include <string>
    #include <variant>

    int main() {
        std::variant<int, float, std::string> v = 42;
        if (std::holds_alternative<int>(v)) {
            std::cout << std::get<int>(v) << std::endl;
        }
        v = "hello";  // Destroys the int, then constructs a std::string
    }
    • Visits with std::visit for type-safe handling.
  • std::any (C++17): For unknown types.

  • Boost.Variant (pre-C++17).

  • In C: Libraries like GLib's GVariant.
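To make the std::visit point above concrete, here is a minimal sketch. The alias `Value` and the function `describe` are names invented for this example; only `std::variant`, `std::visit`, and `std::to_string` come from the standard library.

```cpp
#include <string>
#include <variant>

using Value = std::variant<int, float, std::string>;  // hypothetical alias

// Returns a short description of whichever alternative is currently held.
// std::visit picks the overload that matches the active alternative, so
// there is no way to read the "wrong" member by accident.
inline std::string describe(const Value& v) {
    struct Visitor {
        std::string operator()(int i) const { return "int: " + std::to_string(i); }
        std::string operator()(float) const { return "float"; }
        std::string operator()(const std::string& s) const { return "string: " + s; }
    };
    return std::visit(Visitor{}, v);
}
```

If an alternative is later added to `Value` without a matching overload in `Visitor`, the call to `std::visit` fails to compile, which is the main safety gain over a hand-written switch on a tag.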

12. Performance Considerations

  • Pros: Zero overhead; pure memory overlay.
  • Cons: Manual tracking adds code complexity/runtime checks.
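  • Unions whose members are all trivially copyable are themselves trivially copyable, so they can be copied with memcpy and stored efficiently in arrays. A union with a member such as std::string does not qualify.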
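Whether a particular union may be copied with memcpy can be checked at compile time. The sketch below uses two hypothetical unions, `Plain` and `WithString`, to show the difference; the type trait is the standard `std::is_trivially_copyable`.

```cpp
#include <string>
#include <type_traits>

union Plain {            // only trivial members
    int   i;
    float f;
};

union WithString {       // has a non-trivial member
    int         i;
    std::string s;
    WithString() : i(0) {}
    ~WithString() {}     // user-provided; does NOT destroy `s`
};

// Safe to memcpy: every member is trivially copyable.
static_assert(std::is_trivially_copyable<Plain>::value,
              "Plain can be copied byte-wise");

// Not safe to memcpy: copying bytes would bypass std::string's copy logic.
static_assert(!std::is_trivially_copyable<WithString>::value,
              "WithString must not be copied byte-wise");
```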

Summary and Recommendations

Unions are a low-level tool for memory sharing, essential in systems programming but risky due to UB potential. In C, stick to primitives and manual tagging. In C++, prefer std::variant for safety unless performance demands raw unions (e.g., in kernels or games).

  • When to Use: For legacy code, extreme optimization, or C compatibility.
  • Best Practice: Always pair with an enum for type tracking; use placement new/dtors in C++.
  • Learn More: Refer to C standards (ISO/IEC 9899) and C++ (ISO/IEC 14882) for edge cases.

If you have a specific example or scenario, I can elaborate further! 🚀


ChatGPT follow-up with corrections

Below is a full, deep-dive “article-style” treatment of unions in C and C++, with extra attention to the places where explanations on the internet (and in many summaries) tend to be subtly wrong, incomplete, or “works-on-my-compiler”-dependent. I’ll also explicitly call out a couple of points from your attached document that are commonly stated but need nuance, especially around type punning and “reading inactive members.”


Unions in C and C++: A Comprehensive Deep Dive (Semantics, Lifetime, Aliasing, and Real-World Patterns)

0. What a union really is (and what it is not)

A union is a type whose non-static data members share the same storage. Intuitively:

  • A struct is composition: members live next to each other.
  • A union is overlay: members live on top of each other.

Two immediate consequences:

  1. sizeof(union) is at least the size of its largest member (plus possible padding for alignment).
  2. A union object has one chunk of storage, and member access is a typed view onto that storage.

The tricky parts are not about layout—they’re about:

  • Which member is “active” (C++ concept, and quasi-concept in C via effective type rules).
  • Object lifetime (particularly in C++).
  • Aliasing / effective type / optimization assumptions (both C and C++).

Your attached document gives a broadly good overview, but it leans into a too-simple rule—“reading from a member other than the last one written is UB”—which is often practically useful advice yet is not the whole story across C vs C++ and across certain special cases.


1. Layout and alignment: the “easy” part (mostly)

1.1 Size and alignment rules

For a union:

  • All non-static data members have the same address in C++ (and, in practice and by modern specs, effectively in C as well). cppreference summarizes this expectation for C++ unions. (Cppreference)
  • The union’s alignment is the maximum alignment requirement among its members.
  • sizeof(union) is large enough for the largest member, rounded up to satisfy alignment.

Example:

union U {
    uint8_t  b;
    uint32_t w;
    double   d;
};

Typical (but platform-dependent):

  • alignof(U) = alignof(double)
  • sizeof(U) = sizeof(double) (often 8), not 1+4+8
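The rounding rule can be observed with a union whose largest member is not a multiple of the strictest alignment. This is a sketch with a hypothetical union `Padded`; the exact numbers are platform-dependent, so only the portable relationships are asserted.

```cpp
#include <cstdint>

union Padded {
    char          bytes[5];  // largest member by size: 5 bytes
    std::uint32_t word;      // strictest alignment among the members
};

// The union is at least as strictly aligned as its strictest member.
static_assert(alignof(Padded) >= alignof(std::uint32_t), "alignment");

// It must be able to hold the 5-byte array.
static_assert(sizeof(Padded) >= 5, "size");

// sizeof is always a multiple of alignof, which forces tail padding here.
static_assert(sizeof(Padded) % alignof(Padded) == 0, "rounding");

// On a typical platform where std::uint32_t is 4 bytes with 4-byte
// alignment, sizeof(Padded) comes out as 8: 5 bytes of data plus 3 of padding.
```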

1.2 Padding and “object representation”

Even if all members start at offset 0, the union’s storage can include padding bytes (e.g., due to alignment rounding). Those bytes matter if you do:

  • raw byte hashing
  • serialization
  • memcmp comparisons
  • “bit punning” experiments

2. “Only one member at a time” — what that means in C vs C++

2.1 C++: active member and lifetime are first-class language concepts

In C++, union members are objects, and (critically) their lifetime may or may not have begun depending on what you constructed last.

cppreference states the commonly taught rule: reading a member that wasn’t most recently written is undefined behavior. (Cppreference) But to be precise, the real core is: which member’s lifetime is active, and what the standard allows you to do when it isn’t.

A union member is “active” if its lifetime has begun and not ended. This is the conceptual model used by the standard wording cited in discussions like the Node.js issue quoting the standard’s union wording. (GitHub)

Key implications in C++:

  • If all members are trivial (e.g., int, float, pointers, trivially copyable aggregates), you can “switch” the active member by assignment, but the rules about what you may read remain constrained.
  • If a union has non-trivial members (e.g., std::string), you generally must:
    • explicitly construct the chosen member (placement new)
    • explicitly destroy it
    • manually track which member is active

2.2 C: no “active member” concept, but “effective type” and aliasing rules fill the role

C doesn’t have constructors/destructors or object lifetime rules in the C++ sense for union members. Instead, C’s model revolves around:

  • stored value
  • effective type
  • permitted ways to inspect object representations (notably through unsigned char)

This is where online explanations diverge, and where your attached document’s “always UB” statement can be misleading.

There’s a long-running tension:

  • Many sources say “type punning via unions is UB in C” (e.g., older standard interpretations and strict readings). (Stack Overflow)
  • Others argue it is (or has become, via defect reports / committee intent) well-defined or at least specified enough in C if you access through the union and accept implementation-defined aspects. Recent compiler community discussions still surface this difference between C and C++ behavior. (LLVM Discussion Forums)

Practical reality:

  • Compilers like GCC/Clang have historically supported union-based punning in many cases, but optimizers + strict aliasing can break assumptions, especially when you “escape” the union through pointers, LTO, or aggressive optimization. (Software is Crap)

So, for C, the safe guidance is:

  • Union punning might work, might be intended, might even be supported—but if you need portable, standards-solid behavior, treat it as risky and prefer byte-based techniques (memcpy) or standardized idioms.

3. The big minefield: type punning, aliasing, and “reading inactive members”

Let’s separate three distinct operations people conflate:

  1. Variant storage (store A or B, track which it is, read it back as the same type)
  2. Representation inspection (look at bytes of an object representation)
  3. Type punning (write as T, read as U, pretending it “reinterprets”)

Only (1) is the “intended” union use. (2) is often legal via character types. (3) is where you get wrecked.

3.1 C++: union type punning is not the portable tool you want

In C++, “read other member” is generally UB in the standard model; many compilers allow some patterns as extensions, but the language rule is not your friend here. cppreference is blunt about UB for reading the not-most-recently-written member. (Cppreference)

Do this instead in C++20+: std::bit_cast for same-size trivially copyable types.

  • The C++ Core Guidelines discussion explicitly recommends std::bit_cast (or memcpy) rather than unions or reinterpret_cast. (GitHub)
  • std::bit_cast requires trivially copyable types, which is tightly tied to the idea that this is a byte-level copy of object representation. (Stack Overflow)

Example:

#include <bit>
#include <cstdint>

float u32_to_float(std::uint32_t bits) {
    return std::bit_cast<float>(bits);
}

std::uint32_t float_to_u32(float x) {
    return std::bit_cast<std::uint32_t>(x);
}

This is the modern, optimizer-friendly solution.

If you’re pre-C++20: use std::memcpy.
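A minimal sketch of the memcpy approach for pre-C++20 code follows. The helper names `float_to_bits` and `bits_to_float` are made up for this example, and the static assertions spell out the assumptions (32-bit IEEE-754 float).

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sizes must match");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE-754 float");

// Copies the object representation of a float into an integer.
// memcpy is allowed to alias anything, so there is no aliasing violation.
inline std::uint32_t float_to_bits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// The reverse direction: build a float from a raw bit pattern.
inline float bits_to_float(std::uint32_t bits) {
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

Under the stated assumptions, `float_to_bits(1.0f)` yields 0x3F800000, the IEEE-754 single-precision encoding of 1.0.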

3.2 C: union punning is controversial in “pure standard” terms, but common in practice

There’s evidence on both sides in the ecosystem:

  • Older strict readings: writing one union member then reading another is UB (StackOverflow answers often cite this). (Stack Overflow)
  • Compiler community conversations sometimes summarize: “in C this is well-defined; in C++ it is UB” (this is asserted in recent LLVM discourse). (LLVM Discussion Forums)
  • Many real-world codebases rely on GCC’s documented behavior/extension. (Software is Crap)

Portable C guidance (if you want to sleep at night across compilers and flags):

  • Prefer memcpy-based punning for reinterpretation.
  • Or restrict yourself to inspecting bytes through unsigned char (object representation).

3.3 The strict aliasing rule: why your pun “works” until it doesn’t

Optimizers assume that pointers/references of unrelated types do not alias, enabling reordering, vectorization, load/store elimination, etc. John Regehr has a good high-level explanation of why aliasing assumptions exist and how they enable optimizations. (Regehr Blog) And discussions on what optimizations aliasing enables are plentiful. (langdev.stackexchange.com)

This is the big gotcha:

float f(uint32_t *p) {
    float *q = (float*)p;
    *p = 0x3F800000u;
    return *q; // "should be 1.0f"? optimizer may say nope
}

This is classic aliasing UB territory. A union doesn’t automatically save you if you end up with typed pointers that violate aliasing assumptions.

3.4 “But I only access through the union variable itself!”

Even then, you need to be careful about what your compiler actually guarantees versus what the standard strictly requires. When you need portable behavior, use:

  • memcpy
  • std::bit_cast
  • explicit byte arrays

4. C++ special cases where “reading a different member” is not the naïve UB story

There are narrowly scoped exceptions / special allowances people trip over.

4.1 Common initial sequence (C++)

C++ allows certain limited inspection when the union members are standard-layout structs that share a common initial sequence. This is often mentioned in discussions about inactive union member access. (Stack Overflow)

Example pattern:

struct A { int tag; int x; };
struct B { int tag; double y; };

union U {
    A a;
    B b;
};

int read_tag() {
    U u;
    u.b = B{42, 3.14};  // b is now the active member
    // Reading u.a.tag is permitted: A and B are standard-layout structs
    // whose common initial sequence is the int tag.
    return u.a.tag;
}

This is used in low-level ABI / tagged headers. But:

  • It only applies to the common initial sequence portion.
  • It does not generalize to “I can read any member as any type.”

4.2 Inspecting object representation via unsigned char / std::byte

In C++ you’re allowed to look at an object’s representation through character types (unsigned char, std::byte via std::as_bytes, etc.). That’s not the same as “reading another union member,” but it’s how you safely do “what are the bytes.”
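As a sketch of that byte-level inspection, the helper below dumps any trivially copyable object as hexadecimal. The name `hex_bytes` is invented for this example, and the code assumes 8-bit bytes.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Returns the object representation of `obj` as lowercase hex, in memory
// order. Reading through `const unsigned char*` is explicitly permitted.
template <typename T>
std::string hex_bytes(const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only meaningful for trivially copyable types");
    static const char digits[] = "0123456789abcdef";
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::string out;
    for (std::size_t k = 0; k < sizeof(T); ++k) {
        out.push_back(digits[(p[k] >> 4) & 0x0F]);  // high nibble
        out.push_back(digits[p[k] & 0x0F]);         // low nibble
    }
    return out;
}
```

For multi-byte types the output order reflects the host's endianness, which makes the helper a convenient way to see byte order without reading an inactive union member.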

4.3 “Inactive member” vs “no lifetime began” (non-trivial members)

In C++ with non-trivial members, the real hazard is not just “inactive”—it’s “this object doesn’t exist here right now.”

union U {
    int i;
    std::string s;
    U() : i(0) {}
    ~U() {}
};

U u;
u.s = "hi"; // actually UB: s was never constructed!

Correct:

new (&u.s) std::string("hi");
// ...
u.s.~basic_string();
u.i = 123; // now i becomes active (trivial)

This is why std::variant exists.


5. Initialization rules: C vs C++ differences you actually feel

5.1 C designated initializers

C99+ gives you designated initializers:

union U { int i; float f; };
union U u = { .f = 1.25f };

5.2 C++ union initialization

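In C++, brace-initializing a union initializes its first member (an empty brace list value-initializes it). Designated initializers arrived only in C++20, where a union accepts a single designator, as in U u = { .f = 1.25f };. Whichever member is initialized becomes the active one.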
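A small sketch of how initialization and assignment select the active member in C++, using a hypothetical union `Number` with only trivial members:

```cpp
union Number {
    int   i;   // first member: the one brace-initialization targets
    float f;
};

inline int first_member_of_braced() {
    Number n = {42};   // initializes n.i, so i is the active member
    return n.i;
}

inline int first_member_of_empty_braces() {
    Number n{};        // value-initializes the first member: n.i == 0
    return n.i;
}

inline float assign_other_member() {
    Number n = {42};
    n.f = 1.25f;       // plain assignment makes f the active member
    return n.f;        // reading the member just written is always fine
}
```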

A common modern pattern is to avoid raw unions in C++ unless:

  • all members are trivial, or
  • you’re implementing a low-level tagged union with explicit placement new/destruction, or
  • you’re doing ABI overlays under strict constraints.

6. Anonymous unions: power tool with sharp edges

Anonymous unions are useful for:

  • register overlays
  • protocol headers
  • representing data in multiple formats when you don’t pun types illegally

Example:

struct Color {
    union {
        struct { uint8_t r,g,b,a; };
        uint32_t rgba;
    };
};

This is extremely common. Is it portable? Depends what you do with it.

  • If you write rgba then read r..a, you are effectively doing a representation reinterpretation and will run into the same C vs C++ concerns as above.
  • If your goal is byte packing/unpacking, consider explicit shifts/masks or memcpy to/from an array.
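The shift-and-mask alternative mentioned in the last bullet could look like the sketch below. `Rgba`, `pack_rgba`, and `unpack_rgba` are hypothetical names, and the 0xRRGGBBAA channel order is an arbitrary choice made for the example.

```cpp
#include <cstdint>

struct Rgba {  // plain struct, no union involved
    std::uint8_t r, g, b, a;
};

// Packs channels as 0xRRGGBBAA regardless of host byte order.
inline std::uint32_t pack_rgba(Rgba c) {
    return (std::uint32_t{c.r} << 24) | (std::uint32_t{c.g} << 16) |
           (std::uint32_t{c.b} << 8)  |  std::uint32_t{c.a};
}

// Inverse of pack_rgba: extracts each channel with a shift and a mask.
inline Rgba unpack_rgba(std::uint32_t v) {
    return Rgba{static_cast<std::uint8_t>((v >> 24) & 0xFFu),
                static_cast<std::uint8_t>((v >> 16) & 0xFFu),
                static_cast<std::uint8_t>((v >> 8) & 0xFFu),
                static_cast<std::uint8_t>(v & 0xFFu)};
}
```

Because the channel positions are defined arithmetically, the packed value is the same on little-endian and big-endian hosts, which the union version cannot promise.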

7. Bit-fields inside unions

Bit-fields are often combined with unions for:

  • register definitions
  • flags packing

But beware:

  • Bit-field layout is implementation-defined (ordering, packing, alignment, etc.).
  • Endianness interacts with interpretation.
  • Compilers differ on packing and ABI rules.

If you need stable wire formats, you usually want explicit masks and shifts rather than bit-fields.


8. The “right” way to use unions: discriminated/tagged unions

8.1 In C: classic tagged union

typedef enum { T_INT, T_FLOAT } Tag;

typedef struct {
    Tag tag;
    union {
        int   i;
        float f;
    } u;
} Value;

Rules:

  • You must maintain the tag correctly.
  • You only read the member corresponding to the tag.

This is safe, idiomatic, and exactly what unions were meant for.
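The two rules can be enforced in code rather than by convention. The sketch below is written in C++ (to match the other added examples) and wraps the same tag-plus-union layout in checked accessors; the names `make_int`, `make_float`, `as_int`, and `as_float` are invented for this illustration.

```cpp
#include <stdexcept>

enum class Tag { Int, Float };

struct Value {
    Tag tag;
    union {
        int   i;
        float f;
    } u;
};

// Constructors set the tag and the matching member together.
inline Value make_int(int x)     { Value v; v.tag = Tag::Int;   v.u.i = x; return v; }
inline Value make_float(float x) { Value v; v.tag = Tag::Float; v.u.f = x; return v; }

// Accessors refuse to read a member that the tag says is not active.
inline int as_int(const Value& v) {
    if (v.tag != Tag::Int) throw std::logic_error("Value does not hold an int");
    return v.u.i;
}

inline float as_float(const Value& v) {
    if (v.tag != Tag::Float) throw std::logic_error("Value does not hold a float");
    return v.u.f;
}
```

In plain C the same idea is usually expressed with an assert or an error return instead of an exception.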

8.2 In C++: prefer std::variant unless you have a reason not to

std::variant is essentially a standardized, type-safe tagged union with correct lifetime handling.

The C++ Core Guidelines recommendation to use std::bit_cast for punning and avoid union-punning aligns with “prefer higher-level safe tools.” (GitHub)

Use a raw union in C++ when:

  • you must match a C ABI layout precisely, or
  • you need extreme control and can enforce invariants, or
  • you’re building a low-level primitive like variant yourself.

9. Performance: why unions are still relevant in 2025

Unions are still important because they enable:

  • No dynamic allocation for alternatives
  • Tight packing for embedded / IPC / message queues
  • Stable ABI overlays (with extreme care)
  • Manual object model control in kernel/RT code

But it’s also true that:

  • compilers are very good at optimizing memcpy-style punning away
  • std::variant can be extremely efficient (often a small tagged buffer + switch)

So “unions are faster” is not automatically true; “unions are lower-level” is true.


10. Practical patterns and recommended idioms

10.1 Safe bit reinterpretation (C++20): std::bit_cast

Already shown; it’s the gold standard now. (GitHub)

10.2 Safe bit reinterpretation (C and pre-C++20): memcpy

C:

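float f = 1.0f;
uint32_t bits;
memcpy(&bits, &f, sizeof bits);  /* <string.h>; requires sizeof(float) == sizeof(uint32_t) */

C++ pre-20:

float f = 1.0f;
std::uint32_t bits;
std::memcpy(&bits, &f, sizeof bits);  // <cstring>; same size requirement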

This avoids aliasing UB and is typically optimized into a register move.

10.3 Overlaying protocol headers: prefer explicit parsing

Instead of:

union {
    Header h;
    uint8_t bytes[sizeof(Header)];
} u;

Prefer:

  • memcpy from buffer into Header if Header is trivially copyable and layout is controlled
  • or manual parse with shifts to avoid padding/alignment traps
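A sketch of the manual-parse option follows. The wire format here is hypothetical (a 2-byte type followed by a 4-byte length, both big-endian), as are the names `Header`, `kHeaderWireSize`, and `parse_header`.

```cpp
#include <cstddef>
#include <cstdint>

// In-memory form of the hypothetical header. Its padding and byte order
// never touch the wire because each field is parsed explicitly.
struct Header {
    std::uint16_t type;
    std::uint32_t length;
};

constexpr std::size_t kHeaderWireSize = 6;  // 2 + 4 bytes on the wire

// Parses a big-endian header from `buf`. Returns false (leaving `out`
// untouched) if fewer than kHeaderWireSize bytes are available.
inline bool parse_header(const unsigned char* buf, std::size_t len, Header& out) {
    if (len < kHeaderWireSize) return false;
    out.type = static_cast<std::uint16_t>((buf[0] << 8) | buf[1]);
    out.length = (std::uint32_t{buf[2]} << 24) | (std::uint32_t{buf[3]} << 16) |
                 (std::uint32_t{buf[4]} << 8)  |  std::uint32_t{buf[5]};
    return true;
}
```

Note that sizeof(Header) is typically 8 because of padding after `type`, while the wire size is 6; overlaying the struct on the buffer would silently misread `length`.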

10.4 “Union as storage” with placement new (C++)

If you truly need non-trivial alternatives without std::variant, you’re basically writing your own variant:

  • add a tag
  • use placement new for construction
  • explicitly destroy active member
  • define copy/move carefully
  • ensure exception safety

At that point, you almost always should just use std::variant unless you have a special ABI/constexpr requirement.
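To show how much machinery the checklist above implies, here is a minimal sketch of a hand-rolled int-or-string type. `IntOrString` and all of its member names are hypothetical; move operations are omitted for brevity, so moves fall back to copies.

```cpp
#include <new>
#include <stdexcept>
#include <string>
#include <utility>

class IntOrString {
public:
    IntOrString() : tag_(Tag::Int), i_(0) {}
    explicit IntOrString(int v) : tag_(Tag::Int), i_(v) {}
    explicit IntOrString(std::string v) : tag_(Tag::String), s_(std::move(v)) {}

    IntOrString(const IntOrString& other) : tag_(other.tag_) {
        if (tag_ == Tag::String) new (&s_) std::string(other.s_);  // placement new
        else i_ = other.i_;
    }

    IntOrString& operator=(const IntOrString& other) {
        if (this == &other) return *this;
        if (other.tag_ == Tag::String) {
            std::string copy(other.s_);  // may throw; *this is still intact
            destroy();
            new (&s_) std::string(std::move(copy));  // string move is noexcept
            tag_ = Tag::String;
        } else {
            destroy();                   // leaves i_ active
            i_ = other.i_;
        }
        return *this;
    }

    ~IntOrString() { destroy(); }

    bool holds_string() const { return tag_ == Tag::String; }

    int as_int() const {
        if (tag_ != Tag::Int) throw std::logic_error("not an int");
        return i_;
    }

    const std::string& as_string() const {
        if (tag_ != Tag::String) throw std::logic_error("not a string");
        return s_;
    }

private:
    enum class Tag { Int, String };

    // Ends the lifetime of the active member and falls back to int.
    void destroy() noexcept {
        if (tag_ == Tag::String) s_.~basic_string();
        tag_ = Tag::Int;
        i_ = 0;
    }

    Tag tag_;
    union {
        int         i_;
        std::string s_;
    };
};
```

Every line of lifetime bookkeeping here is something `std::variant<int, std::string>` does automatically.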


11. Common misconceptions (and what to say instead)

Misconception A: “Reading any inactive union member is always UB in both C and C++”

Better:

  • In C++, that’s a good default mental model; there are narrow exceptions (common initial sequence), and you should prefer std::bit_cast/memcpy for punning. (Cppreference)
  • In C, union punning is widely used and supported by major compilers, but the “fully portable per strictest reading” story is complicated and historically contentious; prefer memcpy for maximum portability under aggressive optimization. (Stack Overflow)

Misconception B: “A union is for saving memory only”

Better:

  • Memory saving is the obvious reason, but unions are also about explicit control of representation, ABI overlays, and implementing sum types.

Misconception C: “A union is like reinterpret_cast”

Better:

  • A union gives shared storage; it does not grant permission to violate aliasing/lifetime rules in C++.

12. What I would change / extend from your attached document

From your attached doc, the main improvements to make it “expert-grade” are:

  1. Be more nuanced about C vs C++ “type punning through unions.” Many guides state “UB everywhere,” but in practice and in some modern discussions, people distinguish C and C++ here. (LLVM Discussion Forums) The safest teaching is still: “don’t rely on union punning for portability,” but the article should acknowledge the reality and the controversy.
  2. Introduce C++’s “common initial sequence” carve-out as a legitimate special case. (Stack Overflow)
  3. Tie everything to object lifetime (especially for non-trivial union members) rather than only “last written member.” (GitHub)
  4. Make std::bit_cast the headline recommendation for C++20+ punning and demote unions for that use-case. (GitHub)

13. A quick decision table (what to use when)

  • C, you need a variant: tagged union (enum + union)
  • C, you need bit reinterpretation: memcpy (most portable)
  • C++17 or earlier, bit reinterpretation: memcpy
  • C++20+, bit reinterpretation: std::bit_cast (GitHub)
  • C++ any version, general sum type: std::variant (C++17+) if available
  • Protocol/wire formats: avoid union + bit-fields; parse explicitly
  • Hardware registers: unions/bit-fields can be OK if compiler/ABI is fixed and documented

If you want, paste a few real examples from your codebase (e.g., “I want to map bytes to a header,” “I want float/int punning,” “I want a tagged union with std::string”), and I’ll:

  • rewrite them into portable and optimizer-friendly forms,
  • and, if you do want to use unions, show the exact invariant checks and lifetime management patterns that keep it correct in C++20/23.

Is the following valid and portable C++?

struct Color {
    union {
        struct { uint8_t r,g,b,a; };
        uint32_t rgba;
    };
};

Short answer: it compiles on the major compilers, but it is neither strictly standard C++ nor well-defined for what people usually intend.

Let’s unpack it carefully.


1. Is it valid C++ syntax?

In practice, yes. This compiles on all major compilers:

struct Color {
    union {
        struct { uint8_t r,g,b,a; };
        uint32_t rgba;
    };
};

What’s going on:

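  • Anonymous union: standard C++ since C++98
  • Anonymous struct inside the union: not standard C++ (it is standard in C11), but accepted as an extension by GCC, Clang, and MSVC
  • Standard-layout types everywhere

So it is accepted everywhere in practice, but a compiler in strict pedantic mode may warn about or reject the anonymous struct.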


2. Is it well-defined to read/write both views?

Here’s the crucial part.

Active union member rule

In C++, only one union member is active at a time. Reading from a member other than the active one is undefined behavior, except for a few narrow exceptions.

Your union members are:

struct { uint8_t r,g,b,a; };  // member #1
uint32_t rgba;               // member #2

If you do:

Color c{};
c.r = 255;
uint32_t x = c.rgba; // ❌ undefined behavior

or:

Color c{};
c.rgba = 0xff0000ff;
uint8_t red = c.r; // ❌ undefined behavior

That is not allowed by the C++ standard, even though it “works” on most machines.

Why?

  • These members do not share a common initial sequence
  • This is not a std::byte / char aliasing case
  • Type punning through unions is explicitly not guaranteed in C++

📌 This is the same pitfall as C-style union type punning: it is still UB in standard C++.


3. Is the memory layout portable?

Even if you only access one member at a time, layout assumptions creep in.

Problems:

1. Endianness

rgba depends on byte order:

  • Little-endian: 0xAABBGGRR
  • Big-endian: 0xRRGGBBAA

No guarantee which byte maps to which channel.

2. Padding / alignment

While likely packed as 4 bytes, the standard does not require:

sizeof(uint32_t) == 4

(It is 4 whenever bytes are 8 bits wide, which is almost always the case, but portable code cannot assume it.)

3. Bit significance

Even if size is 4 bytes, the order of significance is not specified.


4. So what exactly is portable vs non-portable?

✅ Portable guarantees

  • The struct is valid
  • You may use either representation consistently
  • You may copy the object with memcpy
  • You may inspect the raw bytes via unsigned char

❌ Not portable / undefined

  • Writing via rgba and reading via r/g/b/a
  • Assuming channel order in uint32_t
  • Assuming little-endian layout
  • Assuming no padding (even though common)

5. When does this pattern “work anyway”?

On:

  • x86 / x86_64
  • ARM (little-endian)
  • GCC / Clang / MSVC

…it works because:

  • Little-endian
  • 8-bit bytes
  • 32-bit uint32_t
  • Compilers tolerate union punning

But this relies on compiler extensions and de facto conventions, not on guarantees from the C++ standard.

That’s fine for:

  • Graphics engines
  • Embedded systems you control
  • Serialization with explicit endianness

Not fine for:

  • Portable libraries
  • Public APIs
  • Standards-conforming core logic

6. How to make it fully well-defined?

Option A: Explicit packing via shifts (most portable)

struct Color {
    uint8_t r, g, b, a;

    uint32_t rgba() const {
        return (uint32_t(r) << 24) |
               (uint32_t(g) << 16) |
               (uint32_t(b) << 8)  |
               uint32_t(a);
    }
};

✔ Fully defined ✔ Endianness-independent ✔ Optimizes well


Option B: std::bit_cast (C++20, controlled layout)

#include <bit>      // std::bit_cast (C++20)
#include <cstdint>

struct RGBA {
    uint8_t r, g, b, a;
};

uint32_t to_u32(RGBA c) {
    static_assert(sizeof(RGBA) == sizeof(uint32_t));
    return std::bit_cast<uint32_t>(c);
}

⚠ Still endianness-dependent ✔ Strict aliasing-safe ✔ Standard-compliant


Option C: memcpy (classic safe punning)

uint32_t rgba;
std::memcpy(&rgba, &color, sizeof(rgba));

✔ Well-defined ✔ Portable ✔ Ugly but correct


7. Verdict

Aspect                          Verdict
Compiles on major compilers     ✅ Yes
Strictly standard C++           ❌ No (the anonymous struct is an extension)
Strictly well-defined           ❌ No
Portable across architectures   ❌ No
Commonly used in practice       ✅ Yes

Final take

Your code compiles on the major compilers, but using it for type punning is undefined behavior in C++ and not portable.

If this is for engine code, shaders, or low-level systems you control, it’s a reasonable pragmatic choice.

If this is for a reusable library or public API, don’t do it.

If you want, I can:

  • Show how compilers actually optimize the safe versions
  • Compare generated assembly
  • Propose a zero-overhead, standards-clean alternative that still feels ergonomic

“Memory saving is the obvious reason, but unions are also about explicit control of representation, ABI overlays, and implementing sum types.”

Let’s unpack that sentence layer by layer, because each clause corresponds to a distinct, deep use-case for unions that goes well beyond “saving memory.” I’ll treat this as a mini-article and connect each idea to real systems programming patterns, language rules, and why unions still exist even in modern C++.


1. “Memory saving is the obvious reason” (baseline, but incomplete)

This is the part everyone learns first:

union {
    int   i;
    float f;
};

Only one member exists at a time, so the union occupies the size of the largest member.

Why this explanation is insufficient

Memory saving is:

  • accidental, not fundamental
  • not always significant on modern systems
  • not the reason unions exist in language standards

In fact:

  • std::variant also stores alternatives in overlapping storage
  • many union uses don’t save memory at all (largest member dominates)
  • some union uses increase complexity and bugs rather than saving space

So why do unions exist at all?

Because they give you explicit control over object representation and interpretation.

That’s where the real power—and danger—lies.


2. “Explicit control of representation” (the real core concept)

What “representation” means here

In C and C++, representation refers to:

  • object layout
  • byte-level storage
  • alignment
  • bit patterns
  • how values exist in memory

A union lets you say:

“This block of memory is one of these representations, and I control which interpretation is valid.”

That is fundamentally different from:

  • inheritance
  • polymorphism
  • overloading
  • templates

Those operate at the type system level.

Unions operate at the object model and memory level.


2.1 Representation vs value

Consider this distinction:

Concept Example
Value 42, 3.14f, "hello"
Representation bytes, bits, layout in memory

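Unions let the same storage carry different meanings. In C, reading a member other than the one last written reinterprets the stored bytes. In C++, that read is undefined behavior outside the common-initial-sequence case, so the reinterpretation itself has to go through memcpy or std::bit_cast.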

This matters when:

  • interacting with hardware
  • parsing binary protocols
  • crossing language or ABI boundaries
  • implementing language runtimes
  • writing serializers/deserializers
  • handling NaNs, bit flags, tagged pointers

Example (conceptual):

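#include <stdint.h>

// Bit-field order and packing are implementation-defined, so this
// layout holds only for a known compiler and target.
union Word {
    uint32_t raw;
    struct {
        uint32_t opcode : 6;
        uint32_t reg    : 5;
        uint32_t imm    : 21;
    } decoded;
};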

Here the union is not about saving memory.

It is about:

  • one physical 32-bit word
  • multiple semantic interpretations
  • you controlling which interpretation applies
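When the field positions must not depend on the compiler, the same decoding can be written with shifts and masks. The sketch below assumes a hypothetical encoding (opcode in bits 0-5, reg in bits 6-10, imm in bits 11-31); the names `Decoded`, `decode`, and `encode` are illustrative and not part of the original example.

```cpp
#include <cstdint>

// Decoded view of one 32-bit word. The field positions are an
// assumption made for this sketch, not a real instruction set.
struct Decoded {
    std::uint32_t opcode;  // bits 0-5
    std::uint32_t reg;     // bits 6-10
    std::uint32_t imm;     // bits 11-31
};

// Extract each field with an explicit shift and mask.
constexpr Decoded decode(std::uint32_t raw) {
    return Decoded{
        raw & 0x3Fu,
        (raw >> 6) & 0x1Fu,
        (raw >> 11) & 0x1FFFFFu
    };
}

// Pack the fields back into one word, truncating oversized values.
constexpr std::uint32_t encode(const Decoded& d) {
    return (d.opcode & 0x3Fu)
         | ((d.reg & 0x1Fu) << 6)
         | ((d.imm & 0x1FFFFFu) << 11);
}
```

Unlike the bit-field view, this version produces the same result on every conforming compiler, at the cost of spelling out the layout by hand.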

2.2 Why this is intentionally low-level

Unions deliberately bypass:

  • automatic lifetime tracking
  • type safety
  • aliasing safety
  • constructor/destructor enforcement

That’s not a bug.

That’s why:

  • operating systems use unions
  • compilers use unions
  • virtual machines use unions
  • embedded firmware uses unions

You’re stepping outside the “safe abstraction zone” and taking responsibility.


3. “ABI overlays” (this is huge, and often misunderstood)

What is an ABI?

ABI = Application Binary Interface

It defines:

  • object layout
  • calling conventions
  • alignment
  • padding
  • endianness expectations
  • struct layout rules

If you get ABI wrong:

  • libraries crash
  • syscalls misbehave
  • IPC breaks
  • hardware registers misfire

Unions are one of the few tools that let you intentionally overlay different ABI-compatible views onto the same memory.


3.1 ABI overlay: same bytes, different meanings

Classic example: POSIX networking.

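// Simplified for illustration; the real definitions live in
// <sys/socket.h> and <netinet/in.h> and vary between platforms.
struct sockaddr {
    sa_family_t sa_family;
    char        sa_data[14];
};

struct sockaddr_in {
    sa_family_t    sin_family;
    uint16_t       sin_port;
    struct in_addr sin_addr;
};

// struct sockaddr_in6 is declared in <netinet/in.h> and omitted here.
union sockaddr_any {
    struct sockaddr     sa;
    struct sockaddr_in  in;
    struct sockaddr_in6 in6;
};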

This is not about memory saving.

This is about:

  • identical ABI prefixes
  • safe casting at call boundaries
  • allowing generic APIs to accept specific layouts

The union makes it explicit:

“These types are ABI-compatible overlays of the same memory.”
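The shared-prefix idea can be shown without the real POSIX headers. In C++, when the members of a union are standard-layout structs, a field in their common initial sequence may be read through any of them, whichever member is active. The types below (`AddrHeader`, `AddrV4`, `AddrV6`, `AnyAddr`) and the tag values are hypothetical stand-ins for the sockaddr family, not the actual definitions.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical address types that all start with the same field.
struct AddrHeader { std::uint16_t family; };
struct AddrV4 { std::uint16_t family; std::uint16_t port; std::uint32_t addr; };
struct AddrV6 { std::uint16_t family; std::uint16_t port; std::uint8_t addr[16]; };

union AnyAddr {
    AddrHeader hdr;
    AddrV4     v4;
    AddrV6     v6;
};

// The common-initial-sequence rule only applies to standard-layout types.
static_assert(std::is_standard_layout<AddrHeader>::value, "standard layout");
static_assert(std::is_standard_layout<AddrV4>::value, "standard layout");
static_assert(std::is_standard_layout<AddrV6>::value, "standard layout");

// Hypothetical family tags.
constexpr std::uint16_t kFamilyV4 = 4;
constexpr std::uint16_t kFamilyV6 = 6;

// `family` is in the common initial sequence of all three structs,
// so it can be read through `hdr` even when `v4` or `v6` is active.
inline std::uint16_t family_of(const AnyAddr& a) { return a.hdr.family; }

// Build a union whose active member is `v4`.
inline AnyAddr make_v4(std::uint16_t port, std::uint32_t addr) {
    AnyAddr a;
    a.v4 = AddrV4{kFamilyV4, port, addr};
    return a;
}
```

A caller would inspect `family_of(a)` first and only then touch `a.v4` or `a.v6`, which is the same discipline the sockaddr APIs rely on.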


3.2 Hardware registers (canonical ABI overlay)

union ControlRegister {
    uint32_t raw;
    // Anonymous struct: standard in C11, a common compiler
    // extension in C++ (GCC, Clang, MSVC).
    struct {
        uint32_t enable : 1;
        uint32_t mode   : 3;
        uint32_t irq    : 1;
        uint32_t        : 27;
    };
};

Here:

  • raw is how hardware sees the register
  • the bitfield view is how humans reason about it

This is precise representation control, not polymorphism.

⚠️ You accept:

  • implementation-defined bitfield layout
  • fixed compiler + target assumptions

That’s the deal you make.
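Because the bit-field layout is implementation-defined, it can help to state the assumptions in code so that a different compiler or target fails loudly. This is a sketch under stated assumptions: `ControlBits`, `raw_with_enable_set`, and `enable_is_bit0` are hypothetical names, and a named struct is used instead of an anonymous one to stay within standard C++.

```cpp
#include <cstdint>
#include <cstring>

// Bit-field view of the register, as a named struct.
struct ControlBits {
    std::uint32_t enable : 1;
    std::uint32_t mode   : 3;
    std::uint32_t irq    : 1;
    std::uint32_t        : 27;
};

// Assumption: the compiler packs the fields into exactly one 32-bit word.
static_assert(sizeof(ControlBits) == sizeof(std::uint32_t),
              "bit-fields must pack into one 32-bit word");

// Returns the raw word that results from setting only `enable`.
// memcpy is used to read the bytes without union type punning.
inline std::uint32_t raw_with_enable_set() {
    ControlBits bits;
    std::memset(&bits, 0, sizeof bits);   // clear padding bits too
    bits.enable = 1;
    std::uint32_t raw;
    std::memcpy(&raw, &bits, sizeof raw);
    return raw;
}

// True when this compiler and target place `enable` in bit 0.
inline bool enable_is_bit0() { return raw_with_enable_set() == 1u; }
```

A start-up check such as `enable_is_bit0()` turns a silent layout mismatch into an explicit failure before any register is touched.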


3.3 Why reinterpret_cast is not the same thing

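reinterpret_cast<T*>:

  • tells the type system something that may not be true
  • violates aliasing rules once the result is dereferenced as an unrelated type
  • does not create an object
  • often breaks under optimization

A union:

  • allocates storage correctly
  • guarantees alignment
  • makes the overlap explicit in the type
  • is defined for type punning in C and documented as supported by GCC, although ISO C++ still treats reading an inactive member as undefined behavior

This is why low-level code often prefers unions over casts.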
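For the pure "same bytes, different type" case, copying the object representation with `std::memcpy` is well-defined in both C and C++. The sketch below uses hypothetical helper names (`float_bits`, `bits_to_float`) and assumes a 32-bit `float`; in C++20, `std::bit_cast<std::uint32_t>(f)` from `<bit>` expresses the same conversion.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float occupies 32 bits on this target.
static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Copy the bytes of a float into an integer of the same size.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Copy the bytes of an integer back into a float.
inline float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

Mainstream compilers typically reduce these fixed-size copies to a single register move, so the well-defined version usually costs nothing extra.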


4. “Implementing sum types” (the conceptual leap)

This is the most language-design–oriented role of unions.

What is a sum type?

A sum type (aka tagged union, variant type, algebraic data type) means:

A value is exactly one of several alternatives, each with its own type.

Examples:

  • JSON values (null | bool | number | string | array | object)
  • AST nodes in a compiler
  • network messages
  • error-or-value results

4.1 C-style tagged union (manual sum type)

typedef enum {
    VAL_INT,
    VAL_FLOAT,
    VAL_STRING
} ValueKind;

typedef struct {
    ValueKind kind;
    union {
        int    i;
        float  f;
        char  *s;
    };
} Value;

This is:

  • explicit
  • zero-overhead
  • fully deterministic
  • ABI-stable

And:

  • the union is not about saving memory
  • the tag records which member is active (code must check it before every access)
  • the union provides exclusivity

This is the original sum-type pattern in C.
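The pattern only works if every access checks the tag first. A minimal C++ sketch of that discipline follows, with hypothetical helper names (`make_int`, `make_float`, `make_string`, `describe`) and a non-owning string pointer to keep all members trivial.

```cpp
#include <string>

enum class ValueKind { Int, Float, String };

// Tagged union: `kind` says which union member is active.
struct Value {
    ValueKind kind;
    union {
        int         i;
        float       f;
        const char* s;   // not owned; must outlive the Value
    };
};

// Each factory sets the tag and the matching member together.
inline Value make_int(int v)            { Value x; x.kind = ValueKind::Int;    x.i = v; return x; }
inline Value make_float(float v)        { Value x; x.kind = ValueKind::Float;  x.f = v; return x; }
inline Value make_string(const char* v) { Value x; x.kind = ValueKind::String; x.s = v; return x; }

// Every read goes through a switch on the tag, so only the
// active member is ever touched.
inline std::string describe(const Value& v) {
    switch (v.kind) {
        case ValueKind::Int:    return "int: " + std::to_string(v.i);
        case ValueKind::Float:  return "float: " + std::to_string(v.f);
        case ValueKind::String: return std::string("string: ") + v.s;
    }
    return "unknown";
}
```

Keeping construction in factories and reads in one switch means the tag and the active member cannot drift apart by accident.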


4.2 C++ raw union vs std::variant

In C++, the same idea exists, but with more safety:

std::variant<int, float, std::string>

Internally:

  • it uses union-like storage
  • tracks the active alternative
  • manages lifetimes
  • enforces access rules
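A short sketch of those guarantees in use, assuming C++17. The alias `VariantValue` and the function `kind_name` are hypothetical names introduced for this example.

```cpp
#include <string>
#include <variant>

using VariantValue = std::variant<int, float, std::string>;

// std::visit dispatches on the active alternative, so there is no
// manual tag to keep in sync.
inline std::string kind_name(const VariantValue& v) {
    struct Visitor {
        std::string operator()(int) const                { return "int"; }
        std::string operator()(float) const              { return "float"; }
        std::string operator()(const std::string&) const { return "string"; }
    };
    return std::visit(Visitor{}, v);
}

// std::get_if returns nullptr instead of reading the wrong alternative.
inline bool holds_int(const VariantValue& v) {
    return std::get_if<int>(&v) != nullptr;
}
```

Assigning a different alternative destroys the old one automatically, and `std::get<T>` on the wrong alternative throws `std::bad_variant_access` rather than invoking undefined behavior.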

So why would anyone still use raw unions?

Because:

  • std::variant has no standardized layout, so it is not ABI-stable across standard libraries
  • it may add padding, tags, helpers
  • it may not match C layouts
  • it may not be usable in freestanding / kernel / embedded environments
  • it may not be constexpr-friendly enough for some use cases

Thus raw unions remain essential for manual sum types with strict control.
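Strict control has a price: once a union holds a non-trivial member, construction and destruction become the programmer's job. The sketch below assumes C++17 (for `std::destroy_at`); the class `IntOrString` and its member names are hypothetical.

```cpp
#include <memory>
#include <new>
#include <string>
#include <utility>

// Manual sum type of int and std::string with explicit lifetime control.
class IntOrString {
public:
    explicit IntOrString(int v) : is_string_(false), i_(v) {}
    explicit IntOrString(std::string v) : is_string_(true), s_(std::move(v)) {}

    // Copying is disabled to keep the sketch short.
    IntOrString(const IntOrString&) = delete;
    IntOrString& operator=(const IntOrString&) = delete;

    ~IntOrString() { destroy(); }

    // Switch to the int alternative, ending any string lifetime first.
    void set(int v) { destroy(); i_ = v; }

    // Switch to the string alternative with placement new.
    void set(std::string v) {
        destroy();
        new (&s_) std::string(std::move(v));
        is_string_ = true;
    }

    bool holds_string() const { return is_string_; }
    int as_int() const { return i_; }                    // requires !holds_string()
    const std::string& as_string() const { return s_; }  // requires holds_string()

private:
    // Run the string destructor only if a string is currently alive.
    void destroy() {
        if (is_string_) {
            std::destroy_at(&s_);
            is_string_ = false;
        }
    }

    bool is_string_;
    union {
        int         i_;
        std::string s_;
    };
};
```

This is the bookkeeping that `std::variant` performs internally; writing it by hand is mainly justified when the exact layout or the environment rules the library type out.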


4.3 Key insight: unions express exclusivity

A struct says:

“These things all exist together.”

A union says:

“Exactly one of these exists, and I will tell you which.”

That semantic idea—exclusive alternatives—is independent of memory saving.

Memory saving is just the physical manifestation.


5. Unifying the three ideas

Let’s restate the sentence with precision:

Memory saving is incidental; unions exist to let programmers explicitly define mutually exclusive interpretations of the same storage, enabling representation-level control, ABI-compatible overlays, and low-level implementations of sum types.

Each part reinforces the others:

Concept What unions give you
Representation control Exact byte/bit interpretation
ABI overlays Multiple standardized views of same memory
Sum types Explicit “one-of-many” semantics
Memory saving Consequence, not goal

6. Why this matters for modern C++ (especially your interests)

Given your background (systems C++, networking, sockets, ABI-aware code):

  • You already deal with protocol layouts
  • You already rely on stable binary representations
  • You already need precise lifetime control
  • You already care about optimization correctness

Understanding unions as representation tools rather than memory hacks is critical to:

  • writing portable low-level code
  • avoiding aliasing UB
  • designing clean tagged abstractions
  • knowing when not to use them

7. Rule of thumb (expert-level)

  • Use unions when you want to control how memory is interpreted.
  • Use structs when you want to model composition.
  • Use std::variant when you want sum types with safety.
  • Use std::bit_cast (C++20) when you want representation conversion.

If you want, next we can:

  • dissect a real-world protocol struct
  • redesign a union-based API into a variant-based one
  • analyze compiler output for union vs memcpy
  • or map this directly to network headers, sockets, or ABI-stable libraries

Just tell me where you want to go deeper.
