Disclaimer: Grok generated document.
Unions are a fundamental feature in both C and C++ for defining data types that can store different types of data in the same memory location. They are particularly useful for memory optimization, type punning (with caveats), and implementing variant-like structures. This guide covers everything from basics to advanced topics, including syntax, semantics, differences between C and C++, pitfalls, use cases, and modern alternatives. I'll use code examples, tables for comparisons, and explanations grounded in language standards (up to C23 and C++23 as of 2025).
A union is a user-defined type where all members share the same memory address. Unlike a struct (which allocates separate space for each member), a union overlays members, so its size is determined by the largest member. Only one member can hold a valid value at a time—accessing others leads to undefined behavior (UB).
- Purpose: Save memory when mutually exclusive data is needed (e.g., a value that could be an int or a float).
- Key Rule: The "active member" is the last one written to. Reading from others is UB.
- Standards Origin: Introduced in early C (K&R C), formalized in C89. C++ inherits from C but adds object-oriented complexities.
Unions are not type-safe by design, making them error-prone but powerful for low-level programming.
In both C and C++:
union MyUnion {
int i;
float f;
char c[4]; // Array member (size 4 bytes)
};
- Size: sizeof(union MyUnion) is the size of the largest member, rounded up for alignment (e.g., 4 bytes if int/float are 4 bytes).
- Instantiation: union MyUnion u; in C, or just MyUnion u; in C++ (where the union keyword is optional once the type has been declared).
- In C++: Anonymous unions (unions without a name) have been standard since C++98; they are typically embedded in structs or classes to overlay members.
- In C: Supported from C11 onward (anonymous unions and anonymous structs as members).
Example (C++):
struct Point {
union {
int x;
float fx; // Overlays x
};
int y;
};
Point p;
p.x = 42; // Shares storage with p.fx (but reading it back as float = UB)
In C (C11+):
struct Point {
union {
int x;
float fx;
};
};
Common in both:
typedef union {
int i;
float f;
} Variant;
Variant v;
- Memory Overlap: All members start at offset 0.
- Padding/Alignment: Unions follow the alignment of the most strictly aligned member. Padding may be added to reach that.
- Size Calculation:
sizeof(union) = max(sizeof(each member)), rounded up for alignment.
Example Table of Layout (assuming 4-byte int/float, 1-byte char):
| Member | Type | Size | Alignment | Offset |
|---|---|---|---|---|
| i | int | 4 | 4 | 0 |
| f | float | 4 | 4 | 0 |
| c | char[4] | 4 | 1 | 0 |
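The size and alignment rules in the table can be checked at compile time. The following is a minimal sketch, assuming a C++14 or later compiler; the helper names largest_member_size and strictest_member_alignment are illustrative and do not come from any library.

```cpp
#include <algorithm>
#include <cstddef>

union MyUnion {
    int i;
    float f;
    char c[4];
};

// Hypothetical helpers: size of the largest member and the strictest member alignment.
constexpr std::size_t largest_member_size() {
    return std::max({sizeof(int), sizeof(float), sizeof(char[4])});
}
constexpr std::size_t strictest_member_alignment() {
    return std::max({alignof(int), alignof(float), alignof(char[4])});
}

// The union must be able to hold its largest member...
static_assert(sizeof(MyUnion) >= largest_member_size(), "union too small");
// ...must be at least as strictly aligned as any member...
static_assert(alignof(MyUnion) >= strictest_member_alignment(), "union under-aligned");
// ...and its size is a multiple of its alignment (the "rounding up" step).
static_assert(sizeof(MyUnion) % alignof(MyUnion) == 0, "size is not a multiple of alignment");
```

The checks use >= rather than == because exact values are fixed by the platform ABI, not by the language.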
- Bit-Fields in Unions: Allowed in both C and C++. Useful for packing bits.
union Flags {
    unsigned int value : 32;
    struct { // anonymous struct: standard in C11, a compiler extension in C++
        unsigned int flagA : 1;
        unsigned int flagB : 1;
        // ... up to 32 bits
    };
};
- In C: Bit-fields can span bytes; unnamed fields for padding.
- In C++: Similar, but classes can have bit-fields too.
C unions are straightforward, mainly for primitives/arrays/pointers.
union MyUnion u = { .i = 42 }; // C99+ designated initializer
u.f = 3.14; // Now f is active, i is invalid
printf("%f\n", u.f); // 3.14
// printf("%d\n", u.i); // UB! (though may print garbage)
- Type Punning: Reading as a different type (e.g., int as float) is common but technically UB in strict aliasing rules (C99+). Use memcpy for safety.
- Anonymous Unions in C11+: As above, for structs.
- Restrictions: No constructors/destructors (C has no classes). Members can't have non-trivial types.
- Endian Detection:
union { unsigned int i; unsigned char c[4]; } endian = { .i = 0x01020304 };
if (endian.c[0] == 0x04) { /* Little-endian */ }
- Variant Storage: Manually track type with an enum.
C++ builds on C but introduces challenges with classes/structs.
Same as C for POD (Plain Old Data) types.
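- Allowed Since: C++11 (unrestricted unions). C++98 only permitted members whose constructors, destructors, and copy assignment were all trivial.
- Constructors/Destructors: A union never constructs or destroys its members automatically; in C++11+ at most one member may carry a default member initializer.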
- Restrictions (C++11+):
- If a union has a non-trivial member (e.g., std::string with ctor), it can't have default ctor/copy/move unless you provide them.
- Only one member can be active; switching requires manual destruction and placement new.
Example:
#include <string>
#include <new> // For placement new
union Data {
int i;
std::string str;
Data() {} // Must provide if str is non-trivial
~Data() {} // Must provide
void setString(const std::string& s) {
new (&str) std::string(s); // Construct in place
}
void destroyString() {
str.~basic_string(); // Explicit destructor call (std::string is std::basic_string<char>)
}
};
- C++11 Improvements: Unrestricted unions (non-trivial members) and static data members. Unions can have member functions, but never virtual functions or base classes.
- C++17/20: Better support for unions in variants (see below).
- Can be used globally (unnamed namespace) or in classes.
- Useful for aliasing members.
Not built-in; implement manually for safety.
In C:
enum Type { INT, FLOAT };
struct Variant {
enum Type type;
union {
int i;
float f;
} data;
};
In C++: Same, or use std::variant (C++17+).
| Aspect | Union | Struct |
|---|---|---|
| Memory | Overlaid (shared) | Sequential (separate) |
| Size | Largest member | Sum of members + padding |
| Access | One active member | All members always accessible |
| Use Case | Alternatives | Composites |
| Initialization | One member at a time | All members |
| C++ Complexity | Manual ctors/dtors for objects | Automatic |
| Feature | C | C++ |
|---|---|---|
| Anonymous Unions | C11+ (in structs) | Yes (since C++98) |
| Non-Trivial Members | No (no classes) | Yes, with manual management |
| Static Members | No | Yes (C++11+) |
| Virtual Functions | No | No (unions cannot have virtual functions) |
| Type Punning | UB under strict aliasing | Same + std::bit_cast (C++20) |
| Standard Alternatives | None | std::variant (C++17) |
- C23 Additions: Better support for bit-precise integers in unions.
- C++23: Minor refinements, like better constexpr support.
- Accessing Inactive Member: Always UB. May work in practice but not portable.
- Strict Aliasing Violation: Punning types (e.g., int as float) breaks rules unless via char* or memcpy.
- Alignment Issues: Misaligned access crashes on some architectures.
- Lifetime Management in C++: Forgetting to call dtor on objects leads to leaks/UB.
- Initializer Rules: In C99+, designated; in C++, aggregate init for POD unions.
- Portability: Endianness, padding vary by platform.
Avoid unions for type punning; use std::bit_cast (C++20) or memcpy.
- Memory Optimization: In embedded systems (e.g., overlaying registers).
- Variant Types: Before std::variant, unions + enum.
- Networking/Parsing: Overlay headers on packets.
- Graphics: Color unions (e.g., RGBA as uint32 or separate bytes).
- Debugging: Inspect memory representations.
- C++17 std::variant: Safe, automatic management.
#include <iostream>
#include <string>
#include <variant>
std::variant<int, float, std::string> v = 42;
if (std::holds_alternative<int>(v)) {
    std::cout << std::get<int>(v) << std::endl;
}
v = "hello"; // Now holds a std::string; the int alternative is replaced automatically
  - Visit with std::visit for type-safe handling.
- std::any (C++17): For unknown types.
- Boost.Variant (pre-C++17).
- In C: Libraries like GLib's GVariant.
- Pros: Zero overhead; pure memory overlay.
- Cons: Manual tracking adds code complexity/runtime checks.
- A union whose members are all trivially copyable is itself trivially copyable, so it can be copied with memcpy and stored efficiently in arrays.
Unions are a low-level tool for memory sharing, essential in systems programming but risky due to UB potential. In C, stick to primitives and manual tagging. In C++, prefer std::variant for safety unless performance demands raw unions (e.g., in kernels or games).
- When to Use: For legacy code, extreme optimization, or C compatibility.
- Best Practice: Always pair with an enum for type tracking; use placement new/dtors in C++.
- Learn More: Refer to C standards (ISO/IEC 9899) and C++ (ISO/IEC 14882) for edge cases.
If you have a specific example or scenario, I can elaborate further! 🚀
Below is a full, deep-dive “article-style” treatment of unions in C and C++, with extra attention to the places where explanations on the internet (and in many summaries) tend to be subtly wrong, incomplete, or “works-on-my-compiler”-dependent. I’ll also explicitly call out a couple of points from your attached document that are commonly stated but need nuance, especially around type punning and “reading inactive members.”
Unions in C and C++: A Comprehensive Deep Dive (Semantics, Lifetime, Aliasing, and Real-World Patterns)
A union is a type whose non-static data members share the same storage. Intuitively:
- A struct is composition: members live next to each other.
- A union is overlay: members live on top of each other.
Two immediate consequences:
- sizeof(union) is at least the size of its largest member (plus possible padding for alignment).
- A union object has one chunk of storage, and member access is a typed view onto that storage.
The tricky parts are not about layout—they’re about:
- Which member is “active” (C++ concept, and quasi-concept in C via effective type rules).
- Object lifetime (particularly in C++).
- Aliasing / effective type / optimization assumptions (both C and C++).
Your attached document gives a broadly good overview, but it leans into a too-simple rule—“reading from a member other than the last one written is UB”—which is often practically useful advice yet is not the whole story across C vs C++ and across certain special cases.
For a union:
- All non-static data members have the same address in C++ (and, in practice and by modern specs, effectively in C as well). cppreference summarizes this expectation for C++ unions. (Cppreference)
- The union’s alignment is the maximum alignment requirement among its members.
- sizeof(union) is large enough for the largest member, rounded up to satisfy alignment.
Example:
union U {
uint8_t b;
uint32_t w;
double d;
};
Typical (but platform-dependent):
- alignof(U) = alignof(double)
- sizeof(U) = sizeof(double) (often 8), not 1+4+8
Even if all members start at offset 0, the union’s storage can include padding bytes (e.g., due to alignment rounding). Those bytes matter if you do:
- raw byte hashing
- serialization
- memcmp comparisons
- “bit punning” experiments
In C++, union members are objects, and (critically) their lifetime may or may not have begun depending on what you constructed last.
cppreference states the commonly taught rule: reading a member that wasn’t most recently written is undefined behavior. (Cppreference) But to be precise, the real core is: which member’s lifetime is active, and what the standard allows you to do when it isn’t.
A union member is “active” if its lifetime has begun and not ended. This is the conceptual model used by the standard wording cited in discussions like the Node.js issue quoting the standard’s union wording. (GitHub)
- If all members are trivial (e.g., int, float, pointers, trivially copyable aggregates), you can “switch” the active member by assignment, but the rules about what you may read remain constrained.
- If a union has non-trivial members (e.g., std::string), you generally must:
  - explicitly construct the chosen member (placement new)
  - explicitly destroy it
  - manually track which member is active
C doesn’t have constructors/destructors or object lifetime rules in the C++ sense for union members. Instead, C’s model revolves around:
- stored value
- effective type
- permitted ways to inspect object representations (notably through unsigned char)
This is where online explanations diverge, and where your attached document’s “always UB” statement can be misleading.
There’s a long-running tension:
- Many sources say “type punning via unions is UB in C” (e.g., older standard interpretations and strict readings). (Stack Overflow)
- Others argue it is (or has become, via defect reports / committee intent) well-defined or at least specified enough in C if you access through the union and accept implementation-defined aspects: a footnote to C11 6.5.2.3 says the bytes are reinterpreted as an object representation in the new type, which may be a trap representation. Recent compiler community discussions still surface this difference between C and C++ behavior. (LLVM Discussion Forums)
Practical reality:
- Compilers like GCC/Clang have historically supported union-based punning in many cases, but optimizers + strict aliasing can break assumptions, especially when you “escape” the union through pointers, LTO, or aggressive optimization. (Software is Crap)
So, for C, the safe guidance is:
- Union punning might work, might be intended, might even be supported, but if you need portable, standards-solid behavior, treat it as risky and prefer byte-based techniques (memcpy) or standardized idioms.
Let’s separate three distinct operations people conflate:
- Variant storage (store A or B, track which it is, read it back as the same type)
- Representation inspection (look at bytes of an object representation)
- Type punning (write as T, read as U, pretending it “reinterprets”)
Only (1) is the “intended” union use. (2) is often legal via character types. (3) is where you get wrecked.
In C++, “read other member” is generally UB in the standard model; many compilers allow some patterns as extensions, but the language rule is not your friend here. cppreference is blunt about UB for reading the not-most-recently-written member. (Cppreference)
Do this instead in C++20+: std::bit_cast for same-size trivially copyable types.
- The C++ Core Guidelines discussion explicitly recommends std::bit_cast (or memcpy) rather than unions or reinterpret_cast. (GitHub)
- std::bit_cast requires trivially copyable types, which is tightly tied to the idea that this is a byte-level copy of object representation. (Stack Overflow)
Example:
#include <bit>
#include <cstdint>
float u32_to_float(std::uint32_t bits) {
return std::bit_cast<float>(bits);
}
std::uint32_t float_to_u32(float x) {
return std::bit_cast<std::uint32_t>(x);
}
This is the modern, optimizer-friendly solution.
If you’re pre-C++20: use std::memcpy.
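For pre-C++20 code, the memcpy approach can be wrapped in a small helper. This is a minimal sketch, not a standard facility: the names bit_cast_compat, float_bits, and float_from_bits are made up for illustration, and unlike std::bit_cast the helper is not constexpr.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not a standard function): copies the object
// representation of `src` into a fresh object of type To.
template <typename To, typename From>
To bit_cast_compat(const From& src) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_default_constructible<To>::value,
                  "To must be trivially default constructible");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));  // byte copy: no aliasing violation
    return dst;
}

inline std::uint32_t float_bits(float x) noexcept {
    return bit_cast_compat<std::uint32_t>(x);
}

inline float float_from_bits(std::uint32_t bits) noexcept {
    return bit_cast_compat<float>(bits);
}
```

On platforms that use IEEE-754 binary32 for float, float_bits(1.0f) yields 0x3F800000; other floating-point formats would give a different bit pattern.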
There’s evidence on both sides in the ecosystem:
- Older strict readings: writing one union member then reading another is UB (StackOverflow answers often cite this). (Stack Overflow)
- Compiler community conversations sometimes summarize: “in C this is well-defined; in C++ it is UB” (this is asserted in recent LLVM discourse). (LLVM Discussion Forums)
- Many real-world codebases rely on GCC’s documented behavior/extension. (Software is Crap)
Portable C guidance (if you want to sleep at night across compilers and flags):
- Prefer memcpy-based punning for reinterpretation.
- Or restrict yourself to inspecting bytes through unsigned char (object representation).
Optimizers assume that pointers/references of unrelated types do not alias, enabling reordering, vectorization, load/store elimination, etc. John Regehr has a good high-level explanation of why aliasing assumptions exist and how they enable optimizations. (Regehr Blog) And discussions on what optimizations aliasing enables are plentiful. (langdev.stackexchange.com)
This is the big gotcha:
float f(uint32_t *p) {
float *q = (float*)p;
*p = 0x3F800000u;
return *q; // "should be 1.0f"? optimizer may say nope
}
This is classic aliasing UB territory. A union doesn’t automatically save you if you end up with typed pointers that violate aliasing assumptions.
Even then, you need to be careful about what your compiler actually guarantees versus what the standard strictly requires. When you need portable behavior, use:
- memcpy
- std::bit_cast
- explicit byte arrays
There are narrowly scoped exceptions / special allowances people trip over.
C++ allows certain limited inspection when the union members are standard-layout structs that share a common initial sequence. This is often mentioned in discussions about inactive union member access. (Stack Overflow)
Example pattern:
struct A { int tag; int x; };
struct B { int tag; double y; };
union U {
A a;
B b;
};
U u;
u.b = B{42, 3.14};
// Accessing u.a.tag may be permitted due to common initial sequence rules
int t = u.a.tag;
This is used in low-level ABI / tagged headers. But:
- It only applies to the common initial sequence portion.
- It does not generalize to “I can read any member as any type.”
In C++ you’re allowed to look at an object’s representation through character types (unsigned char, std::byte via std::as_bytes, etc.). That’s not the same as “reading another union member,” but it’s how you safely do “what are the bytes.”
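Byte-level inspection through unsigned char can be sketched as follows. The helper names object_bytes and is_little_endian are hypothetical, and the example assumes 8-bit bytes (implied by the use of std::uint32_t).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: copies out the bytes that make up `value`.
template <typename T>
std::array<unsigned char, sizeof(T)> object_bytes(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    // unsigned char is permitted to alias an object of any type.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    std::array<unsigned char, sizeof(T)> out{};
    for (std::size_t n = 0; n < sizeof(T); ++n) {
        out[n] = p[n];
    }
    return out;
}

// Byte-order probe built on the same idea; no union is involved.
inline bool is_little_endian() {
    const std::uint32_t probe = 0x01020304u;
    return object_bytes(probe)[0] == 0x04;
}
```

Because the bytes are read through unsigned char, this never reads an inactive union member and does not depend on which member was written last.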
In C++ with non-trivial members, the real hazard is not just “inactive”—it’s “this object doesn’t exist here right now.”
union U {
int i;
std::string s;
U() : i(0) {}
~U() {}
};
U u;
u.s = "hi"; // actually UB: s was never constructed!
Correct:
new (&u.s) std::string("hi");
// ...
u.s.~basic_string();
u.i = 123; // now i becomes active (trivial)
This is why std::variant exists.
C99+ gives you designated initializers:
union U { int i; float f; };
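union U u = { .f = 1.25f };
C++ has different aggregate initialization rules: a braced initializer without a designator initializes the first member, and designated initializers only arrived (in a restricted form) in C++20. The lifetime rules described above apply on top of that.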
A common modern pattern is to avoid raw unions in C++ unless:
- all members are trivial, or
- you’re implementing a low-level tagged union with explicit placement new/destruction, or
- you’re doing ABI overlays under strict constraints.
Anonymous unions are useful for:
- register overlays
- protocol headers
- representing data in multiple formats when you don’t pun types illegally
Example:
struct Color {
union {
struct { uint8_t r,g,b,a; };
uint32_t rgba;
};
};
This is extremely common. Is it portable? Depends what you do with it.
- If you write rgba then read r..a, you are effectively doing a representation reinterpretation and will run into the same C vs C++ concerns as above.
- If your goal is byte packing/unpacking, consider explicit shifts/masks or memcpy to/from an array.
Bit-fields are often combined with unions for:
- register definitions
- flags packing
But beware:
- Bit-field layout is implementation-defined (ordering, packing, alignment, etc.).
- Endianness interacts with interpretation.
- Compilers differ on packing and ABI rules.
If you need stable wire formats, you usually want explicit masks and shifts rather than bit-fields.
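Explicit masks and shifts can look like the sketch below. The bit positions (enable in bit 0, mode in bits 1 to 3, irq in bit 4) and all names are assumptions chosen purely for illustration; a real register or wire format would dictate its own layout.

```cpp
#include <cstdint>

// Hypothetical layout, for illustration only:
//   bit 0      enable
//   bits 1..3  mode
//   bit 4      irq
struct ControlFields {
    bool enable;
    std::uint8_t mode;  // only the low 3 bits are used
    bool irq;
};

constexpr std::uint32_t kEnableShift = 0;
constexpr std::uint32_t kModeShift = 1;
constexpr std::uint32_t kModeMask = 0x7u;
constexpr std::uint32_t kIrqShift = 4;

// Packs the fields into a word; the result does not depend on how the
// compiler would have laid out bit-fields.
constexpr std::uint32_t pack(ControlFields f) {
    return (static_cast<std::uint32_t>(f.enable) << kEnableShift) |
           ((static_cast<std::uint32_t>(f.mode) & kModeMask) << kModeShift) |
           (static_cast<std::uint32_t>(f.irq) << kIrqShift);
}

// Extracts the fields again using the same shifts and masks.
constexpr ControlFields unpack(std::uint32_t raw) {
    return ControlFields{
        ((raw >> kEnableShift) & 1u) != 0,
        static_cast<std::uint8_t>((raw >> kModeShift) & kModeMask),
        ((raw >> kIrqShift) & 1u) != 0,
    };
}
```

The layout is now stated in the code itself, so it stays the same across compilers and targets, which bit-fields cannot promise.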
typedef enum { T_INT, T_FLOAT } Tag;
typedef struct {
Tag tag;
union {
int i;
float f;
} u;
} Value;
Rules:
- You must maintain the tag correctly.
- You only read the member corresponding to the tag.
This is safe, idiomatic, and exactly what unions were meant for.
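The same two rules carry over to C++ when every alternative is trivial. A minimal sketch, with hypothetical names (Number, make_int, make_float, as_double):

```cpp
enum class Tag { Int, Float };

struct Number {
    Tag tag;
    union {
        int i;
        float f;
    } u;
};

// Factory functions keep the tag and the written member in sync.
inline Number make_int(int v) {
    Number out;
    out.tag = Tag::Int;
    out.u.i = v;
    return out;
}

inline Number make_float(float v) {
    Number out;
    out.tag = Tag::Float;
    out.u.f = v;
    return out;
}

// Reads only the member selected by the tag.
inline double as_double(const Number& n) {
    switch (n.tag) {
        case Tag::Int:
            return static_cast<double>(n.u.i);
        case Tag::Float:
            return static_cast<double>(n.u.f);
    }
    return 0.0;  // not reached while the tag is valid
}
```

Routing all construction through the factory functions is one way to make the "maintain the tag correctly" rule hard to break by accident.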
std::variant is essentially a standardized, type-safe tagged union with correct lifetime handling.
The C++ Core Guidelines recommendation to use std::bit_cast for punning and avoid union-punning aligns with “prefer higher-level safe tools.” (GitHub)
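A short sketch of std::variant doing the tagging and lifetime work, assuming C++17; the alias Scalar and the visitor Describe are illustrative names only.

```cpp
#include <string>
#include <variant>

using Scalar = std::variant<int, float, std::string>;

// Visitor with one overload per alternative; std::visit picks the one
// matching the alternative currently held.
struct Describe {
    std::string operator()(int) const { return "int"; }
    std::string operator()(float) const { return "float"; }
    std::string operator()(const std::string&) const { return "string"; }
};

inline std::string describe(const Scalar& v) {
    return std::visit(Describe{}, v);
}
```

Assigning a std::string to a Scalar that currently holds an int switches the active alternative with no placement new or explicit destructor call in user code.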
Use a raw union in C++ when:
- you must match a C ABI layout precisely, or
- you need extreme control and can enforce invariants, or
- you’re building a low-level primitive like variant yourself.
Unions are still important because they enable:
- No dynamic allocation for alternatives
- Tight packing for embedded / IPC / message queues
- Stable ABI overlays (with extreme care)
- Manual object model control in kernel/RT code
But it’s also true that:
- compilers are very good at optimizing memcpy-style punning away
- std::variant can be extremely efficient (often a small tagged buffer + switch)
So “unions are faster” is not automatically true; “unions are lower-level” is true.
Already shown; it’s the gold standard now. (GitHub)
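C:
#include <stdint.h>
#include <string.h>
uint32_t bits;
float f = 1.0f; // initialize before copying its bytes
memcpy(&bits, &f, sizeof bits);
C++ pre-20:
#include <cstdint>
#include <cstring>
std::uint32_t bits;
float f = 1.0f;
std::memcpy(&bits, &f, sizeof bits);
This avoids aliasing UB and is typically optimized into a register move.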
Instead of:
union {
Header h;
uint8_t bytes[sizeof(Header)];
} u;
Prefer:
- memcpy from buffer into Header if Header is trivially copyable and layout is controlled
- or manual parse with shifts to avoid padding/alignment traps
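The manual-parse option might look like the sketch below. The Header fields, their big-endian wire order, and the 8-byte wire size are all assumptions invented for this example; they do not describe any real protocol.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wire format (big-endian), for illustration only:
//   bytes 0-1  type
//   bytes 2-3  length
//   bytes 4-7  sequence
struct Header {
    std::uint16_t type;
    std::uint16_t length;
    std::uint32_t sequence;
};

constexpr std::size_t kHeaderWireSize = 8;

inline std::uint16_t read_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((static_cast<unsigned>(p[0]) << 8) | p[1]);
}

inline std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
           static_cast<std::uint32_t>(p[3]);
}

// Returns false if the buffer is missing or too short; otherwise fills `out`.
inline bool parse_header(const unsigned char* buf, std::size_t size, Header& out) {
    if (buf == nullptr || size < kHeaderWireSize) {
        return false;
    }
    out.type = read_be16(buf);
    out.length = read_be16(buf + 2);
    out.sequence = read_be32(buf + 4);
    return true;
}
```

Since each field is assembled from individual bytes, the result is independent of host endianness and of any padding the compiler places inside Header.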
If you truly need non-trivial alternatives without std::variant, you’re basically writing your own variant:
- add a tag
- use placement new for construction
- explicitly destroy active member
- define copy/move carefully
- ensure exception safety
At that point, you almost always should just use std::variant unless you have a special ABI/constexpr requirement.
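To give a sense of how much code that checklist implies, here is a hedged sketch of a two-alternative hand-rolled variant. The class IntOrString and its members are hypothetical; move operations are left out for brevity (moves fall back to copies), and only the copy path is shown.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical hand-rolled variant over int and std::string.
class IntOrString {
public:
    explicit IntOrString(int v) : kind_(Kind::Int) { data_.i = v; }
    explicit IntOrString(std::string v) : kind_(Kind::Int) {
        new (&data_.s) std::string(std::move(v));  // begin the string's lifetime
        kind_ = Kind::String;
    }
    IntOrString(const IntOrString& other) : kind_(Kind::Int) {
        if (other.kind_ == Kind::String) {
            new (&data_.s) std::string(other.data_.s);
            kind_ = Kind::String;
        } else {
            data_.i = other.data_.i;
        }
    }
    IntOrString& operator=(const IntOrString& other) {
        if (this == &other) return *this;
        if (other.kind_ == Kind::String) {
            std::string copy(other.data_.s);  // may throw; *this is still untouched
            reset();
            new (&data_.s) std::string(std::move(copy));  // string move does not throw
            kind_ = Kind::String;
        } else {
            reset();
            data_.i = other.data_.i;
        }
        return *this;
    }
    ~IntOrString() { reset(); }

    bool holds_string() const { return kind_ == Kind::String; }
    int as_int() const { return data_.i; }                    // precondition: !holds_string()
    const std::string& as_string() const { return data_.s; }  // precondition: holds_string()

private:
    enum class Kind { Int, String };
    union Data {
        int i;
        std::string s;
        Data() : i(0) {}  // the int is the initially active member
        ~Data() {}        // the owning class destroys the active member
    };

    // Ends the string's lifetime if it is active, then makes the int active.
    void reset() noexcept {
        if (kind_ == Kind::String) data_.s.~basic_string();
        data_.i = 0;
        kind_ = Kind::Int;
    }

    Kind kind_;
    Data data_;
};
```

Even with only two alternatives and no move support, every special member function has to consult the tag, which is the maintenance burden std::variant removes.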
Better:
- In C++, that’s a good default mental model; there are narrow exceptions (common initial sequence), and you should prefer std::bit_cast/memcpy for punning. (Cppreference)
- In C, union punning is widely used and supported by major compilers, but the “fully portable per strictest reading” story is complicated and historically contentious; prefer memcpy for maximum portability under aggressive optimization. (Stack Overflow)
Better:
- Memory saving is the obvious reason, but unions are also about explicit control of representation, ABI overlays, and implementing sum types.
Better:
- A union gives shared storage; it does not grant permission to violate aliasing/lifetime rules in C++.
From your attached doc, the main improvements to make it “expert-grade” are:
- Be more nuanced about C vs C++ “type punning through unions.” Many guides state “UB everywhere,” but in practice and in some modern discussions, people distinguish C and C++ here. (LLVM Discussion Forums) The safest teaching is still: “don’t rely on union punning for portability,” but the article should acknowledge the reality and the controversy.
- Introduce C++’s “common initial sequence” carve-out as a legitimate special case. (Stack Overflow)
- Tie everything to object lifetime (especially for non-trivial union members) rather than only “last written member.” (GitHub)
- Make std::bit_cast the headline recommendation for C++20+ punning and demote unions for that use-case. (GitHub)
- C, you need a variant: tagged union (enum + union)
- C, you need bit reinterpretation: memcpy (most portable)
- C++17 or earlier, bit reinterpretation: memcpy
- C++20+, bit reinterpretation: std::bit_cast (GitHub)
- C++ any version, general sum type: std::variant (C++17+) if available
- Protocol/wire formats: avoid union + bit-fields; parse explicitly
- Hardware registers: unions/bit-fields can be OK if compiler/ABI is fixed and documented
If you want, paste a few real examples from your codebase (e.g., “I want to map bytes to a header,” “I want float/int punning,” “I want a tagged union with std::string”), and I’ll:
- rewrite them into portable and optimizer-friendly forms,
- and, if you do want to use unions, show the exact invariant checks and lifetime management patterns that keep it correct in C++20/23.
struct Color {
union {
struct { uint8_t r,g,b,a; };
uint32_t rgba;
};
};
Short answer: major compilers accept it, but it is *not* fully portable nor strictly well-defined for what people usually intend (and the anonymous struct is a compiler extension rather than ISO C++).
Let’s unpack it carefully.
Yes. This compiles on all major compilers:
struct Color {
union {
struct { uint8_t r,g,b,a; };
uint32_t rgba;
};
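};
What’s going on:
- Anonymous union: standard C++ (part of the language since C++98)
- Anonymous struct inside a union: not ISO C++; it is standard in C11 and accepted as an extension by GCC, Clang, and MSVC (GCC and Clang warn about it under -pedantic)
- Standard-layout types everywhere
So it compiles everywhere that matters in practice, but strictly speaking it relies on a widely supported compiler extension rather than pure standard C++.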
Here’s the crucial part.
In C++, only one union member is active at a time. Reading from a member other than the active one is undefined behavior, except for a few narrow exceptions.
Your union members are:
struct { uint8_t r,g,b,a; }; // member #1
uint32_t rgba; // member #2
If you do:
Color c{};
c.r = 255;
uint32_t x = c.rgba; // ❌ undefined behavior
or:
Color c{};
c.rgba = 0xff0000ff;
uint8_t red = c.r; // ❌ undefined behavior
That is not allowed by the C++ standard, even though it “works” on most machines.
Why?
- These members do not share a common initial sequence
- This is not a std::byte/char aliasing case
- Type punning through unions is explicitly not guaranteed in C++
📌 This is the same pitfall as C-style union type punning --- still UB in standard C++.
Even if you only access one member at a time, layout assumptions creep in.
rgba depends on byte order:
- Little-endian: 0xAABBGGRR
- Big-endian: 0xRRGGBBAA
No guarantee which byte maps to which channel.
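While the anonymous struct is almost always packed into 4 bytes, the standard does not require it: padding between or after the uint8_t members is permitted, so the struct may be larger than the uint32_t it overlays.
(uint32_t itself is exactly 32 bits wherever it exists, and uint8_t only exists when bytes are 8 bits wide, so sizeof(uint32_t) == 4 is not the weak point; the struct layout is.)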
Even if size is 4 bytes, the order of significance is not specified.
What you can rely on:
- The type compiles on GCC, Clang, and MSVC
- You may use either representation consistently
- You may copy the object with memcpy
- You may inspect the raw bytes via unsigned char
What you cannot rely on:
- Writing via rgba and reading via r/g/b/a
- Assuming channel order in uint32_t
- Assuming little-endian layout
- Assuming no padding (even though common)
On:
- x86 / x86_64
- ARM (little-endian)
- GCC / Clang / MSVC
…it works because:
- Little-endian
- 8-bit bytes
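- 32-bit uint32_t
- Compilers tolerate union punning
But this relies on de facto platform conventions and compiler tolerance, not on anything the C++ standard guarantees (the punning itself is undefined behavior, not merely implementation-defined).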
That’s fine for:
- Graphics engines
- Embedded systems you control
- Serialization with explicit endianness
Not fine for:
- Portable libraries
- Public APIs
- Standards-conforming core logic
struct Color {
uint8_t r, g, b, a;
uint32_t rgba() const {
return (uint32_t(r) << 24) |
(uint32_t(g) << 16) |
(uint32_t(b) << 8) |
uint32_t(a);
}
};
✔ Fully defined ✔ Endianness-independent ✔ Optimizes well
struct RGBA {
uint8_t r, g, b, a;
};
uint32_t to_u32(RGBA c) {
static_assert(sizeof(RGBA) == sizeof(uint32_t));
return std::bit_cast<uint32_t>(c);
}
⚠ Still endianness-dependent ✔ Strict aliasing-safe ✔ Standard-compliant
uint32_t rgba;
std::memcpy(&rgba, &color, sizeof(rgba));
✔ Well-defined ✔ Strict aliasing-safe ⚠ Result is still endianness-dependent ✔ Ugly but correct
| Aspect | Verdict |
|---|---|
| Compiles | ✅ Yes |
| Valid ISO C++ | ⚠ Only via the anonymous-struct extension |
| Strictly well-defined | ❌ No |
| Portable across architectures | ❌ No |
| Commonly used in practice | ✅ Yes |
Your code compiles on all major compilers, but using it for type-punning is undefined behavior and not portable.
If this is for engine code, shaders, or low-level systems you control, it’s a reasonable pragmatic choice.
If this is for a reusable library or public API, don’t do it.
If you want, I can:
- Show how compilers actually optimize the safe versions
- Compare generated assembly
- Propose a zero-overhead, standards-clean alternative that still feels ergonomic
“Memory saving is the obvious reason, but unions are also about explicit control of representation, ABI overlays, and implementing sum types.”
Let’s unpack that sentence layer by layer, because each clause corresponds to a distinct, deep use-case for unions that goes well beyond “saving memory.” I’ll treat this as a mini-article and connect each idea to real systems programming patterns, language rules, and why unions still exist even in modern C++.
This is the part everyone learns first:
union {
int i;
float f;
};
Only one member exists at a time, so the union occupies the size of the largest member.
Memory saving is:
- accidental, not fundamental
- not always significant on modern systems
- not the reason unions exist in language standards
In fact:
- std::variant also stores alternatives in overlapping storage
- many union uses don’t save memory at all (largest member dominates)
- some union uses increase complexity and bugs rather than saving space
So why do unions exist at all?
Because they give you explicit control over object representation and interpretation.
That’s where the real power—and danger—lies.
In C and C++, representation refers to:
- object layout
- byte-level storage
- alignment
- bit patterns
- how values exist in memory
A union lets you say:
“This block of memory is one of these representations, and I control which interpretation is valid.”
That is fundamentally different from:
- inheritance
- polymorphism
- overloading
- templates
Those operate at the type system level.
Unions operate at the object model and memory level.
Consider this distinction:
| Concept | Example |
|---|---|
| Value | 42, 3.14f, "hello" |
| Representation | bytes, bits, layout in memory |
Unions let you reinterpret the same representation under different meanings.
This matters when:
- interacting with hardware
- parsing binary protocols
- crossing language or ABI boundaries
- implementing language runtimes
- writing serializers/deserializers
- handling NaNs, bit flags, tagged pointers
Example (conceptual):
union Word {
uint32_t raw;
struct {
uint32_t opcode : 6;
uint32_t reg : 5;
uint32_t imm : 21;
} decoded;
};
Here the union is not about saving memory.
It is about:
- one physical 32-bit word
- multiple semantic interpretations
- you controlling which interpretation applies
Unions deliberately bypass:
- automatic lifetime tracking
- type safety
- aliasing safety
- constructor/destructor enforcement
That’s not a bug.
That’s why:
- operating systems use unions
- compilers use unions
- virtual machines use unions
- embedded firmware uses unions
You’re stepping outside the “safe abstraction zone” and taking responsibility.
ABI = Application Binary Interface
It defines:
- object layout
- calling conventions
- alignment
- padding
- endianness expectations
- struct layout rules
If you get ABI wrong:
- libraries crash
- syscalls misbehave
- IPC breaks
- hardware registers misfire
Unions are one of the few tools that let you intentionally overlay different ABI-compatible views onto the same memory.
Classic example: POSIX networking.
struct sockaddr {
sa_family_t sa_family;
char sa_data[14];
};
struct sockaddr_in {
sa_family_t sin_family;
uint16_t sin_port;
struct in_addr sin_addr;
};
union sockaddr_any {
struct sockaddr sa;
struct sockaddr_in in;
struct sockaddr_in6 in6;
};
This is not about memory saving.
This is about:
- identical ABI prefixes
- safe casting at call boundaries
- allowing generic APIs to accept specific layouts
The union makes it explicit:
“These types are ABI-compatible overlays of the same memory.”
union ControlRegister {
uint32_t raw;
struct {
uint32_t enable : 1;
uint32_t mode : 3;
uint32_t irq : 1;
uint32_t : 27;
};
};
Here:
- raw is how hardware sees the register
- the bitfield view is how humans reason about it
This is precise representation control, not polymorphism.
- implementation-defined bitfield layout
- fixed compiler + target assumptions
That’s the deal you make.
reinterpret_cast<T*>:
- lies to the type system
- violates aliasing rules
- does not create an object
- often breaks under optimization
A union:
- allocates storage correctly
- guarantees alignment
- makes aliasing explicit
- is recognized by compilers as a representation boundary
This is why low-level code prefers unions over casts.
This is the most language-design–oriented role of unions.
A sum type (aka tagged union, variant type, algebraic data type) means:
A value is exactly one of several alternatives, each with its own type.
Examples:
- JSON values (null | bool | number | string | array | object)
- AST nodes in a compiler
- network messages
- error-or-value results
typedef enum {
VAL_INT,
VAL_FLOAT,
VAL_STRING
} ValueKind;
typedef struct {
ValueKind kind;
union {
int i;
float f;
char *s;
};
} Value;
This is:
- explicit
- zero-overhead
- fully deterministic
- ABI-stable
And:
- the union is not about saving memory
- the tag enforces meaning
- the union enforces exclusivity
This is the original sum-type pattern in C.
In C++, the same idea exists, but with more safety:
std::variant<int, float, std::string>
Internally:
- it uses union-like storage
- tracks the active alternative
- manages lifetimes
- enforces access rules
So why would anyone still use raw unions?
Because:
- std::variant is not ABI-stable (its layout is an implementation detail of each standard library)
- it may add padding, tags, helpers
- it may not match C layouts
- it may not be usable in freestanding / kernel / embedded environments
- it may not be constexpr-friendly enough for some use cases
Thus raw unions remain essential for manual sum types with strict control.
A struct says:
“These things all exist together.”
A union says:
“Exactly one of these exists, and I will tell you which.”
That semantic idea—exclusive alternatives—is independent of memory saving.
Memory saving is just the physical manifestation.
Let’s restate the sentence with precision:
Memory saving is incidental; unions exist to let programmers explicitly define mutually exclusive interpretations of the same storage, enabling representation-level control, ABI-compatible overlays, and low-level implementations of sum types.
Each part reinforces the others:
| Concept | What unions give you |
|---|---|
| Representation control | Exact byte/bit interpretation |
| ABI overlays | Multiple standardized views of same memory |
| Sum types | Explicit “one-of-many” semantics |
| Memory saving | Consequence, not goal |
Given your background (systems C++, networking, sockets, ABI-aware code):
- You already deal with protocol layouts
- You already rely on stable binary representations
- You already need precise lifetime control
- You already care about optimization correctness
Understanding unions as representation tools rather than memory hacks is critical to:
- writing portable low-level code
- avoiding aliasing UB
- designing clean tagged abstractions
- knowing when not to use them
Use unions when you want to control how memory is interpreted. Use structs when you want to model composition. Use std::variant when you want sum types with safety. Use std::bit_cast when you want representation conversion.
If you want, next we can:
- dissect a real-world protocol struct
- redesign a union-based API into a variant-based one
- analyze compiler output for union vs memcpy
- or map this directly to network headers, sockets, or ABI-stable libraries
Just tell me where you want to go deeper.
