The different layout rules all solve the same problem: mapping data structures that are more complex than scalars (e.g. u32, f32) to memory (a byte array), each with its own space/time tradeoffs.
Data accessed from memory requires knowledge of a byte offset (relative to the start of the memory).
The most important properties of a data structure are alignment and size.
The alignment is a divisor of every byte offset at which the given data structure can reside (i.e. offset % alignment = 0).
Alignment is always a power of 2 and, for performance reasons, is often greater than 1 (an alignment of 1 is usually referred to as unaligned access) because of how CPUs and GPUs perform data accesses at the hardware level.
The `SS` constant denotes the inherent size of the (inner) scalar.
The `roundUp` function (returns n rounded up to a multiple of k) is defined for positive integers k and n as:
- roundUp(k, n) = ⌈n ÷ k⌉ × k
The `po2` function (returns n rounded up to a power of 2) is defined for positive integer n as:
- po2(n) = 2^⌈log₂(n)⌉
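The two helper functions can be sketched in Python as follows. This is a minimal illustration that uses integer arithmetic only; the names `round_up` and `po2` simply mirror the definitions above and do not come from any library.

```python
def round_up(k: int, n: int) -> int:
    """Return n rounded up to a multiple of k (k and n positive integers)."""
    # -(-n // k) is ceiling division without going through floats
    return -(-n // k) * k


def po2(n: int) -> int:
    """Return n rounded up to a power of 2 (n a positive integer)."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1
    return 1 << (n - 1).bit_length()


print(round_up(16, 20))  # 32
print(round_up(4, 8))    # 8
print(po2(12))           # 16
print(po2(16))           # 16
```

Sticking to integers avoids the rounding problems that `math.log2` and `math.ceil` can run into for large values.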
type | scalar align | scalar size | std430 align | std430 size | std140 align | std140 size |
---|---|---|---|---|---|---|
scalar S | SS | SS | SS | SS | SS | SS |
vecN<S> | SS | SS * N | po2(SS * N) | SS * N | po2(SS * N) | SS * N |
matCxR<S> | SS | SS * C * R | po2(SS * R) | alignOf(self) * C | roundUp(16, SS * R) | alignOf(self) * C |
array<E, N> | alignOf(E) | sizeOf(E) * N | alignOf(E) | roundUp(alignOf(E), sizeOf(E)) * N | roundUp(16, alignOf(E)) | roundUp(alignOf(self), sizeOf(E)) * N |
struct with members M1...MN | max(alignOf(M1)...alignOf(MN)) | roundUp(alignOf(self), offsetOf(MN) + sizeOf(MN)) | max(alignOf(M1)...alignOf(MN)) | roundUp(alignOf(self), offsetOf(MN) + sizeOf(MN)) | max(16, alignOf(M1)...alignOf(MN)) | roundUp(alignOf(self), offsetOf(MN) + sizeOf(MN)) |
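As a worked example, the vector and matrix rows of the table can be evaluated for concrete types. The sketch below is illustrative only: `vec_layout` and `mat_layout` are hypothetical helper names, and the layout is selected with a plain string.

```python
def round_up(k, n):
    return -(-n // k) * k


def po2(n):
    return 1 << (n - 1).bit_length()


def vec_layout(ss, n, layout):
    """(align, size) of vecN<S> for scalar size ss, following the table row."""
    size = ss * n
    align = ss if layout == "scalar" else po2(size)
    return align, size


def mat_layout(ss, c, r, layout):
    """(align, size) of matCxR<S> (C columns, R rows), following the table row."""
    if layout == "scalar":
        return ss, ss * c * r
    if layout == "std430":
        align = po2(ss * r)
    elif layout == "std140":
        align = round_up(16, ss * r)
    else:
        raise ValueError(f"unknown layout: {layout}")
    # Each column occupies one alignment-sized slot
    return align, align * c


print(vec_layout(4, 3, "std430"))     # (16, 12)  vec3<f32>
print(vec_layout(4, 3, "scalar"))     # (4, 12)
print(mat_layout(4, 3, 3, "std430"))  # (16, 48)  mat3x3<f32>
print(mat_layout(4, 2, 2, "std430"))  # (8, 16)   mat2x2<f32>
print(mat_layout(4, 2, 2, "std140"))  # (16, 32)
```

Note how a `vec3<f32>` keeps its size of 12 but is aligned to 16 in std430 and std140, which is the source of most padding surprises.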
Only relevant for laying out vectors inside structs.
Same std140/std430 layout rules as above, with the only change being that vectors now have scalar alignment (i.e. alignOf(vecN<S>) = SS) as long as the rules below are met.
Pseudocode

    // start offset (a multiple of the scalar size; k is a non-negative integer)
    F = SS * k
    if sizeOf(vecN) < 16 {
        // first and last byte need to lie in the same 16 byte block
        L = F + sizeOf(vecN) - 1
        assert(floor(F / 16) == floor(L / 16))
    } else {
        // start offset needs to be aligned to 16 bytes
        assert(F % 16 == 0)
    }
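The rule can be turned into a small validity check. The sketch below treats L as the offset of the vector's last byte, which is how the Vulkan specification words the straddling rule; the function name `relaxed_vector_offset_ok` is made up for this example.

```python
def relaxed_vector_offset_ok(offset: int, ss: int, n: int) -> bool:
    """Check whether a vecN with scalar size ss may start at the given byte offset."""
    size = ss * n
    # The vector only needs scalar alignment ...
    if offset % ss != 0:
        return False
    if size < 16:
        # ... but its first and last byte must share one 16 byte block
        first, last = offset, offset + size - 1
        return first // 16 == last // 16
    # Vectors of 16 bytes or more must start on a 16 byte boundary
    return offset % 16 == 0


print(relaxed_vector_offset_ok(4, 4, 3))   # True:  vec3<f32> occupies bytes 4..15
print(relaxed_vector_offset_ok(8, 4, 3))   # False: bytes 8..19 cross a 16 byte boundary
print(relaxed_vector_offset_ok(8, 4, 2))   # True:  vec2<f32> occupies bytes 8..15
print(relaxed_vector_offset_ok(4, 4, 4))   # False: vec4<f32> must be 16 byte aligned
```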
Elements of arrays are laid out according to the following algorithm
Pseudocode

    // Note: Array alignment differs between layouts but is always a multiple of the element alignment
    // Stride is the aligned size of an element
    stride = roundUp(alignOf(array), sizeOf(E))
    for i in 0..array.length() {
        // Offset at which the element resides
        array[i].offset = stride * i
    }
    // This is the return value of sizeOf(array)
    array.size = stride * array.length()
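A runnable version of the array algorithm might look like the sketch below. The element is described only by its alignment and size, and `array_layout` is a hypothetical helper name; the std140 array alignment follows the table above.

```python
def round_up(k, n):
    return -(-n // k) * k


def array_layout(elem_align, elem_size, length, layout="std430"):
    """Return (align, stride, element offsets, size) of array<E, N>."""
    # std140 rounds the array alignment up to a multiple of 16
    align = round_up(16, elem_align) if layout == "std140" else elem_align
    # Stride is the element size padded to the array alignment
    stride = round_up(align, elem_size)
    offsets = [stride * i for i in range(length)]
    return align, stride, offsets, stride * length


# array<vec3<f32>, 4>: element align 16, size 12
print(array_layout(16, 12, 4, "std430"))  # (16, 16, [0, 16, 32, 48], 64)
# array<f32, 4>: tightly packed in std430, padded to 16 bytes per element in std140
print(array_layout(4, 4, 4, "std430"))    # (4, 4, [0, 4, 8, 12], 16)
print(array_layout(4, 4, 4, "std140"))    # (16, 16, [0, 16, 32, 48], 64)
```

The last two lines show why arrays of scalars are four times larger in std140 than in std430.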
Members of structs are laid out according to the following algorithm
Pseudocode

    // This is the return value of alignOf(struct)
    // (std140 additionally enforces a minimum alignment of 16)
    struct.alignment = max(struct.members.map(alignOf))
    // Byte offset from the start of the struct
    current_offset = 0
    for member in struct.members {
        // Align offset for member
        current_offset = roundUp(alignOf(member), current_offset)
        // Offset at which the member resides
        // This is the return value of offsetOf(member)
        struct[member].offset = current_offset
        current_offset += sizeOf(member)
    }
    // This is the return value of sizeOf(struct)
    struct.size = roundUp(alignOf(struct), current_offset)
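The struct algorithm can be sketched in Python as shown below. Members are passed as `(name, align, size)` tuples that are assumed to be already computed for the chosen layout; `struct_layout` is a hypothetical helper name.

```python
def round_up(k, n):
    return -(-n // k) * k


def struct_layout(members, layout="std430"):
    """Return (align, member offsets, size) for a list of (name, align, size)."""
    align = max(a for _, a, _ in members)
    if layout == "std140":
        # std140 structs are aligned to at least 16 bytes
        align = max(16, align)
    offsets = {}
    current_offset = 0
    for name, member_align, member_size in members:
        # Pad up to the member's alignment, then place it
        current_offset = round_up(member_align, current_offset)
        offsets[name] = current_offset
        current_offset += member_size
    # Trailing padding makes the size a multiple of the alignment
    return align, offsets, round_up(align, current_offset)


# struct { a: f32, b: vec3<f32>, c: f32 }
members = [("a", 4, 4), ("b", 16, 12), ("c", 4, 4)]
print(struct_layout(members, "std430"))
# (16, {'a': 0, 'b': 16, 'c': 28}, 32)
```

In this example `c` fits into the 4 bytes left over after the 12 byte `vec3<f32>`, so the struct needs no trailing padding.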
The default layout is std430. The extra requirements for the uniform address space have to be explicitly met.
- std430
- std140; with the caveat that matrices of the form `matCx2` have an alignment of 8 instead of 16 and therefore also a size of C * 8 instead of C * 16
- matrices are column-major
- `align` and `size` attributes can be used to change the alignment and size of struct members
- std430
- std140
SSBOs require OpenGL 4.3 / OpenGL 4.0 + ARB_shader_storage_buffer_object
- std140
- matrices are column-major (can be overridden to be row-major in buffers via the `row_major` layout qualifier; added in GLSL 1.4)
- `offset` and `align` layout qualifiers can be used to change the offset and alignment of struct members (added in GLSL 4.4 / GLSL 1.4 + `ARB_enhanced_layouts`)
4.1. StorageBuffer Storage Class / PushConstant Storage Class / Uniform Storage Class with BufferBlock Decoration
- std140
- std430; default
- scalar; via `scalarBlockLayout` in Vulkan v1.2 or `VK_EXT_scalar_block_layout`
- vector-relaxed std140 / std430; since Vulkan v1.1 or via `VK_KHR_relaxed_block_layout`
- std140; default
- std430; via `uniformBufferStandardLayout` in Vulkan v1.2 or `VK_KHR_uniform_buffer_standard_layout`
- scalar; via `scalarBlockLayout` in Vulkan v1.2 or `VK_EXT_scalar_block_layout`
- vector-relaxed std140 / std430; since Vulkan v1.1 or via `VK_KHR_relaxed_block_layout`
- `Offset` decoration is required on struct members
- `ArrayStride` decoration is required on array types
- `MatrixStride` and either `ColMajor` or `RowMajor` decorations are required for matrices
- Even if scalar alignment is supported, it is generally more performant to use the base alignment.
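To illustrate how the matrix decorations interact, the sketch below computes the byte offset of a single matrix element relative to the start of the matrix. It relies on the SPIR-V definition of `MatrixStride` as the stride between columns of a `ColMajor` matrix or between rows of a `RowMajor` matrix; the function name and parameters are hypothetical.

```python
def matrix_element_offset(col, row, matrix_stride, scalar_size, col_major=True):
    """Byte offset of element (col, row) relative to the start of the matrix."""
    if col_major:
        # Columns are MatrixStride bytes apart, rows within a column are contiguous
        return col * matrix_stride + row * scalar_size
    # Rows are MatrixStride bytes apart, columns within a row are contiguous
    return row * matrix_stride + col * scalar_size


# mat3x3<f32> with MatrixStride = 16 (each vec3 column or row padded to 16 bytes)
print(matrix_element_offset(1, 2, 16, 4, col_major=True))   # 24
print(matrix_element_offset(1, 2, 16, 4, col_major=False))  # 36
```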
Vulkan Shader Memory Layout Guide
SPIR-V Specification (Decorations)
SPIR-V Specification (Shader Validation)
- scalar
- vector-relaxed std140; with the caveat that struct members of type matrix, array or struct don't round up their size to a multiple of their alignment
- scalar; via the `-no-legacy-cbuf-layout` DXC flag
- matrices are column-major in buffers by default (can be overridden via the `row_major` modifier); however, they are row-major in shaders (notation (i.e. `float4x3` is a 3 column 4 row matrix), construction and access are all row-major)
HLSL Constant Buffer Packing Rules
DXC HLSL to SPIR-V Feature Mapping
- std430; with the caveat that vector 3's size is 16 instead of 12 (however a packed vector 3 with the alignas specifier = 16 can be used instead)
- provides extra packed vectors (scalar layout)
- matrices are column-major
- `alignas` specifier can be used to change the alignment (can be applied to structs or struct members)
I see what you mean, I think in practice the only difference is that bound buffers can be slightly smaller in cases where the offset of the last element + its size is not a multiple of the struct's alignment.
It seems GL, Vulkan, Metal and D3D12 don't require bound ranges of buffers to be at least as big as the struct declaration in the shader, only WebGPU has this requirement and I guess that's why all the native APIs only talk about alignment and offsets.
I will see how to update the md to best reflect this.