Ristovski · March 5, 2026 12:50
diff --git a/vkperf_NV_AD103 b/vkperf_NV_AD103
 vkperf (0.99.5) tests various performance characteristics of Vulkan devices.

 Devices in the system:
   AMD Radeon Graphics (RADV RENOIR)
   NVIDIA GeForce RTX 4070 Ti SUPER
   llvmpipe (LLVM 19.1.7, 256 bits)

 Selected device:
   NVIDIA GeForce RTX 4070 Ti SUPER

 VendorID:  0x10de (Nvidia)
 DeviceID:  0x2705
 Vulkan version:  1.4.303
 Driver version:  570.133.7.0 (2392932800, 0x8ea141c0)
   Driver name:  NVIDIA
   Driver info:  570.133.07
   DriverID:     NvidiaProprietary
   Driver conformance version:  1.4.1.0
 GPU memory:  16GiB  (16376MiB)
 Max memory allocations:  4294967295
 Standard (non-sparse) buffer alignment:  16
 Number of triangles for tests:  1000000
 Sparse mode for tests:  None
 Timestamp number of bits:  64
 Timestamp period:  1ns
 Vulkan Instance version:  1.4.328
 Operating system:  < unknown, non-Windows >
 Processor:  AMD Ryzen 7 5700G with Radeon Graphics

 Triangle throughput:
   Triangle list (triangle list primitive type,
      single per-scene vkCmdDraw() call, attributeless,
      constant VS output):                     10.38 giga-triangles/s
   Indexed triangle list (triangle list primitive type, single
      per-scene vkCmdDrawIndexed() call, no vertices shared between triangles,
      attributeless, constant VS output):      10.38 giga-triangles/s
   Indexed triangle list that reuses two indices of the previous triangle
      (triangle list primitive type, single per-scene vkCmdDrawIndexed() call,
      attributeless, constant VS output):      20.34 giga-triangles/s
   Triangle strips of various lengths
      (per-strip vkCmdDraw() call, 1 to 1000 triangles per strip,
      attributeless, constant VS output):      
         strip length 1:    302.1 mega-triangles/s
         strip length 2:    606.9 mega-triangles/s
         strip length 5:    1.521 giga-triangles/s
         strip length 8:    2.435 giga-triangles/s
         strip length 10:   3.042 giga-triangles/s
         strip length 20:   6.103 giga-triangles/s
         strip length 25:   7.629 giga-triangles/s
         strip length 40:   12.36 giga-triangles/s
         strip length 50:   15.50 giga-triangles/s
         strip length 100:  30.51 giga-triangles/s
         strip length 125:  26.39 giga-triangles/s
         strip length 1000: 28.72 giga-triangles/s
   Indexed triangle strips of various lengths
      (per-strip vkCmdDrawIndexed() call, 1-1000 triangles per strip,
      no vertices shared between strips, each index used just once,
      attributeless, constant VS output):      
         strip length 1:    277.1 mega-triangles/s
         strip length 2:    555.4 mega-triangles/s
         strip length 5:    1.391 giga-triangles/s
         strip length 8:    2.229 giga-triangles/s
         strip length 10:   2.790 giga-triangles/s
         strip length 20:   5.580 giga-triangles/s
         strip length 25:   7.025 giga-triangles/s
         strip length 40:   11.22 giga-triangles/s
         strip length 50:   14.15 giga-triangles/s
         strip length 100:  28.72 giga-triangles/s
         strip length 125:  31.50 giga-triangles/s
         strip length 1000: 28.72 giga-triangles/s
   Primitive restart indexed triangle strips of various lengths
      (single per-scene vkCmdDrawIndexed() call, 1-1000 triangles per strip,
      no vertices shared between strips, each index used just once,
      attributeless, constant VS output):      
         strip length 1:    1.903 giga-triangles/s
         strip length 2:    3.685 giga-triangles/s
         strip length 5:    8.346 giga-triangles/s
         strip length 8:    12.20 giga-triangles/s
         strip length 1000: 27.90 giga-triangles/s
   Primitive restart, each triangle is replaced by one -1
      (single per-scene vkCmdDrawIndexed() call,
      no fragments produced):                  2.077 giga-triangles/s
   Primitive restart, only zeros in the index buffer
      (single per-scene vkCmdDrawIndexed() call,
      no fragments produced):                  30.51 giga-triangles/s
   Instancing throughput of vkCmdDraw()
      (one triangle per instance, constant VS output, one draw call,
      attributeless):                          2.146 giga-triangles/s
   Instancing throughput of vkCmdDrawIndexed()
      (one triangle per instance, constant VS output, one draw call,
      attributeless):                          2.077 giga-triangles/s
   Instancing throughput of vkCmdDrawIndirect()
      (one triangle per instance, one indirect draw call,
      one indirect record, attributeless:      2.141 giga-triangles/s
   Instancing throughput of vkCmdDrawIndexedIndirect()
      (one triangle per instance, one indirect draw call,
      one indirect record, attributeless:      2.077 giga-triangles/s
   vkCmdDraw() throughput
      (per-triangle vkCmdDraw() in command buffer,
      attributeless, constant VS output):      302.8 mega-triangles/s
   vkCmdDrawIndexed() throughput
      (per-triangle vkCmdDrawIndexed() in command buffer,
      attributeless, constant VS output):      276.9 mega-triangles/s
   VkDrawIndirectCommand processing throughput
      (per-triangle VkDrawIndirectCommand, one vkCmdDrawIndirect() call,
      attributeless):                          187.8 mega-indirectRecords/s
   VkDrawIndirectCommand processing throughput with stride 32
      (per-triangle VkDrawIndirectCommand, one vkCmdDrawIndirect() call,
      attributeless):                          120.5 mega-indirectRecords/s
   VkDrawIndexedIndirectCommand processing throughput
      (per-triangle VkDrawIndexedIndirectCommand,
      1x vkCmdDrawIndexedIndirect() call,
      attributeless):                          143.6 mega-indirectRecords/s
   VkDrawIndexedIndirectCommand processing throughput with stride 32
      (per-triangle VkDrawIndexedIndirectCommand,
      1x vkCmdDrawIndexedIndirect() call,
      attributeless):                          117.6 mega-indirectRecords/s

 Vertex and geometry shader throughput:
   VS throughput using vkCmdDraw() - minimal VS that just writes
      constant output position (per-scene vkCmdDraw() call,
      no attributes, no fragments produced):   31.16 giga-vertices/s
   VS throughput using vkCmdDrawIndexed() - minimal VS that just writes
      constant output position (per-scene vkCmdDrawIndexed() call,
      no attributes, no fragments produced):   31.16 giga-vertices/s
   VS producing output position from VertexIndex and InstanceIndex
      using vkCmdDraw() (single per-scene vkCmdDraw() call,
      attributeless, no fragments produced):   31.16 giga-vertices/s
   VS producing output position from VertexIndex and InstanceIndex
      using vkCmdDrawIndexed() (single per-scene vkCmdDrawIndexed() call,
      attributeless, no fragments produced):   31.16 giga-vertices/s
   GS one triangle in and no triangle out
      (empty VS, attributeless):               3.577 giga-invocations/s
   GS one triangle in and single constant triangle out
      (empty VS, attributeless):               3.577 giga-invocations/s
   GS one triangle in and two constant triangles out
      (empty VS, attributeless):               3.577 giga-invocations/s

 Attributes and buffers:
   One attribute performance - 1x vec4 attribute
      (attribute used, per-scene draw call):   30.83 giga-vertices/s
   One buffer performance - 1x vec4 buffer
      (1x read in VS, per-scene draw call):    30.83 giga-vertices/s
   One buffer performance - 1x vec3 buffer
      (1x read in VS, one draw call):          31.16 giga-vertices/s
   Two attributes performance - 2x vec4 attribute
      (both attributes used):                  19.92 giga-vertices/s
   Two buffers performance - 2x vec4 buffer
      (both buffers read in VS):               19.66 giga-vertices/s
   Two buffers performance - 2x vec3 buffer
      (both buffers read in VS):               25.92 giga-vertices/s
   Two interleaved attributes performance - 2x vec4
      (2x vec4 attribute fetched from the single buffer in VS
      from consecutive buffer locations:       19.79 giga-vertices/s
   Two interleaved buffers performance - 2x vec4
      (2x vec4 fetched from the single buffer in VS
      from consecutive buffer locations:       20.92 giga-vertices/s
   Packed buffer performance - 1x buffer using 32-byte struct unpacked
      into position+normal+color+texCoord:     20.06 giga-vertices/s
   Packed attribute performance - 2x uvec4 attribute unpacked
      into position+normal+color+texCoord:     19.92 giga-vertices/s
   Packed buffer performance - 2x uvec4 buffers unpacked
      into position+normal+color+texCoord:     19.66 giga-vertices/s
   Packed buffer performance - 2x buffer using 16-byte struct unpacked
      into position+normal+color+texCoord:     19.66 giga-vertices/s
   Packed buffer performance - 2x buffer using 16-byte struct
      read multiple times and unpacked
      into position+normal+color+texCoord:     19.66 giga-vertices/s
   Four attributes performance - 4x vec4 attribute
      (all attributes used):                   10.10 giga-vertices/s
   Four buffers performance - 4x vec4 buffer
      (all buffers read in VS):                10.53 giga-vertices/s
   Four buffers performance - 4x vec3 buffer
      (all buffers read in VS):                13.81 giga-vertices/s
   Four interleaved attributes performance - 4x vec4
      (4x vec4 fetched from the single buffer
      on consecutive locations:                10.10 giga-vertices/s
   Four interleaved buffers performance - 4x vec4
      (4x vec4 fetched from the single buffer
      on consecutive locations:                10.61 giga-vertices/s
   Four attributes performance - 2x vec4 and 2x uint attribute
      (2x vec4f32 + 2x vec4u8, 2x conversion from vec4u8
      to vec4):                                15.58 giga-vertices/s

 Transformations:
   Matrix performance - one matrix as uniform for all triangles
      (maxtrix read in VS,
      coordinates in vec4 attribute):          30.83 giga-vertices/s
   Matrix performance - per-triangle matrix in buffer
      (different matrix read for each triangle in VS,
      coordinates in vec4 attribute):          17.13 giga-vertices/s
   Matrix performance - per-triangle matrix in attribute
      (triangles are instanced and each triangle receives a different matrix,
      coordinates in vec4 attribute:           5.847 giga-vertices/s
   Matrix performance - one matrix in buffer for all triangles and 2x uvec4
      packed attributes (each triangle reads matrix from the same place in
      the buffer, attributes unpacked):        19.92 giga-vertices/s
   Matrix performance - per-triangle matrix in the buffer and 2x uvec4 packed
      attributes (each triangle reads a different matrix from a buffer,
      attributes unpacked):                    12.15 giga-vertices/s
   Matrix performance - per-triangle matrix in buffer and 2x uvec4 packed
      buffers (each triangle reads a different matrix from a buffer,
      packed buffers unpacked):                12.68 giga-vertices/s
   Matrix performance - GS reads per-triangle matrix from buffer and 2x uvec4
      packed buffers (each triangle reads a different matrix from a buffer,
      packed buffers unpacked in GS):          9.212 giga-vertices/s
   Matrix performance - per-triangle matrix in buffer and four attributes
      (each triangle reads a different matrix from a buffer,
      4x vec4 attribute):                      7.609 giga-vertices/s
   Matrix performance - 1x per-triangle matrix in buffer, 2x uniform matrix and
      and 2x uvec4 packed attributes (uniform view and projection matrices
      multiplied with per-triangle model matrix and with unpacked attributes of
      position, normal, color and texCoord:    12.15 giga-vertices/s
   Matrix performance - 2x per-triangle matrix (mat4+mat3) in buffer,
      3x uniform matrix (mat4+mat4+mat3) and 2x uvec4 packed attributes
      (full position and normal computation with MVP and normal matrices,
      all matrices and attributes multiplied): 9.668 giga-vertices/s
   Matrix performance - 2x per-triangle matrix (mat4+mat3) in buffer,
      2x non-changing matrix (mat4+mat4) in push constants,
      1x constant matrix (mat3) and 2x uvec4 packed attributes (all
      matrices and attributes multiplied):     9.668 giga-vertices/s
   Matrix performance - 2x per-triangle matrix (mat4+mat3) in buffer, 2x
      non-changing matrix (mat4+mat4) in specialization constants, 1x constant
      matrix (mat3) defined by VS code and 2x uvec4 packed attributes (all
      matrices and attributes multiplied):     9.511 giga-vertices/s
   Matrix performance - 2x per-triangle matrix (mat4+mat3) in buffer,
      3x constant matrix (mat4+mat4+mat3) defined by VS code and
      2x uvec4 packed attributes (all matrices and attributes
      multiplied):                             9.574 giga-vertices/s
   Matrix performance - GS five matrices processing, 2x per-triangle matrix
      (mat4+mat3) in buffer, 3x uniform matrix (mat4+mat4+mat3) and
      2x uvec4 packed attributes passed through VS (all matrices and
      attributes multiplied):                  8.394 giga-vertices/s
   Matrix performance - GS five matrices processing, 2x per-triangle matrix
      (mat4+mat3) in buffer, 3x uniform matrix (mat4+mat4+mat3) and
      2x uvec4 packed data read from buffer in GS (all matrices and attributes
      multiplied):                             7.550 giga-vertices/s
   Textured Phong and Matrix performance - 2x per-triangle matrix
      in buffer (mat4+mat3), 3x uniform matrix (mat4+mat4+mat3) and
      four attributes (vec4f32+vec3f32+vec4u8+vec2f32),
      no fragments produced:                   8.394 giga-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle matrix
      in buffer (mat4), 2x uniform matrix (mat4+mat4) and
      four attributes (vec4f32+vec3f32+vec4u8+vec2f32),
      no fragments produced:                   10.50 giga-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle matrix
      in buffer (mat4), 2x uniform matrix (mat4+mat4) and 2x uvec4 packed
      attribute, no fragments produced:        12.15 giga-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle row-major matrix
      in buffer (mat4), 2x uniform not-row-major matrix (mat4+mat4),
      2x uvec4 packed attributes,
      no fragments produced:                   12.15 giga-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle mat4x3 matrix
      in buffer, 2x uniform matrix (mat4+mat4) and 2x uvec4 packed attributes,
      no fragments produced:                   13.37 giga-vertices/s
   Textured Phong and Matrix performance - 1x per-triangle row-major mat4x3
      matrix in buffer, 2x uniform matrix (mat4+mat4), 2x uvec4 packed
      attribute, no fragments produced:        13.43 giga-vertices/s
   Textured Phong and PAT performance - PAT v1 (Position-Attitude-Transform,
      performing translation (vec3) and rotation (quaternion as vec4) using
      implementation 1), PAT is per-triangle 2x vec4 in buffer,
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   15.02 giga-vertices/s
   Textured Phong and PAT performance - PAT v2 (Position-Attitude-Transform,
      performing translation (vec3) and rotation (quaternion as vec4) using
      implementation 2), PAT is per-triangle 2x vec4 in buffer,
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   14.94 giga-vertices/s
   Textured Phong and PAT performance - PAT v3 (Position-Attitude-Transform,
      performing translation (vec3) and rotation (quaternion as vec4) using
      implementation 3), PAT is per-triangle 2x vec4 in buffer,
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   15.02 giga-vertices/s
   Textured Phong and PAT performance - constant single PAT v2 sourced from
      the same index in buffer (vec3+vec4), 2x uniform matrix (mat4+mat4),
      2x uvec4 packed attributes,
      no fragments produced:                   19.92 giga-vertices/s
   Textured Phong and PAT performance - indexed draw call, per-triangle PAT v2
      in buffer (vec3+vec4), 2x uniform matrix (mat4+mat4), 2x uvec4 packed
      attribute, no fragments produced:        13.75 giga-vertices/s
   Textured Phong and PAT performance - indexed draw call, constant single
      PAT v2 sourced from the same index in buffer (vec3+vec4),
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   17.75 giga-vertices/s
   Textured Phong and PAT performance - primitive restart, indexed draw call,
      per-triangle PAT v2 in buffer (vec3+vec4), 2x uniform matrix (mat4+mat4),
      2x uvec4 packed attributes,
      no fragments produced:                   5.710 giga-vertices/s
   Textured Phong and PAT performance - primitive restart, indexed draw call,
      constant single PAT v2 sourced from the same index in buffer (vec3+vec4),
      2x uniform matrix (mat4+mat4), 2x uvec4 packed attributes,
      no fragments produced:                   5.710 giga-vertices/s
   Textured Phong and double precision matrix performance - double precision
      per-triangle matrix in buffer (dmat4), double precision per-scene view
      matrix in uniform (dmat4), both matrices converted to single precision
      before computations, single precision per-scene perspective matrix in
      uniform (mat4), single precision vertex positions, packed attributes
      (2x uvec4), no fragments produced:       8.719 giga-vertices/s
   Textured Phong and double precision matrix performance - double precision
      per-triangle matrix in buffer (dmat4), double precision per-scene view
      matrix in uniform (dmat4), both matrices multiplied in double precision,
      single precision vertex positions, single precision per-scene
      perspective matrix in uniform (mat4), packed attributes (2x uvec4),
      no fragments produced:                   5.203 giga-vertices/s
   Textured Phong and double precision matrix performance - double precision
      per-triangle matrix in buffer (dmat4), double precision per-scene view
      matrix in uniform (dmat4), both matrices multiplied in double precision,
      double precision vertex positions (dvec3), single precision per-scene
      perspective matrix in uniform (mat4), packed attributes (3x uvec4),
      no fragments produced:                   5.415 giga-vertices/s
   Textured Phong and double precision matrix performance using GS - double
      precision per-triangle matrix in buffer (dmat4), double precision
      per-scene view matrix in uniform (dmat4), both matrices multiplied in
      double precision, double precision vertex positions (dvec3), single
      precision per-scene perspective matrix in uniform (mat4), packed
      attributes (3x uvec4),
      no fragments produced:                   2.013 giga-vertices/s

 Fragment throughput:
   Single full-framebuffer quad,
      constant color FS:                       135.0 giga-fragments/s
   10x full-framebuffer quad,
      constant color FS:                       202.5 giga-fragments/s
   Four smooth interpolators (4x vec4),
      10x fullscreen quad:                     164.6 giga-fragments/s
   Four flat interpolators (4x vec4),
      10x fullscreen quad:                     174.5 giga-fragments/s
   Four textured phong interpolators (vec3+vec3+vec4+vec2),
      10x fullscreen quad:                     200.4 giga-fragments/s
   Textured Phong, packed uniforms (four smooth interpolators
      (vec3+vec3+vec4+vec2), 4x uniform (material (56 byte) +
      globalAmbientLight (12 byte) + light (64 byte) + sampler2D),
      10x fullscreen quad):                    120.5 giga-fragments/s
   Textured Phong, not packed uniforms (four smooth interpolators
      (vec3+vec3+vec4+vec2), 4x uniform (material (72 byte) +
      globalAmbientLight (12 byte) + light (80 byte) + sampler2D),
      10x fullscreen quad):                    120.5 giga-fragments/s
   Simplified Phong, no texture, no specular (2x smooth interpolator
      (vec3+vec3), 3x uniform (material (vec4+vec4) + globalAmbientLight
      (vec3) + light (48 bytes: position+attenuation+ambient+diffuse)),
      10x fullscreen quad):                    198.5 giga-fragments/s
   Simplified Phong, no texture, no specular, single uniform
      (2x smooth interpolator (vec3+vec3), 1x uniform
      (material+globalAmbientLight+light (vec4+vec4+vec4 + 3x vec4),
      10x fullscreen quad):                    196.6 giga-fragments/s
   Constant color from uniform, 1x uniform (vec4) in FS,
      10x fullscreen quad:                     202.5 giga-fragments/s
   Constant color from uniform, 1x uniform (uint) in FS,
      10x fullscreen quad:                     202.5 giga-fragments/s

 Transfer throughput:
   Transfer of consecutive blocks:
      4 bytes: 12.4224ns per transfer (0.299885 GiB/s)
      4 bytes: 8.9664ns per transfer (0.415472 GiB/s)
      8 bytes: 11.0016ns per transfer (0.677227 GiB/s)
      16 bytes: 11.1648ns per transfer (1.33466 GiB/s)
      32 bytes: 11.4688ns per transfer (2.59856 GiB/s)
      64 bytes: 11.5456ns per transfer (5.16254 GiB/s)
      128 bytes: 12.4512ns per transfer (9.57412 GiB/s)
      256 bytes: 17.4805ns per transfer (13.6391 GiB/s)
      512 bytes: 28.9609ns per transfer (16.4648 GiB/s)
      1024 bytes: 53.5938ns per transfer (17.7945 GiB/s)
      2048 bytes: 104.031ns per transfer (18.3344 GiB/s)
      4096 bytes: 204.25ns per transfer (18.6766 GiB/s)
      8192 bytes: 405.75ns per transfer (18.8032 GiB/s)
      16384 bytes: 812ns per transfer (18.7916 GiB/s)
      32768 bytes: 1623ns per transfer (18.8032 GiB/s)
      65536 bytes: 3246ns per transfer (18.8032 GiB/s)
      131072 bytes: 6494ns per transfer (18.7974 GiB/s)
      262144 bytes: 12672ns per transfer (19.2661 GiB/s)
      524288 bytes: 2048ns per transfer (238.419 GiB/s)
      1048576 bytes: 3584ns per transfer (272.478 GiB/s)
      2097152 bytes: 6144ns per transfer (317.891 GiB/s)
   Transfer of spaced blocks:
      4 bytes: 8.9632ns per transfer (0.415621 GiB/s)
      4 bytes: 8.9632ns per transfer (0.415621 GiB/s)
      8 bytes: 8.96ns per transfer (0.831538 GiB/s)
      16 bytes: 8.9632ns per transfer (1.66248 GiB/s)
      32 bytes: 8.9696ns per transfer (3.32259 GiB/s)
      64 bytes: 8.9792ns per transfer (6.63808 GiB/s)
      128 bytes: 9.9168ns per transfer (12.0209 GiB/s)
      256 bytes: 16.4648ns per transfer (14.4805 GiB/s)
      512 bytes: 25.875ns per transfer (18.4285 GiB/s)
      1024 bytes: 46.5156ns per transfer (20.5022 GiB/s)
      2048 bytes: 90.0938ns per transfer (21.1707 GiB/s)
      4096 bytes: 180.375ns per transfer (21.1487 GiB/s)
      8192 bytes: 346.375ns per transfer (22.0264 GiB/s)
      16384 bytes: 671.25ns per transfer (22.7319 GiB/s)
      32768 bytes: 1381.5ns per transfer (22.0902 GiB/s)
      65536 bytes: 2910ns per transfer (20.9743 GiB/s)
      131072 bytes: 5838ns per transfer (20.9096 GiB/s)
      262144 bytes: 10752ns per transfer (22.7065 GiB/s)
      524288 bytes: 13312ns per transfer (36.6798 GiB/s)
      1048576 bytes: 17440ns per transfer (55.9956 GiB/s)
      2097152 bytes: 20480ns per transfer (95.3674 GiB/s)

 Measurement statistics:
   Triangle throughput measurement time:  10.5 seconds using 413 test rounds.
   Vertex throughput measurement time:    0.505 seconds using 413 test rounds.
   Attribute and Buffer measurement time: 1.37 seconds using 413 test rounds.
   Transformation measurement time:       4.6 seconds using 413 test rounds.
   Fragment throughput measurement time:  0.504 seconds using 413 test rounds.
   Transfer throughput measurement time:  1.58 seconds using 413 test rounds.
   Total device time: 18.5 seconds.
   Total real time:   20 seconds.