@edecoux
Last active September 15, 2024 17:58
Nanite virtualized geometry.md

Nanite Virtualized Geometry

https://docs.unrealengine.com/5.0/en-US/nanite-virtualized-geometry-in-unreal-engine/

  • Multiple orders of magnitude increase in geometry complexity, with higher triangle and object counts than have previously been possible in real time
  • Frame budgets are no longer constrained by polycounts, draw calls, and mesh memory usage
  • Now possible to directly import film-quality source art, such as ZBrush sculpts and photogrammetry scans
  • Use high-poly detailing rather than baking detail into normal map textures
  • Level of Detail (LOD) is handled automatically and no longer requires manual setup of LODs for individual meshes
  • Loss of quality is rare or non-existent, especially with LOD transitions
  • A Nanite mesh is still essentially a triangle mesh at its core with a lot of level of detail and compression applied to its data. On top of that, Nanite uses an entirely new system for rendering that data format
  • Nanite meshes support multiple UVs and vertex colors. Materials are assigned to sections of the mesh such that those materials can use different shading models and dynamic effects which can be done in the shaders. Material assignment can be swapped dynamically, just like any other Static Mesh, and Nanite doesn't require any process to bake down materials.
  • Virtual Textures are not required with Nanite, but they are highly recommended. Virtual Textures are an orthogonal Unreal Engine feature whose goals for texture data are similar to what Nanite achieves for mesh data.

“Voxels, displacement, point rendering are inferior to triangles. Triangles are the core of Nanite.”

Dream pipeline

  • Virtualize geometry like we did textures
  • No more budgets
    • Polycount
    • Draw calls
    • Memory
  • Directly use film quality source art
    • No manual optimization required
  • No loss in quality

Reality

  • MUCH harder than virtual texturing
    • Not just memory management
    • Geometry detail directly impacts rendering cost
    • Geometry isn’t trivially filterable

Options

Voxels

  • Voxels and implicit surfaces have a lot of potential advantages and are the most discussed direction to solve this problem.
  • Voxelization is a form of uniform resampling, and uniform resampling means loss. This is the 3D equivalent of converting vector graphics into pixel graphics.
    • Support importing meshes authored anywhere
    • Still have UVs and tiling detail maps
    • Only replacing meshes, not textures, not materials, not tools.
      • Voxels with UVs? What to do about UV seams?

Problem: The chief issue with voxels is data size. Maximum sparsity is needed to keep data size small, but we can’t sacrifice ray casting performance in the process. The data structure needs to be super adaptive to capture sharp edges without wasting samples where the surface is smooth.

Problem: As signed distance fields, features vanish when they are thinner than a voxel, or leak attributes from one side to the other when they are less than a few voxels thick.

Subdivision surfaces

  • Subdivision by definition is amplification only.

Displacement maps

  • Capture displacement like we do normal maps.
  • Projecting to normal or displacement maps is a form of uniform resampling.

Points

  • Points are super fast to blast to the screen. Points require hole filling.
  • How do we know the difference between a small gap that should be there and a hole that should be filled?
  • It's impossible to know for certain without extra connectivity data. Also known as the index buffer in a triangle mesh.

Triangles

  • Foundation of computer graphics for good reason.
  • No higher quality or faster solution than triangles.
  • Triangles are the core of Nanite.

Visibility Buffer

  • Rasterization writes visibility buffer
    • Depth : VisibleClusterID : TriangleID
  • VisBuffer decode preamble for material pixel shader:
    • Load VisBuffer
    • Load VisBufferCluster => InstanceID, ClusterID
    • Load instance transform
    • Load 3 vert indexes
    • Load 3 positions
    • Transform positions to screen
    • Derive barycentric coordinates for pixel
    • Load and lerp attributes
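
The last two steps of the preamble can be illustrated with a small sketch. It assumes the three positions are already in 2D screen space and ignores perspective correction, which a real material shader would need; the function names `edge`, `barycentrics`, and `lerp_attribute` are hypothetical and not taken from the engine.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p) in screen space."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def barycentrics(v0, v1, v2, p):
    """Barycentric weights of pixel p with respect to a screen-space triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        raise ValueError("degenerate triangle")
    w0 = edge(v1, v2, p) / area  # weight of v0
    w1 = edge(v2, v0, p) / area  # weight of v1
    w2 = edge(v0, v1, p) / area  # weight of v2
    return w0, w1, w2

def lerp_attribute(attrs, weights):
    """Blend three per-vertex attribute tuples (e.g. UVs) with the weights."""
    size = len(attrs[0])
    return tuple(sum(w * a[i] for w, a in zip(weights, attrs)) for i in range(size))

# Usage: interpolate UVs at pixel (1, 1) of a right triangle.
weights = barycentrics((0, 0), (4, 0), (0, 4), (1, 1))
uv = lerp_attribute([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], weights)
```

Because the weights are derived per pixel from the triangle's own vertices, no interpolated attributes need to be written during rasterization; only the IDs in the visibility buffer are required.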

GPU Driven pipeline

  • Renderer is now retained mode
    • GPU scene representation persists across frames
    • Sparsely updated where things change
    • All vertex/index data in single large resource
  • Per view:
    • GPU instance cull
    • Triangle rasterization
  • If only drawing depth, the entire scene can be drawn with 1 DrawIndirect

Triangle cluster culling

  • Group triangles into clusters
    • Each cluster has a bounding box
  • Cull clusters based on bounds
    • Frustum cull
    • Occlusion cull
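
As a rough illustration of the frustum half of this step, the sketch below culls axis-aligned cluster bounds against a set of planes. It is a minimal sketch, not the engine's code: the plane format `(nx, ny, nz, d)` with inward-facing normals and the names `aabb_outside_plane` and `frustum_cull` are assumptions made for the example.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the box lies entirely on the negative side of the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner that is farthest along the plane normal.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    # If even that corner is behind the plane, the whole box is.
    return nx * px + ny * py + nz * pz + d < 0

def frustum_cull(clusters, planes):
    """Indices of clusters whose bounds are not fully outside any plane."""
    return [
        i
        for i, (box_min, box_max) in enumerate(clusters)
        if not any(aabb_outside_plane(box_min, box_max, pl) for pl in planes)
    ]

# Usage: a toy "frustum" that only bounds x to the range [-1, 1].
planes = [(1, 0, 0, 1), (-1, 0, 0, 1)]
clusters = [
    ((-0.5, 0, 0), (0.5, 1, 1)),  # inside
    ((2.0, 0, 0), (3.0, 1, 1)),   # fully outside
    ((0.5, 0, 0), (1.5, 1, 1)),   # straddles the boundary, kept
]
survivors = frustum_cull(clusters, planes)
```

The test is conservative: a cluster that straddles a plane is kept, so it can produce false positives but never drops visible geometry.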

Occlusion culling

  • Occlusion cull against Hierarchical Z-buffer (HZB)
  • Calculate screen rect from bounds
  • Test against lowest mip where screen rect <= 4x4 pixels
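
The mip selection rule can be sketched on the CPU as follows. This is a toy under stated assumptions: the depth buffer is a square power-of-two grid of Python lists, larger depth means farther away (Unreal itself uses a different depth convention), and `build_hzb`, `pick_mip`, and `is_occluded` are hypothetical names.

```python
def build_hzb(depth):
    """Mip chain where each texel holds the farthest depth of its 2x2 children."""
    mips = [depth]
    while len(mips[-1]) > 1:
        src = mips[-1]
        n = len(src) // 2
        mips.append([
            [max(src[2 * y][2 * x], src[2 * y][2 * x + 1],
                 src[2 * y + 1][2 * x], src[2 * y + 1][2 * x + 1])
             for x in range(n)]
            for y in range(n)
        ])
    return mips

def pick_mip(x0, y0, x1, y1, num_mips):
    """Lowest mip at which the inclusive pixel rect spans at most 4x4 texels."""
    mip = 0
    while mip < num_mips - 1 and (
        (x1 >> mip) - (x0 >> mip) >= 4 or (y1 >> mip) - (y0 >> mip) >= 4
    ):
        mip += 1
    return mip

def is_occluded(mips, rect, nearest_depth):
    """True if the bounds' nearest depth is behind every covered HZB texel."""
    x0, y0, x1, y1 = rect
    mip = pick_mip(x0, y0, x1, y1, len(mips))
    level = mips[mip]
    farthest = max(
        level[y][x]
        for y in range(y0 >> mip, (y1 >> mip) + 1)
        for x in range(x0 >> mip, (x1 >> mip) + 1)
    )
    return nearest_depth > farthest
```

Capping the footprint at 4x4 texels bounds the cost of the test at 16 reads per cluster regardless of how large the cluster is on screen.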

Two pass occlusion culling

  • Two pass solution
    • Draw what was visible in the previous frame
    • Build HZB
    • Draw what is visible now but wasn't in the last frame
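
The three steps can be walked through with a toy model. The sketch assumes a one-dimensional "screen", objects given as `(x0, x1, z)` spans with smaller `z` nearer, and a plain depth buffer standing in for the HZB; all names are hypothetical.

```python
FAR = float("inf")

def draw(depth, obj):
    """Rasterize a span (x0, x1, z) into the depth buffer, keeping the nearest z."""
    x0, x1, z = obj
    for x in range(x0, x1 + 1):
        depth[x] = min(depth[x], z)

def occluded(depth, obj):
    """True if every pixel the span covers already holds a strictly nearer depth."""
    x0, x1, z = obj
    return all(depth[x] < z for x in range(x0, x1 + 1))

def render_frame(objects, prev_visible, width):
    depth = [FAR] * width
    # Pass 1: draw what was visible in the previous frame.
    for i in prev_visible:
        draw(depth, objects[i])
    # A real renderer would build the HZB here; this toy tests depth directly.
    # Pass 2: draw what is visible now but was not drawn in pass 1.
    newly_visible = []
    for i, obj in enumerate(objects):
        if i not in prev_visible and not occluded(depth, obj):
            draw(depth, obj)
            newly_visible.append(i)
    # Visible set to carry into the next frame.
    visible = {i for i, obj in enumerate(objects) if not occluded(depth, obj)}
    return visible, newly_visible

# Usage: a wall, a box hidden behind it, and a box that appears in front.
objects = [(0, 7, 1.0), (2, 3, 2.0), (4, 5, 0.5)]
visible, newly = render_frame(objects, {0}, 8)
```

The first pass relies on frame-to-frame coherence to get good occluders cheaply; the second pass catches anything that became visible since the last frame, so the result stays conservative.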

Decouple visibility from material

  • Eliminate:
    • Switching shaders during rasterization
    • Overdraw for material eval
    • Depth pre pass to avoid overdraw
    • Pixel quad inefficiencies from dense meshes
  • Options:
    • REYES
    • Texture space shading
    • ==Deferred materials==

Visibility Buffer

  • Write geometry data to screen
  • Material shader per pixel:
    • Load VisBuffer
    • Load instance transformation
    • Load 3 vert indexes
    • Load 3 positions
    • Transform positions to screen
    • Derive barycentric coordinates for pixel
    • Load and lerp attributes
  • Not as slow as it seems
    • Lots of cache hits
    • No overdraw or pixel quad inefficiencies
  • Material pass writes GBuffer
    • Integrates with rest of deferred shading renderer
  • Now we can draw all opaque geometry with 1 draw
    • Completely GPU driven
    • Not just depth prepass
    • Rasterize triangles once per view

Sub-linear scaling

  • Linear scaling in triangles is not okay.
  • Why should we draw more triangles than pixels?
    • In terms of clusters, we want to draw the same number of clusters every frame, regardless of the number of objects or how dense they are.

Cluster hierarchy

  • Decide LOD on a cluster basis
  • Build hierarchy of LODs
    • Simplest is tree of clusters
    • Parents are the simplified versions of their children

LOD runtime

  • Find cut of the tree for desired LOD
  • View dependent based on perceptual difference

Streaming

  • Entire tree doesn't need to be in memory at once
  • Can mark any cut of the trees as leaves and toss the rest
  • Request data on demand during rendering
    • Like virtual texturing

LOD cracks

  • If each cluster decides its LOD independently of its neighbors, you get cracks
  • Using locked boundaries is a bad solution
  • Solution
    • Detect during build
    • Group clusters
      • Force them to make the same LOD decision
      • Now free to unlock shared edges and collapse them
      • Group clusters where needed

Build operations

  • Cluster original triangles
  • While NumClusters > 1
    • Group clusters to clean their shared boundary
    • Merge triangles from group into shared list
    • Simplify to 50% of the triangle count
    • Split simplified triangle list into clusters (128 tris)
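
To get a feel for how many levels this loop produces, the arithmetic can be sketched as below. It is an idealised model that assumes every pass halves the triangle count exactly and ignores grouping, so real builds will differ; `lod_levels` is a hypothetical helper.

```python
import math

def lod_levels(num_triangles, cluster_size=128):
    """Cluster count per level, assuming each pass halves the triangles exactly."""
    tris = num_triangles
    counts = [math.ceil(tris / cluster_size)]
    while counts[-1] > 1:  # mirrors "While NumClusters > 1"
        tris = math.ceil(tris / 2)
        counts.append(math.ceil(tris / cluster_size))
    return counts

# Usage: a one-million-triangle mesh.
levels = lod_levels(1_000_000)
```

Because each level holds about half as many clusters as the one below it, the whole hierarchy in this model stores only around twice as many clusters as the original mesh.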

DAG

Graph partitioning is used to decide which clusters to group

Culling dataflow

  • Main pass
    • Instance culling
      • Persistent Hierarchy/Cluster Culling
        • Software & Hardware rasterizer
          • Build HZB
  • Post pass
    • Instance culling
      • Persistent Hierarchy/Cluster Culling
        • Software & Hardware rasterizer
          • Material Passes
          • Build HZB

Rasterization

Pixel scale detail

  • Can we hit pixel scale detail with triangles > 1 pixel?
    • Depends on how smooth the surface is
    • In general no
  • We need to draw pixel sized triangles

Tiny triangles

  • Terrible for typical rasterizer
  • Typical rasterizer:
    • Macro tile binning
    • Micro tile 4x4
    • Output 2x2 pixel quads
    • Highly parallel in pixels not triangles
  • Can we beat the HW rasterizer in SW?
    • Yes, 3x faster

Tiny triangles

  • Binning triangles is as much work as just writing the final pixels
  • Even a single vector stamp does wasteful tests for small tris
    • Basic bounding box is faster
  • Serialization at tile level to handle depth and ROP
  • Output 2x2 pixel quads
  • General purpose
    • VS+PS scheduling
    • Output formats, ordering, blending
    • Clipping

Tiny triangles

  • Optimized for larger triangles covering many pixels
    • Run wide over pixels
  • We want many triangles with few pixels each
    • Run wide over triangles

How to depth test?

  • Don't have ROP or depth test hardware
  • Need Z-buffering
    • Can't serialize tiles
    • Many tris may be in parallel for single tile or even single pixel
  • ==Use 64 bit atomics==
  • InterlockedMax on a packed 64-bit value:
    • Depth: 30 bits
    • Visible cluster index: 27 bits
    • Triangle index: 7 bits
  • Visibility buffer shows its true power
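
The 30/27/7-bit layout can be sketched with plain integers. The example assumes depth has already been quantized to a 30-bit integer in which larger values are closer to the camera, so that a max operation keeps the nearest triangle; `pack`, `unpack`, and the serial `interlocked_max` stand-in are hypothetical names, not the shader API.

```python
DEPTH_BITS, CLUSTER_BITS, TRI_BITS = 30, 27, 7  # 30 + 27 + 7 = 64

def pack(depth, cluster, tri):
    """Pack depth into the high bits so integer comparison orders by depth first."""
    if not (0 <= depth < 1 << DEPTH_BITS
            and 0 <= cluster < 1 << CLUSTER_BITS
            and 0 <= tri < 1 << TRI_BITS):
        raise ValueError("field out of range")
    return (depth << (CLUSTER_BITS + TRI_BITS)) | (cluster << TRI_BITS) | tri

def unpack(value):
    """Recover (depth, cluster, tri) from a packed 64-bit value."""
    tri = value & ((1 << TRI_BITS) - 1)
    cluster = (value >> TRI_BITS) & ((1 << CLUSTER_BITS) - 1)
    depth = value >> (CLUSTER_BITS + TRI_BITS)
    return depth, cluster, tri

def interlocked_max(buffer, index, value):
    """Serial stand-in for a GPU atomic max on one visibility buffer pixel."""
    if value > buffer[index]:
        buffer[index] = value

# Usage: two triangles land on the same pixel; the one with greater depth value wins.
vis_buffer = [0]
interlocked_max(vis_buffer, 0, pack(4, 12345, 127))
interlocked_max(vis_buffer, 0, pack(5, 3, 1))
winner = unpack(vis_buffer[0])
```

A 7-bit triangle index addresses exactly 128 triangles, which matches the 128-triangle cluster size used in the build step. Since depth and IDs travel together in one value, a single atomic performs both the depth test and the payload write.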