Nanite Virtualized Geometry
https://docs.unrealengine.com/5.0/en-US/nanite-virtualized-geometry-in-unreal-engine/
- Multiple orders of magnitude increase in geometry complexity: higher triangle and object counts than have been possible before in real time
- Frame budgets are no longer constrained by polycounts, draw calls, and mesh memory usage
- Now possible to directly import film-quality source art, such as ZBrush sculpts and photogrammetry scans
- Use high-poly detailing rather than baking detail into normal map textures
- Level of Detail (LOD) is handled automatically and no longer requires manual LOD setup for individual meshes
- Loss of quality is rare or non-existent, especially with LOD transitions
- A Nanite mesh is still essentially a triangle mesh at its core with a lot of level of detail and compression applied to its data. On top of that, Nanite uses an entirely new system for rendering that data format
- Nanite meshes support multiple UVs and vertex colors. Materials are assigned to sections of the mesh such that those materials can use different shading models and dynamic effects which can be done in the shaders. Material assignment can be swapped dynamically, just like any other Static Mesh, and Nanite doesn't require any process to bake down materials.
- Virtual Textures are not required to be used with Nanite, but they are highly recommended. Virtual Textures are an orthogonal Unreal Engine feature with similar goals for texture data that Nanite achieves with mesh data.
“Voxels, displacement, point rendering are inferior to triangles. Triangles are the core of Nanite.”
- Virtualize geometry like we did textures
- No more budgets
    - Polycount
    - Draw calls
    - Memory
- Directly use film quality source art
- No manual optimization required
- No loss in quality
- MUCH harder than virtual texturing
    - Not just memory management
    - Geometry detail directly impacts rendering cost
    - Geometry isn’t trivially filterable
- Voxels and implicit surfaces have a lot of potential advantages and are the most discussed direction to solve this problem.
- Voxelization is a form of uniform resampling, and uniform resampling means loss. This is the 3D equivalent of converting vector graphics into pixel graphics.
- Support importing meshes authored anywhere
- Still have UVs and tiling detail maps
- Only replacing meshes, not textures, not materials, not tools.
- Voxels with UVs? What to do about UV seams?
- Problem: the chief issue with voxels is data size. Maximum sparsity is needed to keep data small, but we can’t sacrifice ray-casting performance in the process. The data structure needs to be super adaptive to get sharp edges without wasting samples where the surface is smooth.
- Problem: features vanish in signed distance fields when they are thinner than a voxel, or leak attributes from one side to the other when they are less than a few voxels thick.
- Subdivision, by definition, is amplification only: it adds detail on top of the base mesh but can't reduce below it.
- Capture displacement like we do normal maps.
- Projecting to normal or displacement maps is a form of uniform resampling.
- Points are super fast to blast to the screen, but they require hole filling.
- How do we know the difference between a small gap that should be there and a hole that should be filled?
- It's impossible to know for certain without extra connectivity data, which is exactly what the index buffer of a triangle mesh provides.
- Foundation of computer graphics for good reason.
- No higher quality or faster solution than triangles.
- Triangles are the core of Nanite.
- Rasterization writes visibility buffer
    - Depth : VisibleClusterID : TriangleID
- VisBuffer decode preamble for material pixel shader:
    - Load VisBuffer
    - Load VisBufferCluster => InstanceID, ClusterID
    - Load instance transform
    - Load 3 vert indexes
    - Load 3 positions
    - Transform positions to screen
    - Derive barycentric coordinates for pixel
    - Load and lerp attributes
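
A minimal CPU-side Python sketch of the decode preamble, for illustration only (it is not a material shader). It assumes hypothetical data layouts: a `Cluster` record, a `visible_clusters` list of `(instance_id, cluster_id)` pairs, and row-major 4x4 instance matrices. It also assumes the 30/27/7-bit packing from the table at the end of these notes. It interpolates a UV with perspective-correct barycentrics.

```python
from dataclasses import dataclass

TRI_BITS, CLUSTER_BITS = 7, 27  # 30/27/7 layout: depth | visible cluster | triangle


def unpack_visbuffer(value: int) -> tuple[int, int, int]:
    """Split a 64-bit visbuffer texel into (depth, visible_cluster, triangle)."""
    triangle = value & ((1 << TRI_BITS) - 1)
    visible_cluster = (value >> TRI_BITS) & ((1 << CLUSTER_BITS) - 1)
    depth = value >> (TRI_BITS + CLUSTER_BITS)
    return depth, visible_cluster, triangle


@dataclass
class Cluster:  # hypothetical layout, not Unreal's actual format
    indices: list[tuple[int, int, int]]
    positions: list[tuple[float, float, float]]
    uvs: list[tuple[float, float]]


def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_screen(m, p, width, height):
    """Object space -> clip -> NDC -> pixel coordinates; keeps clip w for perspective."""
    x, y, _, w = mat_vec(m, [p[0], p[1], p[2], 1.0])
    return (x / w * 0.5 + 0.5) * width, (0.5 - y / w * 0.5) * height, w


def edge(a, b, px, py):
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def shade_pixel(value, px, py, visible_clusters, transforms, clusters, width, height):
    _, vis_cluster, tri = unpack_visbuffer(value)            # Load VisBuffer
    instance_id, cluster_id = visible_clusters[vis_cluster]  # => InstanceID, ClusterID
    m = transforms[instance_id]                              # Load instance transform
    cluster = clusters[cluster_id]
    ids = cluster.indices[tri]                               # Load 3 vert indexes
    s = [to_screen(m, cluster.positions[i], width, height) for i in ids]  # load + transform
    cx, cy = px + 0.5, py + 0.5                              # pixel center
    area = edge(s[0], s[1], s[2][0], s[2][1])                # assumes a non-degenerate triangle
    b0 = edge(s[1], s[2], cx, cy) / area                     # screen-space barycentrics
    b1 = edge(s[2], s[0], cx, cy) / area
    b = [b0, b1, 1.0 - b0 - b1]
    # Perspective correction: weight by 1/w, then renormalize.
    w = [b[k] / s[k][2] for k in range(3)]
    total = sum(w)
    return tuple(sum(w[k] / total * cluster.uvs[i][c] for k, i in enumerate(ids))
                 for c in range(2))                          # lerp attributes
```

Every shaded pixel repeats these loads. This is the per-pixel work the deferred-material notes further down describe.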
- Rendering is now retained mode
    - GPU scene representation persists across frames
    - Sparsely updated where things change
- All vertex/index data in single large resource
- Per view:
    - GPU instance cull
    - Triangle rasterization
- If only drawing depth, the entire scene can be drawn with 1 DrawIndirect
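
A toy Python model of the retained-mode idea, built around a hypothetical `GPUScene` class. Every mesh is appended to one shared vertex/index pool and addressed by offsets. Instances persist between frames, and only instances marked dirty are re-sent. Plain lists stand in for GPU buffers, so this shows the bookkeeping only, not a real upload path.

```python
from dataclasses import dataclass, field


@dataclass
class GPUScene:
    """Hypothetical retained scene: one vertex/index pool, persistent instances,
    sparse updates of whatever changed since the last frame."""
    positions: list = field(default_factory=list)  # one large vertex resource
    indices: list = field(default_factory=list)    # one large index resource
    meshes: list = field(default_factory=list)     # (first_index, index_count, base_vertex)
    instances: dict = field(default_factory=dict)  # instance_id -> (mesh_id, transform)
    dirty: set = field(default_factory=set)

    def add_mesh(self, positions, indices):
        # A mesh is just a range inside the shared pool.
        self.meshes.append((len(self.indices), len(indices), len(self.positions)))
        self.positions.extend(positions)
        self.indices.extend(indices)
        return len(self.meshes) - 1

    def set_instance(self, instance_id, mesh_id, transform):
        self.instances[instance_id] = (mesh_id, transform)
        self.dirty.add(instance_id)  # only this entry needs re-uploading

    def flush_updates(self):
        """Return just the instances that changed since the last flush."""
        changed = {i: self.instances[i] for i in sorted(self.dirty)}
        self.dirty.clear()
        return changed
```

Because every mesh lives in the same pool, a single indirect draw can reference any of them by offset. That is what makes the one-DrawIndirect depth pass possible.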
- Group triangles into clusters
    - Each cluster has a bounding box
- Cull clusters based on bounds
    - Frustum cull
    - Occlusion cull
- Occlusion cull against Hierarchical Z-buffer (HZB)
    - Calculate screen rect from bounds
    - Test against lowest mip where screen rect <= 4x4 pixels
- Two pass solution
    - Draw what was visible in the previous frame
    - Build HZB
    - Draw what is visible now but wasn't in the last frame
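
A minimal Python sketch of the HZB test. It makes several assumptions:
- Reverse-Z depth (1.0 = near, 0.0 = far), so each HZB texel keeps the minimum of its 2x2 children (the farthest occluder).
- The cluster bounds are already projected to an inclusive pixel rect, clamped to the screen.
- The bounds come with a nearest depth.

The mip-selection loop mirrors the "lowest mip where the rect is at most 4x4" rule.

```python
def build_hzb(depth):
    """Mip chain where each texel keeps the FARTHEST depth of its 2x2 children.
    Reverse-Z is assumed (1.0 = near, 0.0 = far), so 'farthest' means min()."""
    mips = [depth]
    while len(mips[-1]) > 1 or len(mips[-1][0]) > 1:
        src = mips[-1]
        h, w = len(src), len(src[0])
        mips.append([[min(src[min(2 * y + dy, h - 1)][min(2 * x + dx, w - 1)]
                          for dy in (0, 1) for dx in (0, 1))
                      for x in range((w + 1) // 2)]
                     for y in range((h + 1) // 2)])
    return mips


def is_occluded(mips, rect, nearest_depth):
    """rect = (x0, y0, x1, y1): inclusive mip-0 pixel bounds, clamped to the screen.
    nearest_depth: closest depth of the cluster bounds (largest value under reverse-Z)."""
    x0, y0, x1, y1 = rect
    # Lowest (finest) mip where the rect covers at most 4x4 texels.
    level = 0
    while ((x1 >> level) - (x0 >> level) + 1 > 4
           or (y1 >> level) - (y0 >> level) + 1 > 4):
        level += 1
    mip = mips[level]
    farthest_occluder = min(mip[y][x]
                            for y in range(y0 >> level, (y1 >> level) + 1)
                            for x in range(x0 >> level, (x1 >> level) + 1))
    # Occluded only if even the nearest point of the bounds is behind every occluder.
    return nearest_depth < farthest_occluder
```

Capping the test at 4x4 texels keeps the cost per cluster constant. The tradeoff is conservatism: a coarser mip can report "visible" for bounds that are actually hidden, but never the reverse.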
- Eliminate:
    - Switching shaders during rasterization
    - Overdraw for material eval
    - Depth prepass to avoid overdraw
    - Pixel quad inefficiencies from dense meshes
- Options:
    - REYES
    - Texture space shading
    - ==Deferred materials==
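
Pixel shaders run on aligned 2x2 quads so that screen-space derivatives are available. A triangle therefore pays for every lane of every quad it touches. The small Python calculation below quantifies that waste from the set of pixels a triangle covers; `quad_efficiency` is a hypothetical helper name.

```python
def quad_efficiency(covered):
    """Fraction of pixel-shader lanes doing useful work for one triangle.
    covered: set of (x, y) pixels the triangle covers. Each touched 2x2 quad
    costs 4 lanes whether or not all 4 pixels are covered."""
    quads = {(x // 2, y // 2) for x, y in covered}
    return len(covered) / (4 * len(quads)) if quads else 0.0
```

A one-pixel triangle runs at 25% efficiency. A mesh made of pixel-sized triangles shaded this way therefore wastes most of its shading work.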
- Write geometry data to screen
- Material shader per pixel:
    - Load VisBuffer
    - Load instance transformation
    - Load 3 vert indexes
    - Load 3 positions
    - Transform positions to screen
    - Derive barycentric coordinates for pixel
    - Load and lerp attributes
- Not as slow as it seems
    - Lots of cache hits
    - No overdraw or pixel quad inefficiencies
- Material pass writes GBuffer
    - Integrates with rest of deferred shading renderer
- Now we can draw all opaque geometry with 1 draw
    - Completely GPU driven
    - Not just depth prepass
    - Rasterize triangles once per view
- Linear scaling in triangles is not okay.
- Why should we draw more triangles than pixels?
- In terms of clusters, we want to draw the same number of clusters every frame, regardless of how many objects there are or how dense they are.
- Decide LOD on a cluster basis
- Build hierarchy of LODs
    - Simplest is tree of clusters
    - Parents are the simplified versions of their children
- Find cut of the tree for desired LOD
    - View-dependent, based on perceptual difference
- Entire tree doesn't need to be in memory at once
    - Can mark any cut of the tree as leaves and toss the rest
    - Request data on demand during rendering
    - Like virtual texturing
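
A minimal Python sketch of finding a view-dependent cut in a plain tree of clusters, with on-demand streaming requests. The error metric is an assumption: object-space error divided by distance, scaled to pixels. The threshold in pixels and the `Node` fields are hypothetical. Each node decides independently of its neighbors, which is exactly what produces cracks, the problem the following notes address.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    error: float                # hypothetical: object-space simplification error
    distance: float             # hypothetical: camera distance to the node's bounds
    children: list["Node"] = field(default_factory=list)
    resident: bool = True       # is this node's data streamed in?


def projected_error(node, scale):
    # Crude perspective projection of the error into pixels.
    return node.error * scale / max(node.distance, 1e-6)


def find_cut(node, threshold_px, scale, requests):
    """Return the coarsest nodes whose projected error is below threshold_px."""
    if projected_error(node, scale) <= threshold_px or not node.children:
        return [node]
    missing = [c.name for c in node.children if not c.resident]
    if missing:
        requests.extend(missing)  # request data on demand...
        return [node]             # ...and draw the coarser parent until it arrives
    cut = []
    for child in node.children:
        cut.extend(find_cut(child, threshold_px, scale, requests))
    return cut
```

Anything below the returned cut can stay on disk. Falling back to the parent while children stream in is the same trick virtual texturing uses with coarser mips.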
- If each cluster decides its LOD independently of its neighbors, you get cracks
- Using locked boundaries is a bad solution (dense detail piles up along the locked edges)
- Solution
    - Detect during build
    - Group clusters
        - Force them to make the same LOD decision
        - Now free to unlock shared edges and collapse them
- Group clusters where needed
    - Cluster original triangles
    - While NumClusters > 1
        - Group clusters to clean their shared boundary
        - Merge triangles from group into shared list
        - Simplify to 50% of the triangle count
        - Split simplified triangle list into clusters (128 tris)
- Graph partitioning is used to decide which clusters to group
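
The build loop above can be sketched in Python using triangle IDs only. Three stand-ins are hypothetical and labeled in the code:
- Consecutive runs replace graph partitioning.
- Groups of 4 clusters are assumed.
- "Keep the first half" replaces a real simplifier.

Only the 128-triangle cluster size comes from the notes. The sketch shows the loop's shape and why it terminates, not how Nanite chooses groups.

```python
CLUSTER_SIZE = 128  # triangles per cluster, as in the notes
GROUP_SIZE = 4      # hypothetical: clusters per group


def split_into_clusters(tris):
    """Stand-in for spatial clustering: consecutive runs of CLUSTER_SIZE triangles."""
    return [tris[i:i + CLUSTER_SIZE] for i in range(0, len(tris), CLUSTER_SIZE)]


def simplify_half(tris):
    """Stand-in for a real simplifier (e.g. edge collapse with the group's outer
    boundary locked): just keeps the first half of the triangles."""
    return tris[:max(1, len(tris) // 2)]


def build_cluster_lods(triangles):
    levels = [split_into_clusters(triangles)]           # cluster original triangles
    while len(levels[-1]) > 1:                          # while NumClusters > 1
        clusters = levels[-1]
        next_level = []
        # Group clusters (stand-in for graph partitioning: consecutive runs).
        for g in range(0, len(clusters), GROUP_SIZE):
            group = clusters[g:g + GROUP_SIZE]
            merged = [t for cluster in group for t in cluster]  # merge into shared list
            simplified = simplify_half(merged)                  # simplify to 50%
            next_level.extend(split_into_clusters(simplified))  # re-split into clusters
        levels.append(next_level)
    return levels
```

With these stand-ins, 1024 triangles produce levels of 8, 4, 2 and 1 clusters. Each group of 4 clusters (512 triangles) simplifies to 256 triangles, which is 2 clusters, so the cluster count halves per level. The grouping also changes between levels. The boundary between the two level-0 groups lies inside the single level-1 group, so edges locked at one level become collapsible at the next.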
- Main pass
    - Instance culling → Persistent Hierarchy/Cluster Culling → Software & Hardware rasterizer → Build HZB
- Post pass
    - Instance culling → Persistent Hierarchy/Cluster Culling → Software & Hardware rasterizer → Build HZB
- Material Passes
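
A toy Python model of the two-pass idea, following the notes' description: draw last frame's visible set, build the HZB, then draw what passes the test but wasn't drawn yet. The model simplifies heavily:
- The screen is 1D, made of pixel indices.
- Each object has one constant depth, under reverse-Z (larger = closer).
- A copy of the full-resolution depth stands in for the HZB.

All names are hypothetical.

```python
def draw(obj, depth):
    """Write obj's depth where it is closer (reverse-Z: larger = closer)."""
    for p in obj["pixels"]:
        depth[p] = max(depth.get(p, 0.0), obj["depth"])


def occluded(obj, hzb):
    """Stand-in for the HZB test: behind the stored depth at every covered pixel."""
    return all(obj["depth"] < hzb.get(p, 0.0) for p in obj["pixels"])


def render_frame(objects, prev_visible):
    depth = {}
    # Main pass: draw what was visible in the previous frame.
    main = [o for o in objects if o["name"] in prev_visible]
    for o in main:
        draw(o, depth)
    # Build HZB from the main pass result (here: a copy of the depth buffer).
    hzb = dict(depth)
    # Post pass: draw what is visible now but wasn't drawn in the main pass.
    post = [o for o in objects
            if o["name"] not in prev_visible and not occluded(o, hzb)]
    for o in post:
        draw(o, depth)
    # Visible set for next frame: anything that won at least one pixel.
    return {o["name"] for o in main + post
            if any(depth[p] == o["depth"] for p in o["pixels"])}
```

The main pass gives the post pass a nearly complete occluder set for free. Objects that just came into view still get drawn in the same frame instead of popping in a frame late.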
- Can we hit pixel-scale detail with triangles > 1 pixel?
    - Depends on how smooth the surface is
    - In general, no
- We need to draw pixel-sized triangles
    - Terrible for a typical rasterizer
- Typical rasterizer:
    - Macro tile binning
    - Micro tile 4x4
    - Output 2x2 pixel quads
    - Highly parallel in pixels, not triangles
- Can we beat the HW rasterizer in SW?
    - Yes, 3x faster
    - Binning triangles is as much work as just writing the final pixels
    - Even a single vector stamp does wasteful tests for small tris
        - Basic bounding box is faster
    - Serialization at tile level to handle depth and ROP
    - Output 2x2 pixel quads
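
A scalar Python sketch of the "basic bounding box" approach: test every pixel center in the triangle's screen-clamped bounding box against three edge functions. It is CPU code for illustration, not Nanite's compute shader, which runs wide over triangles (one triangle per thread). It also ignores the top-left fill rule, so a pixel center lying exactly on an edge shared by two triangles counts for both.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """2D cross product (b - a) x (p - a): which side of edge a->b point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def raster_bbox(v0, v1, v2, width, height):
    """Pixels whose centers fall inside triangle (v0, v1, v2), either winding."""
    xs, ys = (v0[0], v1[0], v2[0]), (v0[1], v1[1], v2[1])
    x0, x1 = max(0, math.floor(min(xs))), min(width - 1, math.floor(max(xs)))
    y0, y1 = max(0, math.floor(min(ys))), min(height - 1, math.floor(max(ys)))
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return []                      # degenerate triangle covers nothing
    covered = []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Inside when all edge functions share the triangle's winding sign.
            if (area > 0 and min(w0, w1, w2) >= 0) or (area < 0 and max(w0, w1, w2) <= 0):
                covered.append((x, y))
    return covered
```

For a pixel-sized triangle the bounding box is only a pixel or two, so this loop does a handful of tests. A hardware pipeline would still bin the triangle and emit a full 2x2 quad for it.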
- HW rasterizer is general purpose
    - VS+PS scheduling
    - Output formats, ordering, blending
    - Clipping
    - …
- HW rasterizer is optimized for larger triangles covering many pixels
    - Run wide over pixels
- We want many triangles with few pixels each
    - Run wide over triangles
- Don't have ROP or depth test hardware
    - Need Z-buffering
    - Can't serialize tiles
    - Many tris may be in parallel for a single tile or even a single pixel
- ==Use 64-bit atomics==
    - InterlockedMax

| Field | Depth | Visible cluster index | Triangle index |
|---|---|---|---|
| Bits | 30 | 27 | 7 |

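
A Python sketch of why this packing works. Depth sits in the top 30 bits, so one 64-bit InterlockedMax does the depth test and writes the cluster and triangle IDs in a single atomic step, with no ROP or tile serialization. The sketch assumes reverse-Z depth in [0, 1] (larger = closer). The depth quantization and the `VisBuffer` class are hypothetical, and a plain `max` stands in for the GPU atomic.

```python
DEPTH_BITS, CLUSTER_BITS, TRI_BITS = 30, 27, 7


def pack(depth: float, visible_cluster: int, triangle: int) -> int:
    """Pack into one 64-bit value with depth on top, so max() compares depth first."""
    assert 0.0 <= depth <= 1.0
    assert 0 <= visible_cluster < (1 << CLUSTER_BITS) and 0 <= triangle < (1 << TRI_BITS)
    d = int(depth * ((1 << DEPTH_BITS) - 1))  # quantize depth to 30 bits
    return (d << (CLUSTER_BITS + TRI_BITS)) | (visible_cluster << TRI_BITS) | triangle


class VisBuffer:
    def __init__(self, width, height):
        self.width = width
        self.texels = [0] * (width * height)  # 0 = cleared to the far plane

    def interlocked_max(self, x, y, value):
        # Emulates a 64-bit InterlockedMax: depth test + visibility write in one op.
        i = y * self.width + x
        self.texels[i] = max(self.texels[i], value)
```

Because `max` is commutative, the surviving texel is the same no matter which order parallel triangles arrive in. Exact depth ties are broken by the larger cluster/triangle index, which is arbitrary but deterministic.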
- Visibility buffer shows its true power