@ekzhang
Created September 10, 2024 17:27
Notes on Moonray
  • MoonRay — [[September 8th, 2024]]
    • AOV system is interesting. Light path expressions primarily.
      • "Material AOV" provides different kinds of syntax for debugging materials.
    • Uses JIT compilation through LLVM to implement their ISPC framework. SPMD computation on an Arras cluster.
    • Vectorized path tracing, i.e., uses all SIMD lanes.
    • Relies on several queues for radiance (dedupe samples), rays, and shading. Each queue is processed by a handler when it reaches its maximum size, for efficiency. Shading queue converts AoS -> SoA and then evaluates every shader & texture with SIMD.
      • Custom image loader with OpenImageIO (OIIO)
      • The integrator spawns rays by Monte Carlo sampling. These rays are incoherent, so they get placed in a separate queue for sorting.
      • Occlusion rays to lights are also generated and handled asynchronously.
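The queue behavior described above (collect entries, flush to a handler at a maximum size, convert AoS -> SoA before shading) can be sketched roughly as follows. This is a minimal illustration, not MoonRay's code: `ShadeEntry`, `BatchQueue`, `aos_to_soa`, and the field names are all hypothetical, and plain Python lists stand in for SIMD-friendly arrays.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ShadeEntry:
    """One array-of-structures (AoS) record; the fields are illustrative only."""
    u: float
    v: float
    material_id: int


def aos_to_soa(entries: List[ShadeEntry]) -> Dict[str, list]:
    """Transpose a list of records into one column per field (SoA)."""
    return {
        "u": [e.u for e in entries],
        "v": [e.v for e in entries],
        "material_id": [e.material_id for e in entries],
    }


class BatchQueue:
    """Collects entries and passes them to a handler once max_size is reached."""

    def __init__(self, max_size: int, handler: Callable[[List[ShadeEntry]], None]):
        self.max_size = max_size
        self.handler = handler
        self.entries: List[ShadeEntry] = []

    def push(self, entry: ShadeEntry) -> None:
        self.entries.append(entry)
        if len(self.entries) >= self.max_size:
            self.flush()

    def flush(self) -> None:
        if not self.entries:
            return
        # Swap in a fresh list so the handler may push new work safely.
        batch, self.entries = self.entries, []
        self.handler(batch)


batches = []
queue = BatchQueue(max_size=4, handler=lambda batch: batches.append(aos_to_soa(batch)))
for i in range(10):
    queue.push(ShadeEntry(u=i / 10, v=1 - i / 10, material_id=i % 2))
queue.flush()  # drain the partial batch left over at the end
```

With ten entries and a maximum size of four, the handler sees two full batches and one partial batch. In a vectorized renderer each SoA column would be a contiguous array, so one SIMD load fetches the same field for several entries at once.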
    • "Vectorized Production Path Tracing" (HPG '17) has more context on this.
      • Embree for ray intersections. Two uni-directional path tracing implementations: depth-first (scalar) and wavefront breadth-first (SIMD / ISPC).
      • "do the potential performance benefits gained outweigh the extra work required to harness the vector hardware?"
      • Integrator / secondary rays spawner implements multiple importance sampling, Russian roulette, path splitting.
      • "Another key tenet of DOD is where there is one, there is usually more than one, or more plainly, we should work in batches where possible."
      • Intel TBB is how tasks are spawned.
      • Sorting by [light-set index, UDIM tile, mip level, uv coordinates] in a 32-bit integer.
        • Up to 128 light sets.
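A rough sketch of packing such a sort key into 32 bits. Only the field order and the 128 light-set limit (7 bits) come from the notes above. The remaining bit widths, the 6-bit uv quantization, and the names `make_sort_key` and `quantize` are assumptions for illustration.

```python
LIGHT_SET_BITS = 7  # 2**7 = 128 light sets, matching the limit noted above
UDIM_BITS = 9       # assumed width
MIP_BITS = 4        # assumed width
UV_BITS = 6         # assumed width per coordinate (7 + 9 + 4 + 6 + 6 = 32)


def quantize(x: float, bits: int) -> int:
    """Map x in [0, 1] onto an integer in [0, 2**bits - 1], clamping out-of-range input."""
    levels = (1 << bits) - 1
    return min(levels, max(0, int(x * (levels + 1))))


def make_sort_key(light_set: int, udim_tile: int, mip_level: int, u: float, v: float) -> int:
    """Pack the fields most-significant first, so integer order equals lexicographic order."""
    for value, bits in ((light_set, LIGHT_SET_BITS), (udim_tile, UDIM_BITS), (mip_level, MIP_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    key = light_set
    key = (key << UDIM_BITS) | udim_tile
    key = (key << MIP_BITS) | mip_level
    key = (key << UV_BITS) | quantize(u, UV_BITS)
    key = (key << UV_BITS) | quantize(v, UV_BITS)
    return key


# Hypothetical hit records: (light_set, udim_tile, mip_level, u, v)
hits = [
    (3, 12, 0, 0.25, 0.75),
    (0, 5, 2, 0.9, 0.1),
    (3, 12, 0, 0.20, 0.70),
    (0, 5, 1, 0.5, 0.5),
]
hits_sorted = sorted(hits, key=lambda h: make_sort_key(*h))
```

Because the most significant bits hold the light-set index, a single integer sort groups entries by light set first, then by texture tile, mip level, and finally by nearby uv coordinates.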
      • Then every shader node is vectorized, since they are JIT compiled into an ISPC program and executed on the processor in batches. Texture sampling is done with OIIO loading / caching, and they do point sampling between two adjacent MIP levels for "instant" lookup.
      • Queues are thread-local, primarily to avoid contention. The shade queue is the exception: it is not thread-local, for memory reasons. Queues kind of imply BFS because they are limited in size.
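For reference, two of the integrator techniques named above (multiple importance sampling and Russian roulette) in their generic textbook form. This is a sketch only: the notes do not say which MIS heuristic or survival probability MoonRay uses, so the balance heuristic and the `min_survival` parameter here are assumptions.

```python
import random
from typing import Optional


def balance_heuristic(pdf_a: float, pdf_b: float) -> float:
    """MIS weight for a sample drawn from strategy A when strategy B could also produce it."""
    total = pdf_a + pdf_b
    return pdf_a / total if total > 0.0 else 0.0


def russian_roulette(throughput: float, rng: random.Random,
                     min_survival: float = 0.05) -> Optional[float]:
    """Terminate the path at random; survivors are reweighted so the estimate stays unbiased."""
    survival = max(min_survival, min(1.0, throughput))
    if rng.random() >= survival:
        return None  # path terminated
    return throughput / survival


# Empirical check that reweighting preserves the expected throughput.
rng = random.Random(1234)
trials = 100_000
mean_throughput = sum(russian_roulette(0.2, rng) or 0.0 for _ in range(trials)) / trials
```

The weights for the two strategies sum to one, so combining light sampling and BSDF sampling does not double count. The roulette check averages close to the input throughput of 0.2 even though most paths are terminated.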
    • Vectorized path tracing is about decomposing the system into smaller parts based on their data / memory access patterns. Like a little factory with different machines sending messages.
      • Ray queue is only concerned about global geometry, nothing else.
      • Shading queue cares about materials, light, sampling … but only at one single point.
      • Radiance queue is just environmental illumination.
      • Occlusion queue only works on scene lights and importance sampling.
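The "little factory" picture can be sketched as a toy loop in which each stage owns a queue and only forwards messages to other stages. The four stage names come from the notes above. The routing rule (even ids hit, odd ids miss) and the function `run_pipeline` are invented for illustration and are not MoonRay's actual data flow.

```python
from collections import deque


def run_pipeline(rays, max_depth=2):
    """Drain four stage queues until no work is left; returns how many entries each stage handled."""
    queues = {name: deque() for name in ("ray", "shade", "occlusion", "radiance")}
    processed = {name: 0 for name in queues}
    queues["ray"].extend(rays)

    def handle_ray(ray):
        # Geometry only: decide hit or miss (toy rule: even ids hit, odd ids miss).
        target = "shade" if ray["id"] % 2 == 0 else "radiance"
        queues[target].append(ray)

    def handle_shade(hit):
        # Work at a single surface point: emit a shadow test and,
        # below max_depth, one continuation ray.
        queues["occlusion"].append(hit)
        if hit["depth"] < max_depth:
            queues["ray"].append({"id": hit["id"] + 1, "depth": hit["depth"] + 1})

    handlers = {
        "ray": handle_ray,
        "shade": handle_shade,
        "occlusion": lambda entry: None,  # placeholder: would test light visibility
        "radiance": lambda entry: None,   # placeholder: would accumulate into the frame buffer
    }

    while any(queues.values()):
        for name, queue in queues.items():
            # Process only what is queued right now; new work waits for the next pass.
            for _ in range(len(queue)):
                handlers[name](queue.popleft())
                processed[name] += 1
    return processed


counts = run_pipeline([{"id": i, "depth": 0} for i in range(4)])
```

Each handler touches only the data its stage needs, and draining one queue completely before moving to the next gives the breadth-first order that the bounded queues imply.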