- Bluenoise in the game INSIDE (dithering, raymarching, reflections)
- Dithering, Ray marching, shadows etc
- A Survey of Blue Noise and Its Applications
- Moments In Graphics (void-and-cluster)
- Bart Wronski Implementation of Solid Angle algorithm
| In shader programming, you often need every pixel in a compute shader group (tile) to iterate over the same array in memory. | |
| Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list that was culled for that tile. | |
| Simplified HLSL code looks like this: | |
| Buffer<float4> lightDatas; | |
| Texture2D<uint2> lightStartCounts; | |
| RWTexture2D<float4> output; | |
| [numthreads(8, 8, 1)] |
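The HLSL preview above stops at the thread group declaration. As a rough CPU-side illustration of the access pattern it describes, the C++ sketch below has every pixel of one 8x8 tile walk the same (start, count) range of a shared light array. The names `Light`, `TileRange`, and `shadeTile` are hypothetical, and the single `intensity` field is an assumption standing in for the `float4` light records; this is not the original shader.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one float4 light record.
struct Light { float intensity; };

// Mirrors the per-tile (start, count) pair that indexes into the light buffer.
struct TileRange { uint32_t start; uint32_t count; };

constexpr int kTileSize = 8;
constexpr int kPixelsPerTile = kTileSize * kTileSize;

// Emulates one 8x8 thread group: each pixel loops over the tile's culled light range.
std::array<float, kPixelsPerTile> shadeTile(const std::vector<Light>& lights, TileRange range)
{
    std::array<float, kPixelsPerTile> out{};
    for (int pixel = 0; pixel < kPixelsPerTile; ++pixel) {  // one iteration per thread in the group
        float sum = 0.0f;
        for (uint32_t i = 0; i < range.count; ++i) {
            // The address depends only on the tile, so all 64 pixels read the same element.
            sum += lights[range.start + i].intensity;
        }
        out[pixel] = sum;
    }
    return out;
}
```

The point of the sketch is that the inner loop's addresses are uniform across the whole tile, which is the property the text singles out.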
| http://www.anandtech.com/show/11170/the-amd-zen-and-ryzen-7-review-a-deep-dive-on-1800x-1700x-and-1700/6 | |
| "The High-Level Zen Overview" | |
| - "Features such as the micro-op cache help most instruction streams improve in performance and bypass parts of potentially | |
| long-cycle repetitive operations, but also the larger dispatch, larger retire, larger schedulers and better branch | |
| prediction means that higher throughput can be maintained longer and in the fastest order possible." | |
| Micro-op caches have nothing to do with "bypassing parts of potentially long-cycle repetitive operations" (what does | |
| that even mean?). They reduce decode bottlenecks and decrease power consumption. Depending on the implementation, |
| // The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae | |
| // Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16. | |
| // See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details | |
| float4 SampleTextureCatmullRom(in Texture2D<float4> tex, in SamplerState linearSampler, in float2 uv, in float2 texSize) | |
| { | |
| // We're going to sample a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding | |
| // down the sample location to get the exact center of our "starting" texel. The starting texel will be at | |
| // location [1, 1] in the grid, where [0, 0] is the top left corner. | |
| float2 samplePos = uv * texSize; |
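The preview above ends right after `samplePos` is computed. As a hedged sketch of the step the comment describes, the C++ below splits a position in texel units into the center of the starting texel and a fractional offset, then evaluates the standard Catmull-Rom weights for the four taps along one axis. The names `splitSamplePos` and `catmullRomWeights` are hypothetical, and the `floor(x - 0.5) + 0.5` rounding is an assumption about how "rounding down to the exact center" would be done; the 9-fetch merging of taps is not shown.

```cpp
#include <array>
#include <cmath>

// Center of the starting texel plus the fractional offset from it (both in texel units).
struct TexelSplit { float center; float frac; };

// Assumed rounding: texel centers sit at half-integer positions.
TexelSplit splitSamplePos(float samplePos)
{
    const float center = std::floor(samplePos - 0.5f) + 0.5f;
    return { center, samplePos - center };
}

// Standard Catmull-Rom weights for taps at offsets -1, 0, +1, +2 from the starting texel,
// given the fractional offset f in [0, 1). The four weights always sum to 1.
std::array<float, 4> catmullRomWeights(float f)
{
    return {
        f * (-0.5f + f * (1.0f - 0.5f * f)),
        1.0f + f * f * (-2.5f + 1.5f * f),
        f * (0.5f + f * (2.0f - 1.5f * f)),
        f * f * (-0.5f + 0.5f * f),
    };
}
```

In two dimensions the same weights would be computed once per axis and multiplied together for each of the 4x4 texels.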
| ; This is intended for long-running leaf funcs that don't use XMM registers, | |
| ; and just saves all callee-save registers regardless of whether they're used | |
| ; or not. | |
| ; detect some parameters from output format | |
| %ifidn __OUTPUT_FORMAT__,win32 | |
| %define resp resd | |
| %define LEADING_UNDERSCORES | |
| %define CALLEE_SAVE_GPRS ebp,ebx,esi,edi | |
| %define BYTES_PER_ARG 4 |
| // half->float variants. | |
| // by Fabian "ryg" Giesen. | |
| // | |
| // I hereby place this code in the public domain. | |
| // | |
| // half_to_float_fast: table based | |
| // tables could be done in a more compact fashion (in particular, can store tab2 in low word of tab1!) | |
| // but something of a dead end since not very SIMD-friendly. pretty much abandoned at this point. | |
| // | |
| // half_to_float_fast2: use FP adder hardware to deal with denormals. |
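The header comment above mentions using the FP adder to handle denormals, but the preview is cut off before any code. The C++ sketch below shows one way that idea can work, under stated assumptions: for a zero or denormal half, the 10 mantissa bits are added to the bit pattern of 0.5f and then 0.5f is subtracted as a float, which leaves exactly mantissa * 2^-24. The function name `halfToFloat` and the `memcpy` helpers are hypothetical and not taken from the original file.

```cpp
#include <cstdint>
#include <cstring>

// Bit-cast helpers; memcpy avoids undefined behaviour from pointer casts.
static float bitsToFloat(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }
static uint32_t floatToBits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }

// Converts an IEEE 754 binary16 bit pattern to a 32-bit float.
float halfToFloat(uint16_t h)
{
    const uint32_t sign     = (h & 0x8000u) << 16;
    const uint32_t exponent = (h >> 10) & 0x1Fu;
    const uint32_t mantissa = h & 0x3FFu;

    uint32_t bits;
    if (exponent == 0) {
        // Zero or denormal: value is mantissa * 2^-24.
        // magic is the bit pattern of 0.5f; adding the mantissa to it gives 0.5 + mantissa * 2^-24,
        // and the float subtraction lets the FP adder renormalize the result.
        const uint32_t magic = 126u << 23;
        bits = floatToBits(bitsToFloat(magic + mantissa) - bitsToFloat(magic));
    } else if (exponent == 0x1Fu) {
        // Infinity or NaN: all-ones exponent, mantissa bits carried over.
        bits = (255u << 23) | (mantissa << 13);
    } else {
        // Normal number: rebias the exponent from 15 to 127.
        bits = ((exponent + 112u) << 23) | (mantissa << 13);
    }
    return bitsToFloat(bits | sign);
}
```

The denormal branch needs no loop or leading-zero count, which is the appeal of letting the adder do the normalization.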
| ; input: 4x F16 in XMM0 (low words of each DWord) | |
| ; original idea+implementation by Dean Macri | |
| ; WARNING: copy & pasted together from other code, this ver is untested!! | |
| ; though the original version was definitely correct. | |
| bits 32 | |
| section .data |