- Bluenoise in the game INSIDE (dithering, raymarching, reflections)
- Dithering, Ray marching, shadows etc
- A Survey of Blue Noise and Its Applications
- Moments In Graphics (void-and-cluster)
- Bart Wronski Implementation of Solid Angle algorithm
; input: 4x F16 in XMM0 (low words of each DWord)
; original idea+implementation by Dean Macri
; WARNING: copy & pasted together from other code, this ver is untested!!
; though the original version was definitely correct.
bits 32
section .data
// half->float variants.
// by Fabian "ryg" Giesen.
//
// I hereby place this code in the public domain.
//
// half_to_float_fast: table based
// tables could be done in a more compact fashion (in particular, can store tab2 in low word of tab1!)
// but something of a dead end since not very SIMD-friendly. pretty much abandoned at this point.
//
// half_to_float_fast2: use FP adder hardware to deal with denormals.
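The "use FP adder hardware to deal with denormals" idea can be sketched in scalar C++ as follows. This is a minimal illustration and not the original listing: the function names are hypothetical, and the magic constant (the bit pattern of 0.5f) is an assumption about how the trick is usually set up. For a half denormal with mantissa m, the value is m * 2^-24, which is also the spacing of floats in [0.5, 1). Placing m in the low mantissa bits of 0.5f and then subtracting 0.5f lets the floating-point adder do the renormalization.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit-cast helpers; memcpy avoids undefined behaviour from pointer punning.
static float bits_to_float(std::uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }
static std::uint32_t float_to_bits(float f) { std::uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }

// Hypothetical name: converts one IEEE 754 binary16 bit pattern to a float.
float half_to_float_sketch(std::uint16_t h)
{
    const std::uint32_t sign     = (h & 0x8000u) << 16; // move sign to bit 31
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;   // 5 exponent bits
    const std::uint32_t mantissa = h & 0x3FFu;          // 10 mantissa bits
    std::uint32_t out;

    if (exponent == 0) {
        // Zero or denormal: build 0.5f + mantissa * 2^-24 by integer addition,
        // then subtract 0.5f so the FP adder normalizes the result.
        const std::uint32_t magic = 126u << 23;         // bit pattern of 0.5f
        out = float_to_bits(bits_to_float(magic + mantissa) - bits_to_float(magic));
    } else if (exponent == 0x1Fu) {
        // Inf or NaN: all-ones exponent, mantissa payload shifted into place.
        out = (255u << 23) | (mantissa << 13);
    } else {
        // Normal number: rebias the exponent from 15 to 127.
        out = ((exponent + (127u - 15u)) << 23) | (mantissa << 13);
    }
    return bits_to_float(out | sign);
}

// Small self-check covering a normal value, a denormal, and infinity.
void half_to_float_selfcheck()
{
    assert(half_to_float_sketch(0x3C00) == 1.0f);
    assert(half_to_float_sketch(0x0001) == std::ldexp(1.0f, -24));
    assert(std::isinf(half_to_float_sketch(0x7C00)));
}
```

The denormal branch needs no loop or leading-zero count, which is what makes this approach attractive compared with table lookups.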
; This is intended for long-running leaf funcs that don't use XMM registers,
; and just saves all callee-save registers regardless of whether they're used
; or not.
; detect some parameters from output format
%ifidn __OUTPUT_FORMAT__,win32
%define resp resd
%define LEADING_UNDERSCORES
%define CALLEE_SAVE_GPRS ebp,ebx,esi,edi
%define BYTES_PER_ARG 4
// The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae
// Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16.
// See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details
float4 SampleTextureCatmullRom(in Texture2D<float4> tex, in SamplerState linearSampler, in float2 uv, in float2 texSize)
{
    // We're going to sample a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding
    // down the sample location to get the exact center of our "starting" texel. The starting texel will be at
    // location [1, 1] in the grid, where [0, 0] is the top left corner.
    float2 samplePos = uv * texSize;
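The listing is cut off at this point. As a rough one-dimensional illustration of the step the comment describes, the C++ sketch below rounds the sample position down to the center of the starting texel and evaluates the standard Catmull-Rom weights for the four taps around it. The struct and function names are hypothetical, the half-texel offset assumes texel centers sit at integer + 0.5, and the 9-fetch bilinear merging mentioned above is not shown.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the 1D result.
struct CatmullRomTaps1D {
    float center; // texel-space center of the "starting" texel (grid index 1)
    float w[4];   // weights for texels at center-1, center, center+1, center+2
};

// Hypothetical helper: uv is in [0, 1], texSize is the texture size in texels.
CatmullRomTaps1D catmull_rom_taps_1d(float uv, float texSize)
{
    CatmullRomTaps1D t;
    const float samplePos = uv * texSize;

    // Round down to the center of the starting texel
    // (assumes texel centers are at integer + 0.5).
    t.center = std::floor(samplePos - 0.5f) + 0.5f;

    // Fractional offset from the starting texel, in [0, 1).
    const float f = samplePos - t.center;

    // Standard Catmull-Rom basis, written in Horner form.
    t.w[0] = f * (-0.5f + f * (1.0f - 0.5f * f));
    t.w[1] = 1.0f + f * f * (-2.5f + 1.5f * f);
    t.w[2] = f * (0.5f + f * (2.0f - 1.5f * f));
    t.w[3] = f * f * (-0.5f + 0.5f * f);
    return t;
}

// Small self-check: the four weights always form a partition of unity.
void catmull_rom_selfcheck()
{
    const CatmullRomTaps1D t = catmull_rom_taps_1d(0.5f, 4.0f);
    const float sum = t.w[0] + t.w[1] + t.w[2] + t.w[3];
    assert(std::fabs(sum - 1.0f) < 1e-6f);
}
```

In 2D the same weights are computed per axis and the 16 texel weights are the outer product of the two sets. The two outer weights are negative, which is why the filter can overshoot near sharp edges.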
http://www.anandtech.com/show/11170/the-amd-zen-and-ryzen-7-review-a-deep-dive-on-1800x-1700x-and-1700/6
"The High-Level Zen Overview"
- "Features such as the micro-op cache help most instruction streams improve in performance and bypass parts of potentially
long-cycle repetitive operations, but also the larger dispatch, larger retire, larger schedulers and better branch
prediction means that higher throughput can be maintained longer and in the fastest order possible."
Micro-op caches have nothing to do with "bypassing parts of potentially long-cycle repetitive operations" (what does
that even mean?). They reduce decode bottlenecks and decrease power consumption. Depending on the implementation,
In shader programming, you often run into a problem where every pixel in a compute shader group (tile) needs to iterate over
the same array in memory. Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]