Last active: December 30, 2024 10:01
Gist: TheRealMJP/c83b8c0f46b63f3a88a5986f4fa982b1
An HLSL function for sampling a 2D texture with Catmull-Rom filtering, using 9 texture samples instead of 16
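The saving comes from folding two adjacent taps into a single bilinear fetch. Along one axis, write $T_1$ and $T_2$ for the two middle texels and $w_1$, $w_2$ for their Catmull-Rom weights. Both weights are non-negative for a fractional offset $f \in [0, 1)$, so one linear fetch placed a fraction $w_2 / (w_1 + w_2)$ of the way from $T_1$ to $T_2$ and scaled by $w_1 + w_2$ reproduces the same weighted sum:

$$
w_1 T_1 + w_2 T_2 = (w_1 + w_2)\left[\left(1 - \frac{w_2}{w_1 + w_2}\right) T_1 + \frac{w_2}{w_1 + w_2}\, T_2\right]
$$

Each axis therefore needs 3 fetches instead of 4. Because the filter is separable, the 2D case needs $3 \times 3 = 9$ fetches instead of $4 \times 4 = 16$. The outer weights $w_0$ and $w_3$ are non-positive, so merging either one with its positive neighbour would require an interpolation factor outside $[0, 1]$, which bilinear filtering can't provide.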
// The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae
// Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16.
// See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details.
// texSize is the texture size in texels. linearSampler must use bilinear (linear min/mag) filtering,
// since the merged middle taps rely on the hardware interpolation.
float4 SampleTextureCatmullRom(in Texture2D<float4> tex, in SamplerState linearSampler, in float2 uv, in float2 texSize)
{
    // We're going to sample a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding
    // down the sample location to get the exact center of our "starting" texel. The starting texel will be at
    // location [1, 1] in the grid, where [0, 0] is the top left corner.
    float2 samplePos = uv * texSize;
    float2 texPos1 = floor(samplePos - 0.5f) + 0.5f;

    // Compute the fractional offset from our starting texel to our original sample location, which we'll
    // feed into the Catmull-Rom spline function to get our filter weights.
    float2 f = samplePos - texPos1;

    // Compute the Catmull-Rom weights using the fractional offset that we calculated earlier.
    // These equations are pre-expanded based on our knowledge of where the texels will be located,
    // which lets us avoid having to evaluate a piece-wise function.
    float2 w0 = f * (-0.5f + f * (1.0f - 0.5f * f));
    float2 w1 = 1.0f + f * f * (-2.5f + 1.5f * f);
    float2 w2 = f * (0.5f + f * (2.0f - 1.5f * f));
    float2 w3 = f * f * (-0.5f + 0.5f * f);

    // Work out weighting factors and sampling offsets that will let us use bilinear filtering to
    // simultaneously evaluate the middle 2 samples from the 4x4 grid.
    float2 w12 = w1 + w2;
    float2 offset12 = w2 / w12;

    // Compute the final UV coordinates we'll use for sampling the texture
    float2 texPos0 = texPos1 - 1;
    float2 texPos3 = texPos1 + 2;
    float2 texPos12 = texPos1 + offset12;

    texPos0 /= texSize;
    texPos3 /= texSize;
    texPos12 /= texSize;

    float4 result = 0.0f;
    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos0.y), 0.0f) * w0.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos0.y), 0.0f) * w12.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos0.y), 0.0f) * w3.x * w0.y;

    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos12.y), 0.0f) * w0.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos12.y), 0.0f) * w12.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos12.y), 0.0f) * w3.x * w12.y;

    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos3.y), 0.0f) * w0.x * w3.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos3.y), 0.0f) * w12.x * w3.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos3.y), 0.0f) * w3.x * w3.y;

    return result;
}
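The weights and the bilinear merge are easy to get subtly wrong, so here is a CPU-side C++ sketch that checks the idea in one dimension. It compares four point fetches weighted by the piecewise Catmull-Rom kernel against three fetches where the middle pair is folded into one linearly filtered fetch at `texPos1 + offset12`, using the same pre-expanded polynomials as the HLSL function. `LinearFetch` is a hypothetical stand-in for hardware linear filtering that assumes clamp-to-edge addressing and the D3D convention that texel `i` is centered at `i + 0.5`; all function names here are illustrative. The 2D shader applies the same reduction on each axis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Point fetch with clamp-to-edge addressing (an assumption mirroring a CLAMP sampler).
float Texel(const std::vector<float>& tex, int i) {
    int last = static_cast<int>(tex.size()) - 1;
    return tex[std::min(std::max(i, 0), last)];
}

// Emulates 1D hardware linear filtering. `pos` is in texel units and
// texel i is centered at i + 0.5.
float LinearFetch(const std::vector<float>& tex, float pos) {
    float p = pos - 0.5f;
    int i0 = static_cast<int>(std::floor(p));
    float t = p - static_cast<float>(i0);
    return Texel(tex, i0) + t * (Texel(tex, i0 + 1) - Texel(tex, i0));
}

// Piecewise Catmull-Rom kernel as a function of distance x in texels.
float CatmullRomKernel(float x) {
    x = std::fabs(x);
    if (x < 1.0f) return 1.5f * x * x * x - 2.5f * x * x + 1.0f;
    if (x < 2.0f) return -0.5f * x * x * x + 2.5f * x * x - 4.0f * x + 2.0f;
    return 0.0f;
}

// Reference: four point fetches weighted by the piecewise kernel.
float CatmullRom4Tap(const std::vector<float>& tex, float samplePos) {
    int i1 = static_cast<int>(std::floor(samplePos - 0.5f)); // the "starting" texel
    float result = 0.0f;
    for (int k = -1; k <= 2; ++k) {
        float center = static_cast<float>(i1 + k) + 0.5f;
        result += Texel(tex, i1 + k) * CatmullRomKernel(samplePos - center);
    }
    return result;
}

// Same result with three fetches: the middle two taps merged into one linear fetch.
float CatmullRom3Fetch(const std::vector<float>& tex, float samplePos) {
    float texPos1 = std::floor(samplePos - 0.5f) + 0.5f;
    float f = samplePos - texPos1;
    float w0 = f * (-0.5f + f * (1.0f - 0.5f * f));
    float w1 = 1.0f + f * f * (-2.5f + 1.5f * f);
    float w2 = f * (0.5f + f * (2.0f - 1.5f * f));
    float w3 = f * f * (-0.5f + 0.5f * f);
    float w12 = w1 + w2;
    assert(w12 > 0.0f); // w1 + w2 = 1 + f(1 - f)/2, which is >= 1 for f in [0, 1)
    float offset12 = w2 / w12;
    return LinearFetch(tex, texPos1 - 1.0f) * w0
         + LinearFetch(tex, texPos1 + offset12) * w12
         + LinearFetch(tex, texPos1 + 2.0f) * w3;
}
```

Calling both functions on the same data and sample position should give matching results up to float rounding, including near the edges, where clamping affects both versions identically.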
Thanks guys! I updated the code with the optimizations.
Wouldn't this be more efficient using Gather()?
https://docs.microsoft.com/en-us/windows/desktop/direct3dhlsl/dx-graphics-hlsl-to-gather
If you're doing the filtering yourself and want to use a linear buffer, you can use rawBuffer0.Load4().
Coherency might or might not be worse; it depends. Dynamic updates are usually easier.
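When the filtering is done manually from a linear buffer, no hardware bilinear filter is available to fold the middle taps together, so all 16 texels are loaded and weighted explicitly. The following is a CPU-side C++ sketch of that path. It assumes a single-channel, row-major buffer with clamp-to-edge addressing. `LinearTexture`, `Load`, and `SampleCatmullRomManual` are hypothetical names, not part of the gist or of any HLSL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel texture stored row-major in a linear buffer.
struct LinearTexture {
    int width;
    int height;
    std::vector<float> texels; // width * height values, row by row

    // Clamp-to-edge addressing (an assumption; match your sampler's address mode).
    float Load(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Catmull-Rom weights for the four texels around fractional offset f,
// using the same polynomials as the HLSL function.
void CatmullRomWeights(float f, float w[4]) {
    w[0] = f * (-0.5f + f * (1.0f - 0.5f * f));
    w[1] = 1.0f + f * f * (-2.5f + 1.5f * f);
    w[2] = f * (0.5f + f * (2.0f - 1.5f * f));
    w[3] = f * f * (-0.5f + 0.5f * f);
}

// 16-load Catmull-Rom filter with no hardware filtering.
// (u, v) are normalized texture coordinates, as in the HLSL function.
float SampleCatmullRomManual(const LinearTexture& tex, float u, float v) {
    assert(tex.texels.size() == static_cast<std::size_t>(tex.width) * tex.height);
    // Position relative to texel centers; floor() gives the "starting" texel index.
    float px = u * static_cast<float>(tex.width) - 0.5f;
    float py = v * static_cast<float>(tex.height) - 0.5f;
    int x1 = static_cast<int>(std::floor(px));
    int y1 = static_cast<int>(std::floor(py));
    float wx[4], wy[4];
    CatmullRomWeights(px - static_cast<float>(x1), wx);
    CatmullRomWeights(py - static_cast<float>(y1), wy);
    float result = 0.0f;
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            result += tex.Load(x1 - 1 + i, y1 - 1 + j) * wx[i] * wy[j];
    return result;
}
```

This is the 16-fetch baseline that the gist's 9-fetch version improves on. Whether the extra loads cost more than hardware-filtered fetches depends on the memory access pattern, as the comment above notes.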
For the 5-tap version, should we renormalize the weights?
float weight = w12.x * w0.y + w0.x * w12.y + w12.x * w12.y + w3.x * w12.y + w12.x * w3.y;
result /= weight;
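Whether to renormalize is a judgment call, but the size of the error is easy to work out. Per axis, the outer weights add up to w0 + w3 = -f(1 - f)/2. The four dropped corner taps therefore carry a combined weight of fx(1 - fx) * fy(1 - fy) / 4. This value is never negative and peaks at 1/64 (about 1.6%) when both offsets are 0.5. Without renormalization, the 5-tap result is slightly scaled down in flat regions. The C++ sketch below mirrors the `weight` sum from the snippet above so it can be checked on the CPU. The struct and function names are hypothetical.

```cpp
#include <cassert>

// Merged per-axis weights: outer taps w0 and w3 plus the combined middle weight w12.
struct AxisWeights {
    float w0, w12, w3;
};

// Same polynomials as the HLSL function; f is the fractional offset in [0, 1).
AxisWeights CatmullRomAxisWeights(float f) {
    assert(f >= 0.0f && f < 1.0f);
    float w0 = f * (-0.5f + f * (1.0f - 0.5f * f));
    float w1 = 1.0f + f * f * (-2.5f + 1.5f * f);
    float w2 = f * (0.5f + f * (2.0f - 1.5f * f));
    float w3 = f * f * (-0.5f + 0.5f * f);
    return {w0, w1 + w2, w3};
}

// Total weight of the five taps kept by the 5-tap variant (the four corners are dropped).
float FiveTapWeightSum(float fx, float fy) {
    AxisWeights x = CatmullRomAxisWeights(fx);
    AxisWeights y = CatmullRomAxisWeights(fy);
    return x.w12 * y.w0 + x.w0 * y.w12 + x.w12 * y.w12 + x.w3 * y.w12 + x.w12 * y.w3;
}

// Total weight of all nine taps; each axis sums to 1, so this is 1 up to rounding.
float NineTapWeightSum(float fx, float fy) {
    AxisWeights x = CatmullRomAxisWeights(fx);
    AxisWeights y = CatmullRomAxisWeights(fy);
    return (x.w0 + x.w12 + x.w3) * (y.w0 + y.w12 + y.w3);
}
```

For example, `FiveTapWeightSum(0.5f, 0.5f)` evaluates to 0.984375 (that is, 1 - 1/64). Dividing the 5-tap result by this sum, as in the snippet above, restores unit total weight.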
Alternatively, putting the polynomials straight into Horner form:
Pyramid, AMDDXX, Bonaire (http://pastebin.com/12ccE9Lk):
VGPRs: 55 -> 47
VALU: 146 -> 135