Last active
March 12, 2020 07:15
These generalisations about never using conditionals in GPU code are misleading, anyway. Just be careful to use conditionals
so that your thread groups are mostly either all true or all false. Checking a UV coord in a 2D frag/pixel shader,
e.g. `if (uv.x < 0.53)`, and branching in a clean cut through the texture or screenspace coordinates is no problem on modern GPUs,
because only a few thread groups in the dispatch "grid" might get held up: the ones that have a mix of true/false threads around
the 0.53 coord.
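
To put a rough number on "only a few thread groups", here is a minimal CPU-side sketch in Python, not shader code. It assumes a 200x200 target and models thread groups as 8x8 pixel tiles; both are illustrative assumptions, since real hardware maps pixels to warps/wavefronts in its own way. The names `count_divergent_groups` and `clean_cut` are hypothetical.

```python
# CPU-side sketch: count how many thread groups diverge on `uv.x < 0.53`.
# WIDTH, HEIGHT and the 8x8 TILE grouping are assumptions for illustration.
WIDTH, HEIGHT, TILE = 200, 200, 8


def count_divergent_groups(predicate):
    """Return (divergent, total) over all TILE x TILE pixel groups."""
    divergent = total = 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            # Collect the distinct branch outcomes inside this group.
            outcomes = {
                predicate(x, y)
                for y in range(ty, ty + TILE)
                for x in range(tx, tx + TILE)
            }
            total += 1
            if len(outcomes) > 1:  # both True and False present
                divergent += 1
    return divergent, total


def clean_cut(x, y):
    u = (x + 0.5) / WIDTH  # pixel-centre UV coordinate
    return u < 0.53


divergent, total = count_divergent_groups(clean_cut)
print(f"{divergent} of {total} groups diverge")
```

Under these assumptions only the single column of tiles straddling the cut is mixed; every other group takes one side of the branch as a whole.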

The situation to watch out for is something like a conditional that checks for even or odd texel coords, or a conditional
with some pseudo-random determination, which means every thread group will have a heterogeneous mix of true/false
and every thread group in the dispatch will be held up taking the time to execute both branches.
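
The same kind of CPU-side sketch shows the worst case. It again assumes 8x8 pixel tiles as a stand-in for thread groups, and the integer hash in `hashed` is a hypothetical pseudo-random predicate chosen for illustration, not anything from a real shader.

```python
# CPU-side sketch: fraction of thread groups that diverge for
# per-texel predicates. The 8x8 TILE grouping is an assumption.
WIDTH, HEIGHT, TILE = 64, 64, 8


def divergent_fraction(predicate):
    """Fraction of TILE x TILE groups containing both outcomes."""
    divergent = total = 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            outcomes = {
                predicate(x, y)
                for y in range(ty, ty + TILE)
                for x in range(tx, tx + TILE)
            }
            total += 1
            if len(outcomes) > 1:
                divergent += 1
    return divergent / total


def odd_texel(x, y):
    return x % 2 == 1  # alternates every pixel


def hashed(x, y):
    # Hypothetical cheap integer hash standing in for "pseudo random".
    h = (x * 73856093) ^ (y * 19349663)
    return ((h >> 4) & 1) == 1


print(divergent_fraction(odd_texel), divergent_fraction(hashed))
```

Both predicates flip within every tile, so in this model every group has to run both sides of the branch.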

Even in that worst-case scenario, if the operations are simple enough and the dispatch isn't gigantic, it doesn't always matter
that much on modern GPUs. Everyone should be cautious and use lookups, ifdefs, switch cases etc. when it makes sense,
but it's not always worth the few microseconds. (Except when it is: say, with a giant shader dispatch
or a slim mobile GPU performance budget, when it also meets the criteria above for thread branch heterogeneity, etc.)

If you did something like `if (fragcoord.x % 2)` then execute these instructions, else these other instructions, then
sure, it might become an issue at scale or with sufficient cost in your branching code, say, a bunch of texture samples in each branch.
Really simple value assignments and basic one-line vector math operations aren't going to be that noticeable when branching though,
like the step function mentioned above.
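
A toy cost model makes the cheap-versus-expensive distinction concrete. It assumes a mixed group simply runs both sides back to back, and the cycle counts `CHEAP` and `EXPENSIVE` are made-up numbers, not measurements; `group_cost` is a hypothetical helper.

```python
# Toy cost model: a group whose threads disagree pays for both sides.
# All cycle counts are made-up, assumed numbers for illustration.
def group_cost(outcomes, cost_true, cost_false):
    """Cost of one thread group given its per-thread branch outcomes."""
    cost = 0
    if any(outcomes):      # at least one thread takes the true side
        cost += cost_true
    if not all(outcomes):  # at least one thread takes the false side
        cost += cost_false
    return cost


GROUP = 64
uniform = [True] * GROUP
mixed = [lane % 2 == 1 for lane in range(GROUP)]  # odd/even lanes

CHEAP = 1        # e.g. a simple value assignment (assumed cost)
EXPENSIVE = 400  # e.g. a bunch of texture samples (assumed cost)

cheap_penalty = (group_cost(mixed, CHEAP, CHEAP)
                 - group_cost(uniform, CHEAP, CHEAP))
expensive_penalty = (group_cost(mixed, EXPENSIVE, EXPENSIVE)
                     - group_cost(uniform, EXPENSIVE, EXPENSIVE))
print(cheap_penalty, expensive_penalty)
```

In this model the divergence penalty is exactly the cost of the extra side, so it only hurts when that side is expensive or the dispatch is large enough to multiply it many times over.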

And if you use % or mod by themselves, it shouldn't even branch at all, because it's still the same instruction
executed for every thread. Don't quote me on this though; I haven't examined the IR or disassembly to be 100% sure.
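
As an illustration of the idea, the parity from `%` can pick between two values with plain arithmetic, in the spirit of a shader `mix`/`lerp`. This is a Python sketch with hypothetical function names; it says nothing about what a given shader compiler actually emits for either form, which would still need checking in the IR or disassembly.

```python
# Sketch: select between two values using the parity as a weight,
# instead of an if/else. Function names are hypothetical.
def shade_branching(x, a, b):
    if x % 2:
        return b
    return a


def shade_branchless(x, a, b):
    parity = x % 2  # 0 or 1; the same arithmetic runs for every thread
    return a * (1 - parity) + b * parity


for x in range(8):
    assert shade_branching(x, 0.25, 0.75) == shade_branchless(x, 0.25, 0.75)
print("branching and branchless forms agree")
```

The branchless form always evaluates both `a` and `b`, so it only pays off when both are cheap to compute.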

Small, fast, simple branches at smaller scales don't matter that much, especially if they mostly branch into
large homogeneous groups, equal to or greater than the thread group size.
Big, long, complex or expensive branches in large shader dispatches, and/or with too much heterogeneity occurring at a scale smaller
than the thread group size, are what to watch out for.
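
The granularity point can be sketched by branching on vertical stripes of different widths. As before, this is a CPU-side Python model that assumes 8x8 pixel tiles as thread groups and stripes aligned to tile boundaries; `stripes` and `divergent_fraction` are hypothetical helpers.

```python
# Sketch: how the size of the homogeneous regions compares with the
# (assumed) 8x8 thread group size.
WIDTH, HEIGHT, TILE = 64, 64, 8


def divergent_fraction(predicate):
    """Fraction of TILE x TILE groups containing both outcomes."""
    divergent = total = 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            outcomes = {
                predicate(x, y)
                for y in range(ty, ty + TILE)
                for x in range(tx, tx + TILE)
            }
            total += 1
            if len(outcomes) > 1:
                divergent += 1
    return divergent / total


def stripes(width):
    """Predicate that alternates every `width` pixels along x."""
    return lambda x, y: (x // width) % 2 == 0


results = {w: divergent_fraction(stripes(w)) for w in (1, 2, 4, 8, 16, 32)}
for width, fraction in results.items():
    print(f"stripe width {width:2d}: {fraction:.0%} of groups diverge")
```

In this model, stripes narrower than the tile make every group diverge, while stripes at least as wide as the tile make none diverge. Stripes that are not aligned to tile boundaries would still split the groups that straddle each edge.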

Of course, there are many other factors, like comparing floats vs ints and uints, bools and relational operators,
and other considerations I'm sure I have not included here.