Created
June 15, 2012 15:40
selb in odd pipe
Challenge:
Implement selb using only odd-pipeline instructions on the SPU.
----------------------------------------------------------------------------------
input : mask (comes from a floating point compare, so each of the four 32-bit words is either all zeros or all ones)
a, b to select between
----------------------------------------------------------------------------------
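For reference, a minimal Python sketch of what selb computes, so the suggestions below have something to be checked against. The function name and the sample values of a, b and mask are made up for illustration; only the bitwise select itself (mask bit 1 takes the bit from b, mask bit 0 takes it from a) is meant to model the instruction.

```python
def selb(a: bytes, b: bytes, mask: bytes) -> bytes:
    """Bitwise select over a 16-byte quadword: mask bit 1 -> b, mask bit 0 -> a."""
    return bytes((x & ~m & 0xFF) | (y & m) for x, y, m in zip(a, b, mask))

# Hypothetical sample inputs: byte values chosen so the source of each byte is obvious.
a = bytes(range(0x00, 0x10))
b = bytes(range(0x10, 0x20))
# Compare-style mask: every 32-bit word is all ones or all zeros.
mask = bytes.fromhex("FFFFFFFF 00000000 FFFFFFFF 00000000")

result = selb(a, b, mask)
print(result.hex())
```

Words 0 and 2 come from b and words 1 and 3 from a.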
Suggestion 1 by @daniel_collin
----------------------------------------------------------------------------------
// 18 cycles latency
gb      t, mask                              // 4  gather the low bit of each word into a 4-bit index
rotqbii offset, t, 4                         // 4  index * 16 = byte offset into the table
lqx     shufb_mask, offset, shuffle_table    // 6  load the matching precomputed shufb mask
shufb   res, a, b, shufb_mask                // 4  select words from a or b
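The contents of shuffle_table are not spelled out above. A sketch in Python of one way the 16-entry table could be laid out, with the four instructions modelled as plain functions. The table layout, the function names and the sample inputs are assumptions for illustration; the shufb model here is simplified and only handles control bytes 0x00-0x1F.

```python
def gb(mask: bytes) -> int:
    """Model of gb: gather the least significant bit of each 32-bit word.
    Word 0 ends up as the most significant of the 4 result bits."""
    t = 0
    for w in range(4):
        t = (t << 1) | (mask[4 * w + 3] & 1)
    return t

def build_shuffle_table() -> bytes:
    """Assumed layout: 16 entries of 16 bytes, entry t is the shufb mask for gb result t."""
    table = bytearray()
    for t in range(16):
        for w in range(4):
            from_b = (t >> (3 - w)) & 1          # bit for word w
            for i in range(4):
                # 0x00-0x0F picks a byte of a, 0x10-0x1F picks a byte of b
                table.append((4 * w + i) | (0x10 if from_b else 0x00))
    return bytes(table)

def shufb(a: bytes, b: bytes, control: bytes) -> bytes:
    """Simplified shufb: control bytes index the 32 bytes of a followed by b."""
    src = a + b
    return bytes(src[c & 0x1F] for c in control)

def selb_via_table(a: bytes, b: bytes, mask: bytes, table: bytes) -> bytes:
    offset = gb(mask) << 4                       # rotqbii t, 4
    control = table[offset:offset + 16]          # lqx from shuffle_table + offset
    return shufb(a, b, control)

# Hypothetical sample inputs.
A = bytes(range(0x00, 0x10))
B = bytes(range(0x10, 0x20))
TABLE = build_shuffle_table()
MASK = bytes.fromhex("FFFFFFFF 00000000 FFFFFFFF 00000000")
print(selb_via_table(A, B, MASK, TABLE).hex())
```

With this layout the table costs 256 bytes of local store, and the approach only works because the mask is all zeros or all ones per word.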
----------------------------------------------------------------------------------
Suggestion 2 by @postgoodism
----------------------------------------------------------------------------------
SPU selb using only odd instructions
(with details elided because I'm only half-awake)
Given a selb mask:
v1 = FF00FFFF 0000FFFF 00FF0000 FFFFFFFF
SHUFB a qword of zeros, using v1 as the shuffle mask. Each FF control byte produces the constant 80, and each 00 control byte selects byte 0 of the zero qword.
v2 = 80008080 00008080 00800000 80808080
Shift v2 right by 7 bits with ROTQMBII (the shift-right-by-bits form; ROTQMBYBI shifts by whole bytes)
v3 = 01000101 00000101 00010000 01010101
Broadcast the bytes of v3 into two new qwords v4 and v5 using SHUFB, interleaved with bytes from the following constant k1:
k1 = 00102030 40506070 8090A0B0 C0D0E0F0
v4 = 01000010 01200130 00400050 01600170
v5 = 00800190 00A000B0 01C001D0 01E001F0
Shift v4 and v5 right by 4 bits using ROTQMBII to create v6 and v7
v6 = 00100001 00120013 00040005 00160017
v7 = 00080019 000A000B 001C001D 001E001F
Re-combine v6 and v7 into v8 using shufb, taking only the low byte of each halfword (byte offsets 1, 3, 5, ...) from each:
v8 = 10011213 04051617 08190A0B 1C1D1E1F
v8 is a shufb mask that replicates the original selb mask: shufb res, a, b, v8 gives the same result as selb res, a, b, v1.
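The steps above can be checked with a small Python model. This is a sketch, not SPU code: the three shuffle control constants (CTRL_LO, CTRL_HI, CTRL_PACK), the helper names and the sample a/b values are assumptions made up here, since the original leaves those details out. The shufb model includes the special control patterns (10xxxxxx gives 00, 110xxxxx gives FF, 111xxxxx gives 80).

```python
def shufb(a: bytes, b: bytes, control: bytes) -> bytes:
    """Model of shufb including the three special control-byte patterns."""
    src = a + b
    out = bytearray()
    for c in control:
        if (c & 0xC0) == 0x80:        # 10xxxxxx -> constant 0x00
            out.append(0x00)
        elif (c & 0xE0) == 0xC0:      # 110xxxxx -> constant 0xFF
            out.append(0xFF)
        elif (c & 0xE0) == 0xE0:      # 111xxxxx -> constant 0x80
            out.append(0x80)
        else:                         # low 5 bits index the 32 bytes of a then b
            out.append(src[c & 0x1F])
    return bytes(out)

def shr_quad(v: bytes, bits: int) -> bytes:
    """Logical shift right of the whole 128-bit quadword (what ROTQMBII does for 0-7 bits)."""
    return (int.from_bytes(v, "big") >> bits).to_bytes(16, "big")

ZERO = bytes(16)
K1 = bytes.fromhex("00102030 40506070 8090A0B0 C0D0E0F0")
# Assumed control constants: interleave bytes of the first source with bytes of the second.
CTRL_LO = bytes(x for i in range(0, 8) for x in (i, 0x10 + i))
CTRL_HI = bytes(x for i in range(8, 16) for x in (i, 0x10 + i))
# Assumed control constant: low byte of every halfword, first source then second.
CTRL_PACK = bytes(range(1, 16, 2)) + bytes(range(0x11, 0x20, 2))

v1 = bytes.fromhex("FF00FFFF 0000FFFF 00FF0000 FFFFFFFF")
v2 = shufb(ZERO, ZERO, v1)            # FF -> 80, 00 -> byte 0 of ZERO
v3 = shr_quad(v2, 7)                  # 80 -> 01
v4 = shufb(v3, K1, CTRL_LO)
v5 = shufb(v3, K1, CTRL_HI)
v6 = shr_quad(v4, 4)
v7 = shr_quad(v5, 4)
v8 = shufb(v6, v7, CTRL_PACK)

# Hypothetical sample inputs to compare against a plain bitwise select.
a = bytes(range(0xA0, 0xB0))
b = bytes(range(0xB0, 0xC0))
via_shufb = shufb(a, b, v8)
via_selb = bytes((x & ~m & 0xFF) | (y & m) for x, y, m in zip(a, b, v1))
print(v8.hex(), via_shufb == via_selb)
```

Note that this route handles any mask whose bytes are each 00 or FF, which is more general than the per-word masks the challenge asks for.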