Those, frankly, are a very poor subset of shuffles.
The shuffle* primops exposed by GHC (for example shuffleFloatX4#) take runtime vector arguments that specify shuffle indices dynamically.
They correspond to register-controlled SIMD shuffle operations on x86.
Unfortunately, those are not the efficient variants: they’re just the ones that map cleanly onto our current primop representation, and they often don’t even exist on older CPUs.
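To make the distinction concrete, here is a minimal pure-Haskell sketch of the two control styles. It is only a model under stated assumptions, not GHC's primop API: `V4`, `lane`, `shufpsModel` and `permuteVarModel` are hypothetical names invented for illustration, and a plain tuple stands in for `FloatX4#`. The first function mimics how an instruction such as SHUFPS decodes an 8-bit constant baked into the instruction encoding; the second mimics a register-controlled shuffle, where the indices arrive as ordinary run-time data.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- | Four float lanes, lane 0 first. Hypothetical stand-in for FloatX4#.
type V4 = (Float, Float, Float, Float)

-- | Pick one lane by index; only the low two bits of the index are used.
lane :: V4 -> Int -> Float
lane (x0, x1, x2, x3) i = case i .&. 3 of
  0 -> x0
  1 -> x1
  2 -> x2
  _ -> x3

-- | Model of an immediate-controlled shuffle with SHUFPS-style decoding:
-- each 2-bit field of the immediate selects one source lane. Result lanes
-- 0 and 1 are taken from the first operand, lanes 2 and 3 from the second.
shufpsModel :: Word8 -> V4 -> V4 -> V4
shufpsModel imm a b =
  (lane a (field 0), lane a (field 2), lane b (field 4), lane b (field 6))
  where
    -- Extract the 2-bit field that starts at bit position s.
    field s = fromIntegral (imm `shiftR` s) .&. 3

-- | Model of a register-controlled shuffle: the four indices are run-time
-- values carried in a second vector, one index per result lane.
permuteVarModel :: (Int, Int, Int, Int) -> V4 -> V4
permuteVarModel (i0, i1, i2, i3) a =
  (lane a i0, lane a i1, lane a i2, lane a i3)
```

With the immediate `0x1B` (binary `00 01 10 11`), `shufpsModel 0x1B (1,2,3,4) (5,6,7,8)` evaluates to `(4.0,3.0,6.0,5.0)`, while `permuteVarModel (3,2,1,0) (1,2,3,4)` evaluates to `(4.0,3.0,2.0,1.0)`. The point of the contrast is that in the first form the selection is a constant the code generator can see, whereas in the second it is just another value in a register.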
Let’s take a dive through Intel’s “shuffle-like” operations.
Of these, the first group uses compile-time immediates to select shuffle positions, so these are the ones we’d eventually want to target, provided the constants involved are known at compile time.
| Instruction | Width | Control | Description |
|---|---|---|---|