This is a short writeup on some anticipatory code that I eventually want to test and benchmark on the upcoming Intel Ice Lake architecture.
The pext instruction is a particularly useful BMI2 instruction: the programmer provides a bit-mask integer with 1 bits set at the positions of interest, and pext extracts the bits at those positions in parallel and compacts them against the least-significant bits of the result.
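
To make the semantics concrete, here is a minimal software sketch of what a 64-bit pext computes. It is a bit-at-a-time reference loop, not how the hardware does it, and the name pext64_reference is made up for this illustration. On a BMI2-capable CPU the same result should come from the _pext_u64 intrinsic in <immintrin.h>.

```c
#include <stdint.h>

/* Hypothetical reference helper (not part of any library): walks the mask
   from its lowest set bit to its highest, and copies the matching bit of
   src into the next free low-order bit of the result. */
uint64_t pext64_reference(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;                      /* next output position */

    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out_bit;                 /* source bit was 1, keep it */
        out_bit <<= 1;                         /* advance output position */
        mask &= mask - 1;                      /* clear lowest set mask bit */
    }
    return result;
}
```

For example, with src = 0xABCD and mask = 0xF0F0 the two selected nibbles are 0xC and 0xA, so the packed result is 0xAC. A loop like this is mainly useful as a correctness check against the hardware instruction on machines that have it.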
Given a bitmask and an input, pext selects the input bits wherever there is a
set bit in the mask, and compresses them together to produce a new result.
|0000100000001111100000100001111100010000000010000001001000010000| < Operand A