;; hash based on fp multiply
;; first we need to make our integers into normal floats (since denormals cost on skx)
;; we can vcvtuqq2pd to get a normal float incorporating at least the top 53 bits of the input (todo could maybe get 54 bits with signed vcvtqq2pd; worth looking into?); it remains to ensure we take account of the low 11 bits
;; we handle three vectors at a time, as follows
;; a: oxxx oxxx oxxx oxxx oxxx oxxx oxxx oxxx
;; b: oxxx oxxx oxxx oxxx oxxx oxxx oxxx oxxx
;; c: oxxx oxxx oxxx oxxx oxxx oxxx oxxx oxxx
;;
;; d: ooo- ooo- ooo- ooo- ooo- ooo- ooo- ooo-
;; a, b, and c are the inputs; the xes are the 16-bit quantities that vcvtuqq2pd incorporates; the os are the 16-bit quantities which it does not; we want to pack the os into d
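
The packing described above can be modelled in scalar C. This is only a sketch under stated assumptions, not the vectorised routine itself: it assumes each `o` is the low 16 bits of a 64-bit lane (consistent with the "low 11 bits" that the conversion can drop), and it assumes the three words land in `d` in the order a, b, c with the fourth slot left zero. The names `LANES` and `pack_low_words` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Eight 64-bit lanes per vector, as in the diagram. */
enum { LANES = 8 };

/* Scalar model: gather the low 16 bits of each lane of a, b and c
 * into the matching lane of d. The slot order (a lowest, then b,
 * then c, top slot unused) is an assumption for illustration. */
void pack_low_words(const uint64_t a[LANES], const uint64_t b[LANES],
                    const uint64_t c[LANES], uint64_t d[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        d[i] = (a[i] & 0xffff)
             | ((b[i] & 0xffff) << 16)
             | ((c[i] & 0xffff) << 32);
    }
}
```

A real implementation would presumably do this with a few shuffles or blends per group of three vectors; the loop only pins down what the result should contain.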
// aarch64 assumed, but reasonably general
// x0: address of object to be initialised; x1, x2, x3: values to use to initialise its first three slots
// we want to ensure no other thread ever sees the object at x0 uninitialised
// the dumb way: fence
str x1, [x0]
str x2, [x0, 8]
str x3, [x0, 16]
dmb ishst // order the three stores above before any later store
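
For comparison, the fence approach can be sketched in portable C11. This is a model under assumptions rather than a translation of the code above: the publication store and the reader side are not shown in the source, so `shared`, `publish` and `try_get` are hypothetical names, and the reader is assumed to use an acquire load. Note also that a C11 release fence is stronger than `dmb ishst`; on aarch64, compilers typically lower it to `dmb ish`.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct object {
    uint64_t slot[3];
};

/* Hypothetical publication point: readers find the object through it. */
static _Atomic(struct object *) shared = NULL;

/* Initialise the first three slots, fence, then publish the pointer. */
void publish(struct object *obj, uint64_t v1, uint64_t v2, uint64_t v3)
{
    obj->slot[0] = v1;
    obj->slot[1] = v2;
    obj->slot[2] = v3;
    /* Orders the slot stores before the publishing store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&shared, obj, memory_order_relaxed);
}

/* Returns NULL until an object has been published; a non-NULL result
 * is guaranteed to have its slots initialised. */
struct object *try_get(void)
{
    return atomic_load_explicit(&shared, memory_order_acquire);
}
```

The fence is paid on every initialisation whether or not another thread ever looks at the object, which is what makes this the blunt option.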