@FrankNiemeyer
Created August 28, 2015 17:00
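#include <smmintrin.h>  // SSE4.1: _mm_dp_ps
#include <vector>

using std::vector;

// Vec3f, reps, and vector_len are defined elsewhere in the benchmark.
// Vec3f is assumed to be three packed floats, as the i * 3 stride implies.
void dot3_aos_vector_dp(const vector<Vec3f>& vs, vector<float>& dp) {
    // 0x71 = 0111 0001: multiply the lower three components (high nibble),
    // store the sum in the lowest component (low nibble)
    constexpr int mask = 0x71;
    for (auto j = 0; j < reps; ++j) {
        const auto pvs = reinterpret_cast<const float*>(vs.data());
        auto pdp = dp.data();
        auto i = vector_len;
        while (i--) {
            // load 16 bytes (xyz|x): this element plus x of the next one.
            // For the last element this reads 4 bytes past the data, so vs
            // needs at least one float of padding at the end.
            const auto xyzx = _mm_loadu_ps(pvs + i * 3);
            // compute d = x*x + y*y + z*z (fourth lane masked out) -> 000d
            const auto xxyyzz00 = _mm_dp_ps(xyzx, xyzx, mask);
            // store d (lowest 4 bytes of the result) in dp[i]
            _mm_store_ss(pdp + i, xxyyzz00);
        }
    }
}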
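
To check the kernel's output or to read the 0x71 mask more easily, a plain scalar model can help. The sketch below is not part of the gist: `dp_ps_scalar` and `dot3_aos_scalar` are hypothetical helper names, and the `Vec3f` defined here is an assumed stand-in with three packed floats. The model follows the documented meaning of the `_mm_dp_ps` immediate; the hardware may add the products in a different order, so results can differ in the last bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the gist's Vec3f: three packed floats.
struct Vec3f { float x, y, z; };

// Scalar model of _mm_dp_ps(a, b, imm8):
// bits 4..7 of imm8 choose which lane products enter the sum,
// bits 0..3 choose which output lanes receive it (the others become 0).
std::array<float, 4> dp_ps_scalar(const std::array<float, 4>& a,
                                  const std::array<float, 4>& b,
                                  unsigned imm8) {
    float sum = 0.0f;
    for (unsigned lane = 0; lane < 4; ++lane) {
        if (imm8 & (1u << (4 + lane))) sum += a[lane] * b[lane];
    }
    std::array<float, 4> out{};  // all lanes start at zero
    for (unsigned lane = 0; lane < 4; ++lane) {
        if (imm8 & (1u << lane)) out[lane] = sum;
    }
    return out;
}

// Scalar reference for the SIMD kernel: dp[i] = x*x + y*y + z*z.
void dot3_aos_scalar(const std::vector<Vec3f>& vs, std::vector<float>& dp) {
    assert(dp.size() >= vs.size());
    for (std::size_t i = 0; i < vs.size(); ++i) {
        const Vec3f& v = vs[i];
        dp[i] = v.x * v.x + v.y * v.y + v.z * v.z;
    }
}
```

With mask 0x71, the fourth lane (the x of the next element) never enters the sum, and only lane 0 of the result is written. This is why the kernel can load 16 bytes per element and still store a three-component dot product.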