@diplojocus
Last active August 29, 2015 14:01
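#if __ARM_NEON__
#include <arm_neon.h>
#elif __SSE__
#include <xmmintrin.h>
#endif

// Writes bOut[i] = max(bIn[i], k). Elements are processed in groups of four.
static inline void dMax_processK(const float *bIn, const float k, float *bOut, int n) {
  n &= ~0x3; // round n down to a multiple of 4 so the loops below terminate
#if __ARM_NEON__
  const float32x4_t x = vdupq_n_f32(k); // broadcast k to all four lanes
  while (n > 0) {
    vst1q_f32(bOut, vmaxq_f32(vld1q_f32(bIn), x)); // bOut = max(bIn, k)
    n -= 4; bIn += 4; bOut += 4;
  }
#elif __SSE__
  const __m128 x = _mm_set1_ps(k); // broadcast k to all four lanes
  while (n > 0) {
    // _mm_load_ps and _mm_store_ps require 16-byte-aligned pointers
    _mm_store_ps(bOut, _mm_max_ps(_mm_load_ps(bIn), x)); // bOut = max(bIn, k)
    n -= 4; bIn += 4; bOut += 4;
  }
#else
  // Scalar fallback; th_max_f is not defined in this snippet
  for (int i = 0; i < n; ++i) {
    bOut[i] = th_max_f(bIn[i], k);
  }
#endif
}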
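
The SIMD loops above advance four floats at a time, so a buffer whose length is not a multiple of 4 leaves up to three trailing elements unhandled. The following is a minimal scalar sketch of one way a caller might cover them; it is not part of the original gist, and the helper name `max_k_tail` and its `start` parameter are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not in the original gist): scalar max(in[i], k)
 * for the elements in [start, n), e.g. the 0-3 floats left over after a
 * 4-wide SIMD loop has handled the first (n & ~3) elements. */
static void max_k_tail(const float *in, float k, float *out, int start, int n) {
  for (int i = start; i < n; ++i) {
    out[i] = (in[i] > k) ? in[i] : k;
  }
}
```

A caller could run the vectorised routine on the whole buffer and then call `max_k_tail(bIn, k, bOut, n & ~0x3, n)` to finish the remainder.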