@engie
Created April 21, 2012 19:55
SSE intrinsic matrix * vector product
const double* b_p = &(B.data()[0]);
double* w_p = &(W.data()[0]);
for( uint32 i = 0; i < rows; i++ ) //For each row
{
    double* w_p_row = w_p; //Restart at the beginning of the vector for every row
    typedef double v2df __attribute__ ((vector_size (16))); //Create a type to store a pair of accumulators
    v2df sum = { 0, 0 };
    //Process one pair of columns per iteration
    uint32 j = 0;
    for( ; j + 1 < cols; j += 2 )
    {
        //Load one pair of doubles from the matrix row and one pair from the vector with loadupd (unaligned load)
        //Multiply them element-wise with mulpd
        //Add the two products to the two accumulators with addpd
        sum = __builtin_ia32_addpd( sum, __builtin_ia32_mulpd( __builtin_ia32_loadupd( b_p ), __builtin_ia32_loadupd( w_p_row ) ) );
        //As we're processing 2 at a time, double-advance the pointers
        b_p += 2;
        w_p_row += 2;
    }
    //Move the two accumulators out to an array in memory
    double sum_alias[2];
    __builtin_ia32_storeupd( sum_alias, sum );
    //Sum the accumulators to get the result
    double total = sum_alias[0] + sum_alias[1];
    //If cols is odd, one element is left over: handle it with scalar code
    //so the packed loads never read past the end of the row or the vector
    if( j < cols )
    {
        total += (*b_p) * (*w_p_row);
        b_p += 1;
    }
    myX[i] = total;
}
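
For comparison, here is a minimal sketch of the same row-by-row product written with the portable `<emmintrin.h>` SSE2 intrinsics instead of the GCC-specific `__builtin_ia32_*` builtins. The function name `matvec_sse2`, the use of `std::vector`, and the row-major layout are assumptions made for illustration; they are not part of the original snippet. It targets x86/x86-64 only and finishes an odd column count with one scalar step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <emmintrin.h> // SSE2: _mm_loadu_pd, _mm_mul_pd, _mm_add_pd, _mm_storeu_pd

// Hypothetical helper (not from the original gist): y = A * x,
// where A is stored row-major as rows * cols doubles.
std::vector<double> matvec_sse2(const std::vector<double>& a,
                                const std::vector<double>& x,
                                std::size_t rows, std::size_t cols)
{
    assert(a.size() == rows * cols);
    assert(x.size() == cols);

    std::vector<double> y(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        const double* row = a.data() + i * cols;

        // Two running sums, one per SSE lane.
        __m128d sum = _mm_setzero_pd();

        // Consume two columns per iteration using unaligned loads.
        std::size_t j = 0;
        for (; j + 1 < cols; j += 2) {
            __m128d prod = _mm_mul_pd(_mm_loadu_pd(row + j),
                                      _mm_loadu_pd(x.data() + j));
            sum = _mm_add_pd(sum, prod);
        }

        // Spill both lanes to memory and add them together.
        double lanes[2];
        _mm_storeu_pd(lanes, sum);
        double total = lanes[0] + lanes[1];

        // Scalar tail for an odd number of columns.
        if (j < cols) {
            total += row[j] * x[j];
        }
        y[i] = total;
    }
    return y;
}
```

Calling `matvec_sse2(a, x, rows, cols)` returns the result vector by value. Because the two lanes accumulate even and odd columns separately, the summation order differs from a plain scalar loop, so results can differ in the last bits for general floating-point input.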