Skip to content

Instantly share code, notes, and snippets.

@migerh
Created October 10, 2011 17:08
Show Gist options
  • Select an option

  • Save migerh/1275805 to your computer and use it in GitHub Desktop.

Select an option

Save migerh/1275805 to your computer and use it in GitHub Desktop.
cuda c example matrix addition with #blocks >= 1
// Kernel definition
__global__ void MatAdd(float A[N][N], float B[N][N], float C[N][N]) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;
if (i < N && j < N)
C[i][j] = A[i][j] + B[i][j];
}
int main() {
// Kernel invocation
dim3 threadsPerBlock(16, 16);
dim3 numBlocks(N / threadsPerBlock.x, N / threadsPerBlock.y);
MatAdd<<<numBlocks, threadsPerBlock>>>(A, B, C);
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment