# ported from Marian-NMT
# https://github.com/marian-nmt/marian-dev/blob/8fbfa656/src/tensors/gpu/tensor_operators.cu#L206-L320
# licensed under MIT
using CUDAdrv
using CUDAnative
using BenchmarkTools
const MAX_THREADS = 256 # seems to work best (1024 max)
const MAX_BLOCKS = 2^31 - 1 # benchmark only exercises 2048