Each benchmark runs 10 batches of 2-dimensional, single-precision matrices, with sizes ranging from 128x128 to 4096x4096.
Matrix dimensions: 128x128
In-place C2C FFT time for 10 runs: 0.538662 ms
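The benchmark code itself is not shown here, so the following is only a minimal sketch of how such a timing loop might look. It assumes NumPy on the CPU as a stand-in for whatever FFT library produced the numbers above, and `time_batched_fft2` is a hypothetical helper name. Note that `np.fft.fft2` is not in-place and, depending on the NumPy version, may compute in double precision even for `complex64` input, so the timings are not comparable to the figure quoted above.

```python
import time

import numpy as np


def time_batched_fft2(n, batches=10, runs=10):
    """Time `runs` complex-to-complex 2-D FFTs over a (batches, n, n) array.

    Hypothetical helper: returns (elapsed milliseconds, last FFT result).
    """
    rng = np.random.default_rng(0)
    # Single-precision complex input, one n x n matrix per batch entry.
    x = (rng.standard_normal((batches, n, n))
         + 1j * rng.standard_normal((batches, n, n))).astype(np.complex64)

    # Warm-up call so one-time setup cost is not included in the timing.
    np.fft.fft2(x, axes=(-2, -1))

    start = time.perf_counter()
    for _ in range(runs):
        # Transform the last two axes; the batch axis is left untouched.
        y = np.fft.fft2(x, axes=(-2, -1))
    elapsed_ms = (time.perf_counter() - start) * 1e3
    return elapsed_ms, y


if __name__ == "__main__":
    # Small sizes only, to keep the example quick on a CPU.
    for n in (128, 256, 512):
        ms, _ = time_batched_fft2(n)
        print(f"Matrix dimensions: {n}x{n} C2C FFT time for 10 runs: {ms:.6f} ms")
```

Extending the size tuple up to 4096 would cover the full range described above, at the cost of a much longer run on a CPU.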
| (lldb) | |
| There is a running process, detach from it and attach?: [Y/n] | |
| Process 6843 detached | |
| _bt.cpython-37m-darwin.so was compiled with optimization - stepping may behave oddly; variables may not be available. | |
| Process 6843 stopped | |
| * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP | |
| frame #0: 0x0000000114b54ab0 _bt.cpython-37m-darwin.so`backtrace_thread [inlined] _wait_and_reset_signal at bt.c:182:35 [opt] | |
| 179 static | |
| 180 void _wait_and_reset_signal(struct sigaction *old_sa) { | |
| 181 // spin and wait. |
| FROM python:3 | |
| ENV PYTHON 3.7 | |
| ENV NUMPY 1.16.2 | |
| ENV UPSTREAM_DEV 1 | |
| ENV TEST true | |
| ENV LINT true | |
| ENV COVERAGE false | |
| ENV PARALLEL true | |
| ENV XTRATESTARGS '' |
| from dask.distributed import Client | |
| from dask_cuda import LocalCUDACluster | |
| from dask.array.utils import assert_eq | |
| import dask.array as da | |
| import cupy as cp | |
| add_broadcast_kernel = cp.RawKernel( | |
| r''' | |
| extern "C" __global__ |