Demo of how to pass GPU memory managed by pycuda to mpi4py.
#!/usr/bin/env python

"""
Demo of how to pass GPU memory managed by pycuda to mpi4py.

Notes
-----
This code can be used to perform peer-to-peer communication of data via
NVIDIA's GPUDirect technology if mpi4py has been built against a
CUDA-enabled MPI implementation.
"""

import atexit
import sys

# PyCUDA 2014.1 and later have built-in support for wrapping GPU memory with a
# buffer interface:
import pycuda
if pycuda.VERSION >= (2014, 1):
    bufint = lambda arr: arr.gpudata.as_buffer(arr.nbytes)
else:
    import cffi
    ffi = cffi.FFI()
    bufint = lambda arr: ffi.buffer(ffi.cast('void *', arr.ptr), arr.nbytes)

import numpy as np
from mpi4py import MPI
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

drv.init()

def dtype_to_mpi(t):
    """Map a NumPy dtype to the corresponding mpi4py datatype."""
    if hasattr(MPI, '_typedict'):
        mpi_type = MPI._typedict[np.dtype(t).char]
    elif hasattr(MPI, '__TypeDict__'):
        mpi_type = MPI.__TypeDict__[np.dtype(t).char]
    else:
        raise ValueError('cannot convert type')
    return mpi_type

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

N_gpu = drv.Device(0).count()
if N_gpu < 2:
    sys.stdout.write('at least 2 GPUs required')
else:
    # Bind each MPI rank to its own GPU and make sure MPI is finalized before
    # the PyCUDA context is popped (atexit callbacks run in LIFO order):
    dev = drv.Device(rank)
    ctx = dev.make_context()
    atexit.register(ctx.pop)
    atexit.register(MPI.Finalize)

    if rank == 0:
        x_gpu = gpuarray.arange(100, 200, 10, dtype=np.double)
        print(('before (%i): ' % rank) + str(x_gpu))
        comm.Send([bufint(x_gpu), dtype_to_mpi(x_gpu.dtype)], dest=1)
        print('sent')
        print(('after (%i): ' % rank) + str(x_gpu))
    elif rank == 1:
        x_gpu = gpuarray.zeros(10, dtype=np.double)
        print(('before (%i): ' % rank) + str(x_gpu))
        comm.Recv([bufint(x_gpu), dtype_to_mpi(x_gpu.dtype)], source=0)
        print('received')
        print(('after (%i): ' % rank) + str(x_gpu))
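To try the demo, launch the script with two MPI ranks on a machine that has at least two GPUs, e.g. mpiexec -n 2 python demo.py (the filename is arbitrary). The dtype_to_mpi helper simply maps a NumPy dtype to the corresponding mpi4py datatype object; below is a minimal sketch of checking that mapping interactively, assuming only that numpy and mpi4py are installed.

import numpy as np
from mpi4py import MPI

# mpi4py exposes a dict mapping NumPy type characters to MPI datatypes
# (MPI._typedict in current releases, MPI.__TypeDict__ in older ones);
# 'd' (np.double) is expected to map to MPI.DOUBLE.
typedict = getattr(MPI, '_typedict', None) or getattr(MPI, '__TypeDict__', None)
print(typedict[np.dtype(np.double).char] == MPI.DOUBLE)  # expected: True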
Thanks! But why does ffi.cast('void *', arr.ptr) work for GPU memory? And it seems that as_buffer shouldn't work either, because plain MPI requires objects that support the buffer protocol in host memory, while a DeviceAllocation lives in device memory. Reading the as_buffer documentation and its source ( https://github.com/inducer/pycuda/blob/fde69b0502d944a2d41e1f1b2d0b78352815d487/src/cpp/cuda.hpp#L1547 ), I don't see anywhere that a device-to-host copy would be initiated by creating a buffer object from a DeviceAllocation. Is this example meant to demonstrate copying via a GPU-to-host copy?
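For comparison: as far as I can tell, as_buffer and the cffi wrapper only expose the raw device pointer through the buffer protocol and never trigger a device-to-host copy themselves, which is why the gist needs a CUDA-aware MPI build to work as written. With a plain MPI build, the usual workaround is to stage the transfer through host memory explicitly. Below is a minimal sketch of that pattern (not part of the gist), reusing the gist's comm, rank, dtype_to_mpi, and x_gpu names and assuming the same two-rank setup.

# Host-staged transfer for a non-CUDA-aware MPI build: explicit
# device-to-host and host-to-device copies around an ordinary MPI
# send/receive of a NumPy array.
if rank == 0:
    x_host = x_gpu.get()                    # device -> host copy
    comm.Send([x_host, dtype_to_mpi(x_host.dtype)], dest=1)
elif rank == 1:
    x_host = np.empty(10, dtype=np.double)  # host-side landing buffer
    comm.Recv([x_host, dtype_to_mpi(x_host.dtype)], source=0)
    x_gpu.set(x_host)                       # host -> device copy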
Thank you!
MPI.Finalize() must be explicitly called on exit before PyCUDA cleanup to prevent errors. See this thread for more information.
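This works because atexit runs registered callbacks in last-in, first-out order: the gist registers ctx.pop first and MPI.Finalize second, so MPI.Finalize fires before the CUDA context is popped at shutdown. A quick standalone illustration of that ordering:

import atexit

# atexit executes callbacks in reverse registration order (LIFO),
# so the callback registered last runs first at interpreter exit.
atexit.register(lambda: print('runs last'))
atexit.register(lambda: print('runs first'))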