I am trying to understand the lifecycle of the CUDA graph memory pool, e.g., as interacted with in _cuda_releasePool. In particular, the implementation of _use_cuda_memory_pool_manager seems to suggest that the pool is released at the end of the context manager. But at the call sites of this function, it doesn't seem that the pool should be released (especially when we are allocating inputs for the CUDA graph region). What is going on?
You're right to notice this apparent discrepancy. Let's break down what's happening with the CUDA graph memory pool lifecycle:
- The _use_cuda_memory_pool_manager context manager does indeed call _cuda_releasePool at the end of its context. However, this doesn't necessarily mean the pool is being destroyed or fully released.
- The _cuda_releasePool function is part of the CUDA caching allocator's API, and its behavior is more nuanced than simply destroying the pool:
  - It decrements the reference count for the pool.
  - If the reference count reaches zero, the pool becomes eligible for reclamation, but its cached blocks are not necessarily returned to the system at that moment.
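The key distinction above, dropping a reference versus actually freeing memory, can be sketched in plain Python. This is a simplified model of the semantics described, not the real torch._C bindings; all names here (MemoryPool, begin_allocate_to_pool, release_pool, try_reclaim) are illustrative stand-ins:

```python
import contextlib

class MemoryPool:
    """Hypothetical model of a caching-allocator private pool."""
    def __init__(self, pool_id):
        self.pool_id = pool_id
        self.use_count = 0   # how many users currently hold the pool
        self.blocks = []     # cached allocations, retained even at use_count == 0
        self.freed = False

def begin_allocate_to_pool(pool):
    # Route new allocations into the pool and take a reference.
    pool.use_count += 1

def release_pool(pool):
    # Drop one reference. Reaching zero does NOT free the cached blocks;
    # it only marks the pool as reclaimable.
    pool.use_count -= 1
    assert pool.use_count >= 0

def try_reclaim(pool):
    # Blocks are only actually returned once nothing references the pool.
    if pool.use_count == 0:
        pool.blocks.clear()
        pool.freed = True

@contextlib.contextmanager
def use_memory_pool(pool):
    begin_allocate_to_pool(pool)
    try:
        yield
    finally:
        # Mirrors _use_cuda_memory_pool_manager calling _cuda_releasePool at
        # context exit: the reference is dropped, the memory is not.
        release_pool(pool)

pool = MemoryPool(pool_id=(0, 1))
with use_memory_pool(pool):
    pool.blocks.append("input buffer")  # allocation lands in the pool

# After the context exits, the pool's blocks survive: a captured graph that
# will replay against those addresses may still need them.
assert pool.use_count == 0 and pool.blocks == ["input buffer"]
```

Under this model, allocating graph inputs inside the context manager is safe: exiting the context only releases the *reference* taken on entry, and the pool's cached memory survives for later capture and replay.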