PyArray_NewFromDescr_int in ctors.c leaks the data buffer when a
user-defined DType's fill-zero loop raises an error.
This affects any DType that defines a get_fill_zero_loop that can fail,
for example a fixed-point DType whose fill-zero loop rejects zero because
zero falls outside the type's representable range.
Before the fix, the relevant path in PyArray_NewFromDescr_int was:

```c
data = PyDataMem_UserNEW(nbytes, fa->mem_handler);  // allocate buffer
fill_zero_info.func(..., data, ...);                 // RAISES → goto fail
// fa->data is still NULL at this point
// fa->flags does not have NPY_ARRAY_OWNDATA

fail:
    Py_XDECREF(fa->mem_handler);  // release mem_handler
    Py_DECREF(fa);                // → array_dealloc → _clear_array_attributes
                                  // sees OWNDATA is not set → skips data free
                                  // *** data buffer is leaked ***
```
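The sketch below is a minimal C model of the original ordering, assuming a much-simplified array object. Every name in it is hypothetical: toy_array, TOY_OWNDATA, toy_alloc, failing_fill_zero and toy_dealloc stand in for the real array object, NPY_ARRAY_OWNDATA, PyDataMem_UserNEW, a failing fill-zero loop and array_dealloc, and mem_handler is left out entirely. A live_buffers counter makes the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of the original ordering; not NumPy's real structures. */
#define TOY_OWNDATA 0x1

typedef struct {
    char *data;
    int flags;
} toy_array;

static int live_buffers = 0;  /* buffers allocated but not yet freed */

/* Stands in for PyDataMem_UserNEW. */
static char *toy_alloc(size_t n) {
    char *p = malloc(n);
    if (p != NULL) live_buffers++;
    return p;
}

/* Stands in for PyDataMem_UserFREE. */
static void toy_free(char *p) {
    free(p);
    live_buffers--;
}

/* Stand-in for a user DType fill-zero loop that always raises. */
static int failing_fill_zero(char *data, size_t n) {
    (void)data;
    (void)n;
    return -1;
}

/* Mirrors the dealloc rule: free only if OWNDATA is set and data is non-NULL. */
static void toy_dealloc(toy_array *fa) {
    if ((fa->flags & TOY_OWNDATA) && fa->data != NULL) toy_free(fa->data);
    free(fa);
}

/* Original order: ownership is recorded only after the fill-zero loop succeeds. */
static toy_array *toy_new_original(size_t nbytes) {
    toy_array *fa = calloc(1, sizeof *fa);
    if (fa == NULL) return NULL;
    char *data = toy_alloc(nbytes);
    if (data == NULL) goto fail;
    if (failing_fill_zero(data, nbytes) < 0) goto fail;  /* raises */
    fa->data = data;
    fa->flags |= TOY_OWNDATA;
    return fa;
fail:
    toy_dealloc(fa);  /* OWNDATA not set, so the buffer is never freed */
    return NULL;
}
```

With this model, toy_new_original(64) returns NULL while live_buffers stays at 1: the buffer is still allocated, but nothing points to it anymore.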
Because fa->data and NPY_ARRAY_OWNDATA were only set after the
fill-zero loop, an error in that loop meant the fail path had no way to
know that data had been allocated and needed freeing.
Moving these two assignments to immediately after the successful
PyDataMem_UserNEW / PyDataMem_UserNEW_ZEROED call, before the fill-zero
loop, ensures that if the fill-zero loop raises, the array object already
knows it owns the buffer. The subsequent Py_DECREF(fa) in the fail path
then triggers array_dealloc and _clear_array_attributes, which sees
OWNDATA set and frees the buffer via PyDataMem_UserFREE.
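Reordering a toy model of the constructor shows why recording ownership first is enough. This is a hypothetical sketch, not NumPy code: toy_array, TOY_OWNDATA, toy_alloc and toy_dealloc model the array object, NPY_ARRAY_OWNDATA, PyDataMem_UserNEW and array_dealloc; toy_fill_zero stands in for the DType's fill-zero loop and can be told to fail; live_buffers counts buffers that have not been freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of the fixed ordering; not NumPy's real structures. */
#define TOY_OWNDATA 0x1

typedef struct {
    char *data;
    int flags;
} toy_array;

static int live_buffers = 0;  /* buffers allocated but not yet freed */

/* Stands in for PyDataMem_UserNEW. */
static char *toy_alloc(size_t n) {
    char *p = malloc(n);
    if (p != NULL) live_buffers++;
    return p;
}

/* Mirrors the dealloc rule: free only if OWNDATA is set and data is non-NULL. */
static void toy_dealloc(toy_array *fa) {
    if ((fa->flags & TOY_OWNDATA) && fa->data != NULL) {
        free(fa->data);
        live_buffers--;
    }
    free(fa);
}

/* Stand-in fill-zero loop: zeroes the buffer, or fails when should_fail is set. */
static int toy_fill_zero(char *data, size_t n, int should_fail) {
    if (should_fail) return -1;
    memset(data, 0, n);
    return 0;
}

/* Fixed order: record ownership right after allocation, before fill-zero. */
static toy_array *toy_new_fixed(size_t nbytes, int fill_fails) {
    toy_array *fa = calloc(1, sizeof *fa);
    if (fa == NULL) return NULL;
    char *data = toy_alloc(nbytes);
    if (data == NULL) goto fail;
    fa->data = data;
    fa->flags |= TOY_OWNDATA;
    if (toy_fill_zero(data, nbytes, fill_fails) < 0) goto fail;
    return fa;
fail:
    toy_dealloc(fa);  /* OWNDATA is set, so the buffer is freed here */
    return NULL;
}
```

Both the failing and the succeeding calls leave every buffer accounted for: the failure path frees through toy_dealloc, and the success path hands an owning array to the caller.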
As a secondary cleanup, the fa->data = data assignment for the else branch
(externally provided data) is moved inside that branch. It logically
belongs to the "data was passed in" case, and keeping it there makes the
ownership semantics of each branch self-contained.
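To illustrate per-branch ownership, here is a hypothetical sketch (not NumPy's code) in which toy_new either allocates a zeroed buffer, loosely in the spirit of PyDataMem_UserNEW_ZEROED, or borrows a caller-provided pointer. The fill-zero step is omitted; the point is that each branch sets fa->data itself, and only the allocating branch sets the toy OWNDATA flag.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model; `external` plays the role of caller-provided data. */
#define TOY_OWNDATA 0x1

typedef struct {
    char *data;
    int flags;
} toy_array;

static int live_buffers = 0;  /* owned buffers not yet freed */

/* Mirrors the dealloc rule: free only if OWNDATA is set and data is non-NULL. */
static void toy_dealloc(toy_array *fa) {
    if ((fa->flags & TOY_OWNDATA) && fa->data != NULL) {
        free(fa->data);
        live_buffers--;
    }
    free(fa);
}

static toy_array *toy_new(size_t nbytes, char *external) {
    toy_array *fa = calloc(1, sizeof *fa);
    if (fa == NULL) return NULL;
    if (external == NULL) {
        /* Allocation branch: this array allocates, and therefore owns, the buffer. */
        char *data = calloc(1, nbytes);
        if (data == NULL) goto fail;
        live_buffers++;
        fa->data = data;
        fa->flags |= TOY_OWNDATA;
    } else {
        /* Passed-in branch: borrow the pointer and never set OWNDATA. */
        fa->data = external;
    }
    return fa;
fail:
    toy_dealloc(fa);
    return NULL;
}
```

Because the borrowed array never gets OWNDATA, toy_dealloc leaves the caller's buffer alone, while the owned array's buffer is freed.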
The original fail path called Py_XDECREF(fa->mem_handler) before
Py_DECREF(fa). This was needed because, in the original code,
_clear_array_attributes only cleared mem_handler inside the
OWNDATA && data block, so if OWNDATA wasn't set (or data was NULL),
mem_handler would never be released by dealloc.
With the fix, the allocation branch sets OWNDATA and data before the
fill-zero loop can jump to fail, so _clear_array_attributes will enter the
OWNDATA block and release mem_handler. Keeping the explicit Py_XDECREF in
the fail path would therefore double-decref mem_handler:
```c
fail:
    Py_XDECREF(fa->mem_handler);  // DECREF #1
    Py_DECREF(fa);                // → _clear_array_attributes
                                  // → Py_CLEAR(fa->mem_handler)  ← DECREF #2
                                  // *** double-free / use-after-free ***
```
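The double decref can be reproduced with a small refcounting model. This is a sketch under stated assumptions, with hypothetical names: toy_handler stands in for the mem_handler object, toy_xdecref and toy_clear mimic Py_XDECREF and Py_CLEAR, and toy_clear_attributes models _clear_array_attributes once OWNDATA is set before the fill-zero loop. A real object would be freed when its count reaches zero; the model just keeps counting, so a count of -1 marks the extra release.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted stand-in for the mem_handler object. */
typedef struct {
    int refcnt;
} toy_handler;

typedef struct {
    char *data;
    int owndata;
    toy_handler *mem_handler;
} toy_array;

/* Like Py_XDECREF: drop a reference but leave the pointer in place. */
static void toy_xdecref(toy_handler *h) {
    if (h != NULL) h->refcnt--;
}

/* Like Py_CLEAR: set the slot to NULL first, then drop the reference. */
static void toy_clear(toy_handler **slot) {
    toy_handler *tmp = *slot;
    *slot = NULL;
    toy_xdecref(tmp);
}

/* Attribute clearing with OWNDATA set early: the OWNDATA block releases the handler. */
static void toy_clear_attributes(toy_array *fa) {
    if (fa->owndata && fa->data != NULL) {
        /* the real code frees the buffer here; omitted in this model */
        toy_clear(&fa->mem_handler);
    }
}

/* Fail path that still performs the explicit decref before dealloc. */
static void fail_path_with_xdecref(toy_array *fa) {
    toy_xdecref(fa->mem_handler);  /* DECREF #1 */
    toy_clear_attributes(fa);      /* DECREF #2 */
}

/* Fail path with the explicit decref removed. */
static void fail_path_without_xdecref(toy_array *fa) {
    toy_clear_attributes(fa);      /* single release */
}
```

Starting from a count of 1 (the array's own reference), the first fail path drives the count to -1, whereas the second ends at 0 with the slot set to NULL.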
Under Valgrind this shows up as memory corruption: the corrupted allocator
state causes subsequent np.zeros calls to fail with _ArrayMemoryError, and
freed objects remain in GC-tracked containers, which eventually leads to a
segfault during gc.collect() (invalid read in _PyObject_IS_GC /
visit_decref).
Removing the explicit Py_XDECREF is safe: together with the
_clear_array_attributes change described below, array_dealloc now handles
all of this cleanup.
In _clear_array_attributes, the Py_CLEAR(fa->mem_handler) was
previously inside the if ((fa->flags & NPY_ARRAY_OWNDATA) && fa->data)
block, meaning it only ran when the array both owned its data and had a
non-NULL data pointer.
This is fine for the normal (non-error) case, but there are fail paths in
PyArray_NewFromDescr_int where mem_handler has been set but the function
fails before reaching the OWNDATA assignment (e.g., the data == NULL
/ raise_memory_error path). In these paths, _clear_array_attributes
would skip the OWNDATA block entirely, leaking mem_handler.
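A hypothetical C model can show the leak that the explicit Py_XDECREF was compensating for. Here toy_clear_attributes_original mimics the old placement of Py_CLEAR(fa->mem_handler) inside the OWNDATA block, and the test array mimics a failure on the data == NULL path: a handler is set, but data is NULL and OWNDATA is unset. All names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not NumPy's real structures. */
typedef struct {
    int refcnt;
} toy_handler;

typedef struct {
    char *data;
    int owndata;
    toy_handler *mem_handler;
} toy_array;

/* Like Py_CLEAR: NULL the slot first, then drop the reference (if any). */
static void toy_clear(toy_handler **slot) {
    toy_handler *tmp = *slot;
    *slot = NULL;
    if (tmp != NULL) tmp->refcnt--;
}

/* Old placement: the handler is released only inside the OWNDATA block. */
static void toy_clear_attributes_original(toy_array *fa) {
    if (fa->owndata && fa->data != NULL) {
        /* the real code frees the buffer here; omitted in this model */
        toy_clear(&fa->mem_handler);
    }
}
```

After clearing, the handler still carries the array's reference (count 1) even though the array is about to be destroyed, so that reference is never released.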
The original code worked around this with the explicit
Py_XDECREF(fa->mem_handler) in the fail path, but that created the
double-decref problem described above.
Moving Py_CLEAR(fa->mem_handler) to run unconditionally (after the
OWNDATA block) means _clear_array_attributes always cleans up
mem_handler regardless of which error path was taken, eliminating the need
for any special-case cleanup in callers. Py_CLEAR is a no-op when the
pointer is already NULL, so this is safe for arrays that never had a
mem_handler set (e.g., arrays with externally provided data).
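A small hypothetical C model can illustrate the new placement. In this sketch (not NumPy's code), toy_clear_attributes_fixed frees owned data first, since the real PyDataMem_UserFREE call still needs the handler, and then clears the handler unconditionally. The three cases in the test correspond to a normal owned array, a failure before OWNDATA was set, and an array with externally provided data and no handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not NumPy's real structures. */
typedef struct {
    int refcnt;
} toy_handler;

typedef struct {
    char *data;
    int owndata;
    toy_handler *mem_handler;
} toy_array;

/* Like Py_CLEAR: NULL the slot first, then drop the reference; a NULL slot is a no-op. */
static void toy_clear(toy_handler **slot) {
    toy_handler *tmp = *slot;
    *slot = NULL;
    if (tmp != NULL) tmp->refcnt--;
}

/* New placement: free owned data while the handler is still available,
   then release the handler on every path. */
static void toy_clear_attributes_fixed(toy_array *fa) {
    if (fa->owndata && fa->data != NULL) {
        free(fa->data);  /* stands in for PyDataMem_UserFREE, which uses fa->mem_handler */
    }
    toy_clear(&fa->mem_handler);
}
```

In all three cases the handler slot ends up NULL, and each handler that was set loses exactly one reference.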