@MaartenBaert
Created April 24, 2026 14:20
Fix memory leak in `np.zeros` when fill-zero loop raises

BUG: Fix memory leak in np.zeros when fill-zero loop raises

Problem

PyArray_NewFromDescr_int in ctors.c leaks the data buffer when a user-defined DType's fill-zero loop raises an error.

This affects any DType that defines a get_fill_zero_loop that can fail. For example, a fixed-point DType's fill-zero loop may reject zero because zero falls outside the type's representable range.

How the leak happens (original code)

data = PyDataMem_UserNEW(nbytes, fa->mem_handler);  // allocate buffer

fill_zero_info.func(..., data, ...);                 // RAISES → goto fail
// fa->data is still NULL at this point
// fa->flags does not have NPY_ARRAY_OWNDATA

fail:
    Py_XDECREF(fa->mem_handler);   // release mem_handler
    Py_DECREF(fa);                  // → array_dealloc → _clear_array_attributes
                                    //   sees OWNDATA is not set → skips data free
                                    //   *** data buffer is leaked ***

Because fa->data and NPY_ARRAY_OWNDATA were only set after the fill-zero loop, an error in that loop meant the fail path had no way to know that data had been allocated and needed freeing.
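
The ordering problem can be reproduced in a small standalone model. This is a sketch under stated assumptions, not NumPy code: `toy_array`, `toy_alloc`, `toy_dealloc`, and `toy_new_original` are hypothetical stand-ins for the array fields, `PyDataMem_UserNEW`, `_clear_array_attributes`, and `PyArray_NewFromDescr_int`. A counter over a static buffer takes the place of the heap so the leak is observable without a leak checker.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the array fields that matter here. */
typedef struct {
    void *data;
    bool owndata;               /* plays the role of NPY_ARRAY_OWNDATA */
} toy_array;

static char storage[64];        /* backing store for the toy allocator */
static int live_buffers = 0;    /* buffers handed out and not yet freed */

static void *toy_alloc(size_t nbytes) {
    if (nbytes > sizeof(storage)) {
        return NULL;
    }
    live_buffers++;
    return storage;
}

static void toy_free(void *p) {
    (void)p;
    live_buffers--;
}

/* Same rule as the dealloc path: free only an owned, non-NULL buffer. */
static void toy_dealloc(toy_array *a) {
    if (a->owndata && a->data != NULL) {
        toy_free(a->data);
    }
    a->data = NULL;
    a->owndata = false;
}

/* A fill-zero loop that always reports an error. */
static int failing_fill_zero(void *data) {
    (void)data;
    return -1;
}

/* Original ordering: ownership is recorded only after the loop succeeds. */
static int toy_new_original(toy_array *a, size_t nbytes) {
    void *data = toy_alloc(nbytes);
    if (data == NULL) {
        goto fail;
    }
    if (failing_fill_zero(data) < 0) {
        goto fail;              /* a->data is still NULL here */
    }
    a->data = data;
    a->owndata = true;
    return 0;

fail:
    toy_dealloc(a);             /* sees no owned buffer, frees nothing */
    return -1;
}
```

Calling `toy_new_original` on a zero-initialized `toy_array` returns -1 and leaves `live_buffers` at 1: the buffer was handed out, but nothing on the fail path knows about it.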

Fix

1. Set fa->data and NPY_ARRAY_OWNDATA immediately after allocation (ctors.c)

Moving these two assignments to right after the successful PyDataMem_UserNEW / PyDataMem_UserNEW_ZEROED call — before the fill-zero loop — ensures that if the fill-zero loop raises, the array object already knows it owns the buffer. The subsequent Py_DECREF(fa) in the fail path will then trigger _clear_array_attributes, which sees OWNDATA set and properly frees the buffer via PyDataMem_UserFREE.

As a secondary cleanup, fa->data = data for the else branch (externally provided data) is moved inside that branch, since it is logically part of the "data was passed in" case and makes the ownership semantics of each branch self-contained.
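
A minimal sketch of the reordered logic, again as a standalone model rather than the actual patch. All `toy_*` names, the `fill_zero_fn` type, and the static-buffer allocator are assumptions made for illustration. The sketch runs the fill-zero loop only in the allocation branch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    void *data;
    bool owndata;               /* plays the role of NPY_ARRAY_OWNDATA */
} toy_array;

static char storage[64];        /* backing store for the toy allocator */
static int live_buffers = 0;    /* buffers handed out and not yet freed */

static void *toy_alloc(size_t nbytes) {
    if (nbytes > sizeof(storage)) {
        return NULL;
    }
    live_buffers++;
    return storage;
}

/* Free only an owned, non-NULL buffer, then reset the fields. */
static void toy_dealloc(toy_array *a) {
    if (a->owndata && a->data != NULL) {
        live_buffers--;
    }
    a->data = NULL;
    a->owndata = false;
}

typedef int (*fill_zero_fn)(void *data);

static int fill_zero_ok(void *data)    { (void)data; return 0; }
static int fill_zero_fails(void *data) { (void)data; return -1; }

/* Fixed ordering: each branch sets data and ownership by itself. */
static int toy_new_fixed(toy_array *a, void *external, size_t nbytes,
                         fill_zero_fn fill_zero) {
    if (external == NULL) {
        void *data = toy_alloc(nbytes);
        if (data == NULL) {
            goto fail;
        }
        /* Record ownership before anything else can fail. */
        a->data = data;
        a->owndata = true;
        if (fill_zero(data) < 0) {
            goto fail;
        }
    }
    else {
        /* Caller-provided buffer: the array never owns it. */
        a->data = external;
        a->owndata = false;
    }
    return 0;

fail:
    toy_dealloc(a);             /* now sees the owned buffer and frees it */
    return -1;
}
```

With `fill_zero_fails`, `toy_new_fixed` returns -1 and `live_buffers` is back at 0. With a caller-provided buffer, `toy_dealloc` leaves the counter untouched because `owndata` is false.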

2. Remove explicit Py_XDECREF(fa->mem_handler) from the fail path (ctors.c)

The original fail path did Py_XDECREF(fa->mem_handler) before Py_DECREF(fa). This was needed because, in the original code, _clear_array_attributes only cleared mem_handler inside the OWNDATA && data block — so if OWNDATA wasn't set (or data was NULL), mem_handler would never be released by dealloc.

With the fix, OWNDATA and data are now set before the fail path is reachable (for the allocation branch), so _clear_array_attributes will enter the OWNDATA block and release mem_handler. Keeping the explicit Py_XDECREF in the fail path would therefore double-decref mem_handler:

fail:
    Py_XDECREF(fa->mem_handler);   // DECREF #1
    Py_DECREF(fa);                  // → _clear_array_attributes
                                    //   → Py_CLEAR(fa->mem_handler)  ← DECREF #2
                                    //   *** double-free / use-after-free ***

Under Valgrind, this shows up as memory corruption: the corrupted allocator state makes subsequent np.zeros calls fail with _ArrayMemoryError, freed objects remain in GC-tracked containers, and gc.collect() eventually segfaults with an invalid read in _PyObject_IS_GC / visit_decref.

Removing the explicit Py_XDECREF is safe: once change 3 below is applied, dealloc releases mem_handler on every path, including those where OWNDATA is never set.
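
The double decref can be illustrated with a plain integer reference count. This is a hypothetical model, not CPython's object header: `toy_handler`, `toy_decref`, and `toy_clear_attributes` are invented names, and nothing is actually freed, so the over-release shows up as a count that is one too low instead of as memory corruption.

```c
#include <stddef.h>

/* Hypothetical refcounted handler; stands in for fa->mem_handler. */
typedef struct {
    int refcount;
} toy_handler;

typedef struct {
    toy_handler *mem_handler;
} toy_array;

/* Like Py_XDECREF: drops one reference, leaves the caller's pointer alone. */
static void toy_decref(toy_handler *h) {
    if (h != NULL) {
        h->refcount--;
    }
}

/* Like Py_CLEAR inside dealloc: null the field, then drop the reference. */
static void toy_clear_attributes(toy_array *a) {
    toy_handler *h = a->mem_handler;
    a->mem_handler = NULL;
    toy_decref(h);
}

/* Fail path that keeps the explicit decref after change 1. */
static void fail_path_explicit_decref(toy_array *a) {
    toy_decref(a->mem_handler);     /* DECREF #1, field still set */
    toy_clear_attributes(a);        /* DECREF #2 through dealloc */
}

/* Fail path after the fix: dealloc is the only place that releases. */
static void fail_path_fixed(toy_array *a) {
    toy_clear_attributes(a);
}
```

Start a handler at a count of 2, one reference held by the array and one by some other owner. `fail_path_explicit_decref` leaves it at 0 even though the other owner still holds its reference, while `fail_path_fixed` leaves it at 1.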

3. Move Py_CLEAR(fa->mem_handler) out of the OWNDATA block (arrayobject.c)

In _clear_array_attributes, the Py_CLEAR(fa->mem_handler) was previously inside the if ((fa->flags & NPY_ARRAY_OWNDATA) && fa->data) block, meaning it only ran when the array both owned its data and had a non-NULL data pointer.

This is fine for the normal (non-error) case, but there are fail paths in PyArray_NewFromDescr_int where mem_handler has been set but the function fails before reaching the OWNDATA assignment (e.g., the data == NULL / raise_memory_error path). In these paths, _clear_array_attributes would skip the OWNDATA block entirely, leaking mem_handler.

The original code worked around this with the explicit Py_XDECREF(fa->mem_handler) in the fail path, but that created the double-decref problem described above.

Moving Py_CLEAR(fa->mem_handler) to run unconditionally (after the OWNDATA block) means _clear_array_attributes always cleans up mem_handler regardless of which error path was taken, eliminating the need for any special-case cleanup in callers. Py_CLEAR is a no-op when the pointer is already NULL, so this is safe for arrays that never had a mem_handler set (e.g., arrays with externally provided data).
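
The difference between the two cleanup shapes can be sketched as follows. This is a simplified model with hypothetical names (`toy_array`, `toy_clear_handler`, `clear_attributes_original`, `clear_attributes_fixed`). The actual buffer free is elided, and only the handler bookkeeping is shown.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int refcount;
} toy_handler;

typedef struct {
    void *data;
    bool owndata;               /* plays the role of NPY_ARRAY_OWNDATA */
    toy_handler *mem_handler;
} toy_array;

/* Like Py_CLEAR: does nothing when the field is already NULL. */
static void toy_clear_handler(toy_array *a) {
    toy_handler *h = a->mem_handler;
    a->mem_handler = NULL;
    if (h != NULL) {
        h->refcount--;
    }
}

/* Original shape: the handler is released only inside the OWNDATA block. */
static void clear_attributes_original(toy_array *a) {
    if (a->owndata && a->data != NULL) {
        a->data = NULL;         /* buffer free elided in this sketch */
        toy_clear_handler(a);
    }
}

/* Fixed shape: the handler is released unconditionally. */
static void clear_attributes_fixed(toy_array *a) {
    if (a->owndata && a->data != NULL) {
        a->data = NULL;         /* buffer free elided in this sketch */
    }
    toy_clear_handler(a);
}
```

For an array whose handler is set but whose `data` is still NULL (the allocation-failure case), `clear_attributes_original` leaves the handler's count unchanged, which is the leak, while `clear_attributes_fixed` drops exactly one reference. An array with a NULL handler passes through `clear_attributes_fixed` untouched.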
