Fix incorrect temp elision for new-style (NEP 43) user-defined dtypes

Summary

can_elide_temp() in temp_elide.c incorrectly identifies new-style user-defined dtypes as numeric types eligible for in-place buffer reuse. For large arrays this silently rewrites a*a + b*b into (a*a) += (b*b), which raises a TypeError when the result dtype of the in-place add does not match the pre-allocated buffer.

Root cause

The eligibility check in can_elide_temp() contains:

!PyArray_ISNUMBER(alhs) ||

PyArray_ISNUMBER expands to PyTypeNum_ISNUMBER(PyArray_TYPE(arr)), and the macro is defined in ndarraytypes.h as:

#define PyTypeNum_ISNUMBER(type) \
    ((type) <= NPY_CLONGDOUBLE || (type) == NPY_HALF)

New-style user dtypes created via the NEP 43 API are intentionally assigned type_num = -1 (the value checked by NPY_DT_is_user_defined). In C, -1 is a signed integer, so -1 <= NPY_CLONGDOUBLE (i.e. -1 <= 15) evaluates to true. The guard therefore passes for all new-style user dtypes, regardless of whether their binary operations are type-preserving.

Conditions required to trigger the bug

All of the following must hold simultaneously:

New-style (NEP 43) user dtype — the array has type_num == -1.
Non-type-preserving binary operations — the dtype's multiply (or another binary op) returns a result whose dtype differs from either input dtype. In my case this is a parametric fixed-point dtype: multiplying two N-bit values produces a 2N-bit result, and adding two values with different fractional widths produces a result one integer bit wider than the wider operand. The important point is that result_dtype(T, T) != T.
Large arrays — the intermediate temporary must be at least NPY_MIN_ELIDE_BYTES (256 KiB) in size.
Call from the Python interpreter — check_callers() must succeed (i.e. the expression must be evaluated from Python, not from a C extension).

A typical triggering expression is:

power = re * re + im * im   # re, im are large arrays of a user dtype

For small arrays, or when the intermediate is stored in a named variable first, the bug does not trigger.

What goes wrong

When all conditions are met, can_elide_temp() returns 1 for the temporary produced by re * re. NumPy then calls the in-place add operator, effectively rewriting the expression as:

power = (re * re).__iadd__(im * im)

The temporary re * re has dtype T_mul (the multiply result type). The correct result dtype for T_mul + T_mul is T_add, which is a different dtype object. NumPy's ufunc machinery passes the temporary's buffer as the pre-specified output array, but resolve_descriptors for the add ufunc rejects it because T_mul != T_add. The in-place operation raises a TypeError; because try_binary_elide() returns 1 unconditionally regardless of whether inplace_op succeeded, the error propagates and there is no fallback to the non-eliding path.

The error message seen by the user is:

TypeError: add: user-specified output dtype does not match the expected result dtype

even though the user never specified an output dtype.

Fix

Add a PyArray_TYPE(alhs) < 0 guard in both can_elide_temp() and can_elide_temp_unary() in temp_elide.c:

/* can_elide_temp(), line ~313 */
if (!check_unique_temporary(olhs) ||
        !PyArray_CheckExact(olhs) ||
        !PyArray_ISNUMBER(alhs) || PyArray_TYPE(alhs) < 0 ||
        ...

/* can_elide_temp_unary(), line ~392 */
if (!check_unique_temporary((PyObject *)m1) ||
        !PyArray_CheckExact(m1) ||
        !PyArray_ISNUMBER(m1) || PyArray_TYPE(m1) < 0 ||
        ...

The comment above can_elide_temp() already states that elision is only safe for types "that do not change types" — user-defined parametric dtypes do not satisfy this property in general. A type_num < 0 check is the minimal, correct guard: it is consistent with how NPY_DT_is_user_defined identifies these dtypes elsewhere in NumPy, it adds no overhead for the common built-in type path, and it requires no knowledge of any particular user dtype's semantics.

An alternative fix would be to make PyTypeNum_ISNUMBER treat negative values as non-numeric:

#define PyTypeNum_ISNUMBER(type) \
    ((type) >= 0 && ((type) <= NPY_CLONGDOUBLE || (type) == NPY_HALF))

However, patching the macro would affect every caller and deserves broader review; the targeted guard in temp_elide.c is the safer minimal change for a bug fix.

MaartenBaert/numpy_fix_temp_elision.md

Select an option

No results found