can_elide_temp() in temp_elide.c incorrectly identifies new-style user-defined
dtypes as numeric types eligible for in-place buffer reuse. For large arrays this
silently rewrites a*a + b*b into (a*a) += (b*b), which raises a TypeError
when the result dtype of the in-place add does not match the pre-allocated buffer.
The eligibility check in can_elide_temp() contains:
!PyArray_ISNUMBER(alhs) ||PyArray_ISNUMBER expands to PyTypeNum_ISNUMBER(PyArray_TYPE(arr)), and the
macro is defined in ndarraytypes.h as:
#define PyTypeNum_ISNUMBER(type) \
((type) <= NPY_CLONGDOUBLE || (type) == NPY_HALF)New-style user dtypes created via the NEP 43 API are intentionally assigned
type_num = -1 (the value checked by NPY_DT_is_user_defined). In C, -1 is a
signed integer, so -1 <= NPY_CLONGDOUBLE (i.e. -1 <= 15) evaluates to true.
The guard therefore passes for all new-style user dtypes, regardless of whether
their binary operations are type-preserving.
All of the following must hold simultaneously:
- New-style (NEP 43) user dtype — the array has
type_num == -1. - Non-type-preserving binary operations — the dtype's
multiply(or another binary op) returns a result whose dtype differs from either input dtype. In my case this is a parametric fixed-point dtype: multiplying twoN-bit values produces a2N-bit result, and adding two values with different fractional widths produces a result one integer bit wider than the wider operand. The important point is thatresult_dtype(T, T) != T. - Large arrays — the intermediate temporary must be at least
NPY_MIN_ELIDE_BYTES(256 KiB) in size. - Call from the Python interpreter —
check_callers()must succeed (i.e. the expression must be evaluated from Python, not from a C extension).
A typical triggering expression is:
power = re * re + im * im # re, im are large arrays of a user dtypeFor small arrays, or when the intermediate is stored in a named variable first, the bug does not trigger.
When all conditions are met, can_elide_temp() returns 1 for the temporary
produced by re * re. NumPy then calls the in-place add operator, effectively
rewriting the expression as:
power = (re * re).__iadd__(im * im)The temporary re * re has dtype T_mul (the multiply result type). The
correct result dtype for T_mul + T_mul is T_add, which is a different dtype
object. NumPy's ufunc machinery passes the temporary's buffer as the pre-specified
output array, but resolve_descriptors for the add ufunc rejects it because
T_mul != T_add. The in-place operation raises a TypeError; because
try_binary_elide() returns 1 unconditionally regardless of whether inplace_op
succeeded, the error propagates and there is no fallback to the non-eliding path.
The error message seen by the user is:
TypeError: add: user-specified output dtype does not match the expected result dtype
even though the user never specified an output dtype.
Add a PyArray_TYPE(alhs) < 0 guard in both can_elide_temp() and
can_elide_temp_unary() in temp_elide.c:
/* can_elide_temp(), line ~313 */
if (!check_unique_temporary(olhs) ||
!PyArray_CheckExact(olhs) ||
!PyArray_ISNUMBER(alhs) || PyArray_TYPE(alhs) < 0 ||
...
/* can_elide_temp_unary(), line ~392 */
if (!check_unique_temporary((PyObject *)m1) ||
!PyArray_CheckExact(m1) ||
!PyArray_ISNUMBER(m1) || PyArray_TYPE(m1) < 0 ||
...The comment above can_elide_temp() already states that elision is only safe for
types "that do not change types" — user-defined parametric dtypes do not satisfy
this property in general. A type_num < 0 check is the minimal, correct guard:
it is consistent with how NPY_DT_is_user_defined identifies these dtypes
elsewhere in NumPy, it adds no overhead for the common built-in type path, and it
requires no knowledge of any particular user dtype's semantics.
An alternative fix would be to make PyTypeNum_ISNUMBER treat negative values as
non-numeric:
#define PyTypeNum_ISNUMBER(type) \
((type) >= 0 && ((type) <= NPY_CLONGDOUBLE || (type) == NPY_HALF))However, patching the macro would affect every caller and deserves broader review;
the targeted guard in temp_elide.c is the safer minimal change for a bug fix.