This document is solely for me to organize my thoughts; it is probably a horrific thing if seen by a theoretical computer scientist, and it gets a lot of the terminology wrong. "Type" here usually means a theoretical type, which I think is defined as the set of all possible instances.
Many of these concepts have probably been written down before by much smarter people, so...
The array object at its core is a Python type which signals that operations are generalized elementwise operations (array-programming), unlike normal Python operations, which are scalar in nature.
We can thus argue that the NumPy array serves as the most common `HomogeneousElementwiseContainer(ElementwiseContainer)` abstract type (the elementwise container could also be called an ensemble, see next paragraph).
It is an object that, due to its array-programming nature, breaks some typical qualities of (scalar) Python objects. For example, it cannot define `bool()`, and `==` must return a new array of booleans.
It may be prudent to think of a NumPy array as an ensemble of scalars/elements. For a classical container, the programmer thinks of a container object which contains elements. The array programmer thinks the opposite way: the elements are the fundamental objects, and the "container" is just a name given to the ensemble of elements. The array programmer looks from the inside to the outside: they work on an ensemble of elements (without knowledge of how many there are or how they are organized/structured!) most of the time, and only look outside to the container when the organization of the container is interesting. This ensemble is similar, for example, to a random variable: it is possible to do math with a random variable (the ensemble of all possible realizations of the random variable) without thinking of any specific drawn random number. Ensembles of realizations in (statistical) physics are a similar concept.
The actual behaviour of the NumPy array is that of an N-dimensional, homogeneously shaped, integer-indexed container. This further defines certain operations (such as reductions). It also adds an additional property (possibly a refinement) to the above definition: `BroadcastingContainer(HomogeneousElementwiseContainer)`, in that NumPy arrays will broadcast during elementwise operations. The way NumPy broadcasts follows specific rules which could be refined into their own abstract container class. Broadcasting itself probably requires a few small theoretical rules to define, but they are not of much consequence here: NumPy broadcasting is well accepted currently, and generalizes easily, for example, to labelled axes (to be not very theoretical).
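For concreteness, a small (factual) example of broadcasting during an elementwise operation:

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (4,) row combine elementwise
# into a (3, 4) result without copying either operand to that shape.
col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

result = col + row
print(result.shape)                # (3, 4)
```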
The method by which NumPy arrays are indexed or broadcast is not very relevant in practice. It is sufficient to take away that typical calculations involving arrays should work on many elements at once for speed reasons. This requires implementing computational kernels which work on many homogeneous elements. In practice these kernels may be specific to NumPy arrays and, for example, make use of its strided memory layout. The exact work that such a kernel is expected to do, and who defines it, is itself a more complex topic discussed in the Kernels section below.
I will also use array-like to identify objects that have syntax compatible with NumPy, which implies that they are at least close to a `HomogeneousElementwiseContainer`.
The `ElementwiseContainer` here is mutable, but could be generalized to distinguish "frozen" and "mutable" versions. Views especially are an important additional component of the array object in practice, and they are related to mutability. Since the array object itself is mutable and has view semantics, changes can affect views even when the elements themselves are considered immutable. This is most clear for a NumPy array containing immutable Python objects.
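A small (factual) illustration of view semantics and mutability:

```python
import numpy as np

a = np.arange(6)
b = a[::2]        # a view: no data is copied

b[0] = 100        # mutating through the view...
print(a[0])       # ...also changes the original array: 100

# The elements (here int64 values) are immutable as scalars,
# but the array's content is freely mutable through any view.
```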
One complication (and, due to the way NumPy works, a very real one) is that array programming does not generalize well to arrays of arrays. The general issue is that an array of arrays always has to assume that another operand is interpreted as an array and not as a scalar; an array of arrays thus cannot have a scalar corresponding to its elements.
In NumPy this is also a problem for non-array containers: even though they have scalar semantics in Python, they are considered array-likes by NumPy and coerced greedily.
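A factual example of this greedy coercion: a plain Python list is treated as an array-like operand, never as a single scalar element:

```python
import numpy as np

a = np.array([1, 2, 3])

# The list is coerced to an array and compared elementwise,
# even though lists have scalar (container-level) == semantics in Python.
print(a == [1, 0, 3])   # [ True False  True]

# Nested lists are likewise interpreted as multidimensional array-likes:
print(np.asarray([[1, 2], [3, 4]]).shape)   # (2, 2)
```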
As noted, the NumPy array is a type which is distinct from most Python types due to its property of indicating array-programming, elementwise (operator) behaviour. NumPy arrays are both homogeneous and elementwise. So how does this compare to other Python container types?
The homogeneous part is mainly relevant as a contract to users: all elements are described by the same scalar type. Note that even Python lists are homogeneous (they contain only Python objects), but lack the ability to provide the user with a stricter contract about the elements' type.
The Python `array.array` actually provides a homogeneous container, which is similar to template (container) classes in C++ and other languages.
It is important to note that the above defined `HomogeneousElementwiseContainer` does, however, not include Python lists or even Python arrays.
With respect to array-programming principles, Python sequences have a scalar behaviour contract for operations such as `==`, `bool()`, or `+`. These operations or methods may use the elements during their operation, but the result makes statements about, or modifications to, the container itself:
- Is the container equal to the other container? This implies the elements are equal, but does not provide information for each individual element.
- Is the sequence "truthy" (i.e. not empty)? As opposed to whether each element included is truthy.
- `+` returns a new sequence that combines/concatenates both sequences. It thus operates on the container objects and, again, not on the elements.
To name just a few. Other operators, such as `operator.index()` or the `and` keyword, are inherently scalar, and usually error when used on NumPy arrays. (Arguably, some of these should possibly be limited further.)
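The contrast between the scalar (container) contract of Python sequences and the elementwise contract of arrays can be shown directly (factual behaviour):

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# Scalar contract: the result is a single statement about the container.
print(py_list == [1, 2, 3])   # True (a single bool)
print(bool(py_list))          # True (non-empty)

# Elementwise contract: the result is a new array, one answer per element.
print(arr == np.array([1, 2, 3]))   # [ True  True  True]
# bool(arr) raises ValueError, because per-container truthiness is undefined.
```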
The important thing to note is that, in principle, `==` could have a scalar contract, while a new `elementwise==` would have to exist so that both scalar and array-programming behaviour could be defined on a single container. Some languages designed for array-programming do this. Python does not wish to do this, so the NumPy array acts as a type which indicates a different contract for operator definition.
There are some cases where this breaks expectations, since users cannot easily know that they are dealing with a NumPy array and thus with a different contract of operation for most operators. Code written for generic Python containers will rarely generalize correctly to arrays.
Short of making a full new set of operators, the only solution I can think of is adding a new `ElementwiseContainer` abstract base class to Python. This would give users that run into issues with the broken contract of array-programming (such as `==` returning something without a well-defined "truthiness") a fighting chance to detect what is going on.
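As a sketch only: a hypothetical `ElementwiseContainer` abstract base class (neither the class nor the registration exists in Python or NumPy; the names are my own illustration) could give generic code a way to detect the changed operator contract:

```python
from abc import ABC

import numpy as np


class ElementwiseContainer(ABC):
    """Hypothetical ABC marking types whose operators act elementwise."""


# Purely illustrative: register the NumPy array as an elementwise container.
ElementwiseContainer.register(np.ndarray)


def safe_equals(a, b):
    # Generic code could branch on the contract instead of being surprised
    # by `==` returning a non-boolean object.
    if isinstance(a, ElementwiseContainer) or isinstance(b, ElementwiseContainer):
        return (a == b).all()   # elementwise contract
    return a == b               # scalar contract


print(safe_equals([1, 2], [1, 2]))                        # True
print(safe_equals(np.array([1, 2]), np.array([1, 2])))    # True
```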
(Note that in practice, e.g., not-a-number (NaN) violates expectations of the `==` operator in different but comparable ways.)
Certainly some of these violations may be design flaws of the NumPy array. For example, the NumPy array iterates over the first axis instead of over all elements by default (SymPy does it differently, for example), thus mimicking a list of lists, while NumPy could instead mimic, for example, a mapping. However, the general convenience of elementwise operators must be accepted as a core design principle of NumPy.
There is an important distinction to note beforehand: NumPy arrays are considered to have mutable content, but not to be mutable in their container properties (although this is not strictly true, which is problematic and can lead to bugs). Mutability thus refers to mutable content for arrays.
There are some core concepts regarding scalars and arrays which are important in NumPy (and limit future choices somewhat). They delineate the (typical) Python scalar, the NumPy array scalar (which is immutable but mimics an array), the scalar array (zero-dimensional array), and the general N-dimensional array.
The following concepts are important:
Mutability:
- Arrays are typically mutable independent of whether the elements themselves are considered mutable. Due to view semantics, this mutability affects not just the object itself.
- Scalars (in Python) are typically immutable objects; this also means that scalars are hashable.
- The array's content also inherits the mutability of its elements (scalars), just like a tuple containing a mutable object can effectively change (the container can only be immutable if all its elements are).
Mutability is important, because a zero-dimensional array is typically mutable, while Python scalars typically are not. Thus, scalar arrays, unlike scalars, are mutable objects and cannot be used, for example, as dictionary keys.
Although, for example, `astropy.units.Quantity` ignores this property (it does not define a scalar or array scalar).
Scalar vs. Elementwise Array-Programming
Some operations – mainly `bool()` and `int()` – are generally not defined on an array, because they are not consistent for a general number of elements. NumPy currently defines `bool()` for scalar arrays and array scalars, since with only one element `.any()` and `.all()` always match in meaning.
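The current (factual) behaviour, for illustration:

```python
import numpy as np

print(bool(np.array(1.0)))      # True: a scalar (0-D) array defines bool()
print(bool(np.float64(0.0)))    # False: an array scalar also defines bool()

try:
    bool(np.array([1.0, 2.0]))  # more than one element: ambiguous
except ValueError as e:
    print(e)                    # "The truth value of an array ... is ambiguous"
```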
Thus, scalar arrays have some freedom to behave like scalars. Although, arguably, they break the common principle of other "scalar" Python containers, which define `bool()` based on being empty or not.
The definition of `bool()` is possible; the main question is whether it should be happy to discard the container information, when it normally is not. (I personally think that an ideal NumPy would never allow `bool()` on the container, and I believe that users would not even notice for the most part. While it can be defined, it also discards container information; thus, if I were to add this, I would want to know that its absence causes real pain in code.)
#### Discussion
I personally would argue that only scalars and arrays are sensible concepts and NumPy should strive to remove array scalars and scalar arrays as special cases.
##### Scalar arrays
I believe that scalar array behaviour serves no purpose, except the appearance of convenience with little proof of it being true. Scalar arrays seemingly ignore their container properties, which should require very good motivation. We can (and do) try to slowly remove some special cases, but historically they exist, and I have to admit that these choices seem good at first sight. I.e., if you try to implement scalar (typical Python) behaviour as much as possible, you end up with the current choices. However, I would argue that, considering the limits of defining this behaviour, the default choice should be to refuse to implement it. (I could imagine that at the time, scalars did not exist for all dtypes, in which case the need for "making things work" was much higher.)
##### Array Scalars
Array scalars are more complex. The main reason they exist is, I believe, speed. NumPy currently converts 0-D arrays automatically to array scalars in many operations (implicitly making these objects immutable and thus discarding some container properties!). To make up for this deficiency, array scalars then pretend to be full arrays so that most users will not notice the difference.
There is probably some agreement that ideally array scalars would not exist, at least with respect to them pretending to be an array/container object. One question that has never been quite decided is whether true Python scalars should exist for all NumPy dtypes.
Personally, I feel that, yes, they should exist!
There should be `int64` scalars that behave like a single element of a NumPy array typed as `int64`.
It seems natural that if you have a container where you can assign a single
value to a certain position, then you should be able to extract that element
again as a scalar.
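For reference, the current (factual) behaviour when extracting an element:

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int64)

elem = arr[0]         # currently an "array scalar" of type np.int64
print(type(elem))     # <class 'numpy.int64'>

py_int = arr.item(0)  # explicit conversion to a true Python scalar
print(type(py_int))   # <class 'int'>
```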
Scalars should *not* exist:
From an array-programming perspective, scalars do not need to exist. In that sense, arrays are not really containers, everything is an array in the sense that everything uses array-programming semantics. At that point "extracting an element" is not a useful concept anymore as such. The array is not a container, it is a language concept.
The above probably leads to the cleanest possible semantics for array programming. As soon as NumPy (an array) is involved, all objects obey array-programming semantics.
There are two problems with this when adding it to a language like Python:
- As said above, array content is mutable. This is actually specific to NumPy, and not a requirement (arrays could in theory be copy-on-write). Having no scalars thus means that use as dictionary keys is clumsy (it might require a `frozenarray`).
- NumPy builds on top of Python, and Python objects are not array objects. Assignment to the array is typically done using a Python scalar, which feels like assigning into a container, when it actually is a coercion to an array and only then an assignment. This means that extracting a Python scalar can only exist as a special method `arr.item()` and not by other means.
In my opinion, the advantage of this view is the automatic conversion to arrays in many functions, since currently `np.add(scalar, scalar)` is the same as `np.add(np.array(scalar), np.array(scalar))`. This is different from operators, where `scalar + scalar` lives in the scalar Python world.
As such this is not a problem: Functions can choose to be array-programming only.
The problem arises due to the fact that we currently return array scalars when
the input is zero dimensional, so that users may rely on a scalar result for
scalar inputs.
It may also be more useful semantically: you do not silently lose the scalar's immutability advantage. Note that an immutable array class could already achieve this, but for scalars we choose not to allow it.
I personally believe that scalars are a good idea. Users starting with Python are used to them, and all Python objects are scalars, so thinking of arrays as containers is a much easier concept! Further, with view semantics and mutable arrays, there is no good way to have immutable and hashable scalars for example to index dictionaries with.
There is often the notion that the current NumPy scalars (array scalars) duplicate behaviour and are a huge burden on maintenance. I believe that because of this, many would prefer to have no scalars at all. Personally, I think that is unfair towards scalars, in that they are blamed for the messiness of how they are implemented rather than for the idea itself.
The other issue is that the boundary of when a scalar and when a 0-D array is to be expected seems blurry right now, and because of that it seems easier to just remove the blurry line completely. However, I believe that it is straightforward to clarify the delineation:
- All normal ufuncs will return arrays for array inputs, they do not create "array scalars".
- Functions may (ufuncs should) return scalars for scalar only input.
- Indexing can be confusing because `arr[0]` can be an array or a scalar. This is a problem with indexing semantics only: `arr[0]` should be a scalar and error if it is not. The user should use `arr[0, ...]` to get the array (even if it is 0-D). A side note to this: NumPy arrays should not be seen as lists of lists, that seems to be a problematic concept. In principle, I would prefer if NumPy arrays did not allow iteration at all; it should be `arr.flat` or `arr.iter(axis=0)` to clarify the intention. (Even without scalars I think iteration is problematic, although less of a problem; you still run into more issues when code expects NumPy arrays to be typical Python objects.)
- Especially reductions take an axis argument, and a reduction over all axes is a zero-dimensional result. I believe the delineation is clear here even now: `axis=None` means a scalar result, while `axis=range(arr.ndim)` should mean a zero-dimensional result. Most users already use `axis=None`, and those who end up doing `axis=(0, 1)` and get a 0-D result are likely writing generic N-dimensional code which should return an array (see the sketch after this list).
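A sketch of the proposed delineation only; this is not current behaviour (today both calls return an array scalar):

```python
import numpy as np

arr = np.ones((3, 4))

# Proposed: axis=None signals "reduce to a scalar".
s = arr.sum(axis=None)        # would return a true scalar

# Proposed: an explicit tuple of all axes signals generic N-dimensional
# code and would keep the container, returning a 0-D np.ndarray.
z = arr.sum(axis=(0, 1))      # would return a 0-D array

# Currently (factual), both return np.float64, an array scalar:
print(type(arr.sum(axis=None)), type(arr.sum(axis=(0, 1))))
```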
In short, I cannot think of any operation on arrays where the intention of whether or not a scalar should be returned is not immediately obvious, by using `None` to indicate a "reduction" to scalar, and by removing the list-of-lists concept through clarified indexing and iteration. (One could even say: the shape of NumPy scalars should be `None`, if they must have a shape attribute.)
Of course changing the way NumPy arrays are iterated and indexed would be painful, and maybe not feasible. But so is the removal of scalars, so I believe the question is what point to look for on the horizon.
Note that for all practical purposes in the following discussion, elements of NumPy arrays should be considered to be scalars of a type matching the datatype of the array. Even if the general notion of NumPy scalars existing may be contested, this seems like a useful concept for discussion.
A `HomogeneousElementwiseContainer` such as the NumPy array must know the type of each element. In NumPy, further, the *storage details* of each element are necessary: the scalar may be `int64`, but NumPy also needs to know whether it is stored in little or big endian byte order. (Most scalars do not expose their innards, and do not have to worry about storage details, especially since they are typically immutable.)
The primary use of the datatype as a concept is that of informing the user: `arr.dtype` provides an object with all the information necessary to know what can be stored inside the array. The `dtype` object further provides some storage details, such as endianness or itemsize.
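For illustration, the (factual) introspection a `dtype` instance currently provides:

```python
import numpy as np

arr = np.zeros(3, dtype=">i4")   # big-endian 32-bit integers

print(arr.dtype)            # >i4
print(arr.dtype.type)       # <class 'numpy.int32'>  (the scalar type)
print(arr.dtype.itemsize)   # 4                      (storage detail)
print(arr.dtype.byteorder)  # '>'                    (storage detail)
```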
##### User Perspective
From the user perspective, only some of the storage information is immediately necessary:
- Most users will need to know what type of scalars the elements are. I.e., what is currently stored as `arr.dtype.type` is sufficient for them. This provides all information needed to know how the elements behave. A user does not even need to know that an `int32` is stored in 32 bits; the 32 bits are only important because they limit the range of math with this scalar.
- Current strings are different, since they also store a length.
The `arr.dtype` thus also provides information on the limits of what can be stored in the array. While there may also be limits on which operations are possible for the array, this information does not affect functional behaviour: if an operation is defined, it gives the same result as for the `np.str_` type. In that sense, the elements of an array can in general be a strict subset of the scalar type. Any element behaves identically inside the array and as a scalar, but not all scalar values can necessarily be stored inside the array, and in some cases the array may not know how to perform a certain operation. Ideally, of course, elements and scalars are functionally equivalent.
In many cases it would thus be sufficient if the array only exposed `arr.dtype.type` and hid the `arr.dtype` storage details. However, Python provides no good way to describe "strings of length 4 or less" as a type. (I would believe this is inherent to the dynamic nature of the language?)
In addition to the type information, the `arr.dtype` provides storage information, such as itemsize and endianness, associated with all elements and necessary to access or assign to them. This storage information could, in NumPy 1.18, just as well be stored on the array itself, since it always includes the exact same, fixed set of values.
From a user perspective, the scalar type (as mentioned above) is the main information necessary. This is much like a C++ templated container, where the template parameter is the scalar's type. In NumPy (and Python generally), this construct does not exist. There is no class for `np.ndarray<np.int64>`; the scalar type is instead part of the `ndarray` instance. From a theoretical point of view, we can still see `np.ndarray<np.int64>` as a valid type, one that encompasses all possible array shapes, strides, … and also all element properties such as endianness and itemsize.
From a typing perspective (do not read: class!), `np.ndarray` is clearly a super-type of `np.ndarray<np.int64>`.
The (current) `dtype` storage details, however, do not have a defined home if we only consider `np.ndarray<np.int64>`. They could just as well be part of the `np.ndarray` class (i.e. fields on the C-side struct). In NumPy (as of 1.18) that would actually work, although it would waste storage space. However, this means that `np.ndarray` would be required to know all possible storage variations for all possible scalar types.
Thus, enter the `DType` type. We can describe `np.ndarray` as "templated" via `np.ndarray<DType[np.int64]>`, where `DType[np.int64]` encompasses all desired storage variations of storing an `np.int64`. Thus the `np.ndarray` type itself does not need to be extended for new contained scalar types; it is only necessary to define `DType[new_type]`. `DType[scalar_type]` encompasses all possible scalars and can store them faithfully as elements when a new array is created.
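A purely hypothetical sketch of this separation (none of these classes exist; the names `DType`, `Int64DType`, and `NDArray` are my own illustration of where the storage details would live):

```python
# Hypothetical sketch: storage details live on the DType instance,
# while the scalar type describes element behaviour.

class DType:
    scalar_type = None              # element behaviour lives on the scalar type


class Int64DType(DType):
    scalar_type = int               # stand-in for np.int64
    itemsize = 8

    def __init__(self, byteorder="="):
        self.byteorder = byteorder  # storage variation, e.g. "<" or ">"


class NDArray:
    """The container only holds *a* DType instance; it never needs to be
    extended when a new DType (new scalar type) is defined."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype


arr = NDArray((3, 4), Int64DType(byteorder=">"))
print(arr.dtype.scalar_type, arr.dtype.byteorder)
```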
However, a specific array with a specific `DType[scalar_type]()` instance has limited storage capabilities. Since the array is homogeneous, it may thus naturally limit the scalars that can be stored inside it. This is the string case above: the values which can be stored may be a subset of the scalar's values. The theoretical `arr.dtype.type` of a string array is then not `np.str_`, but `np.str_[limited to a length of N]`.
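The string case can be shown concretely (factual current behaviour; note the silent truncation on assignment):

```python
import numpy as np

arr = np.zeros(1, dtype="U3")   # strings of at most 3 characters

arr[0] = "hi"
print(arr[0])         # 'hi'  -- behaves exactly like the np.str_ scalar

arr[0] = "hello"
print(arr[0])         # 'hel' -- silently limited to the storable subset
```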
Quantities extend the array object with additional homogeneous unit information, such as a NumPy array of "meters":

    Quantity = (np.ndarray<DType[numpy_defined]>, Unit)

and can thus be implemented as a subclass of `np.ndarray` (as is currently done, e.g., in `astropy.units.Quantity`). An alternative spelling would be:

    Quantity = np.ndarray<(DType[numpy_defined], Unit)>
Note that neither of the tuples `(np.ndarray, Unit)` or `(DType[.], Unit)` is necessarily easy to define generally (the second one is currently impossible). Also note that Astropy Quantities do not have a scalar associated with them right now.
The above tuples show some of the tradeoffs of the approach.
Implementation differences:
1. `(np.ndarray, Unit)` can be defined as an array subclass (as is currently possible).
2. `(DType[numpy_defined], Unit)` encompasses the set of `DType` types already defined by NumPy. It thus either needs a class/wrapper factory or must include the `DType[numpy_defined]` dynamically.
Technically, both methods should be similar in complexity, assuming "inheritance" is possible, although 1. is possibly a bit more straightforward, because a single subclass encompasses all existing DTypes. This advantage is probably diminished a little if you have scalars.
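A minimal sketch of option 1., the array-subclass approach; this is the standard NumPy subclassing pattern, but it deliberately omits the hard part (propagating the unit through ufuncs), so the `unit` handling is purely illustrative:

```python
import numpy as np


class Quantity(np.ndarray):
    """Minimal ndarray subclass carrying a homogeneous `unit` attribute."""

    def __new__(cls, input_array, unit=None):
        obj = np.asarray(input_array).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Called for views/slices as well; propagate the unit if present.
        if obj is None:
            return
        self.unit = getattr(obj, "unit", None)


q = Quantity([1.0, 2.0, 3.0], unit="m")
print(q.unit, q[1:].unit)   # 'm' 'm'  -- views keep the unit
```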
For both cases, the difficult part is designing a good wrapping of the existing computational kernels which, in NumPy, are defined by the universal functions.
Dependent ndarray type
It should be clarified that both of the above approaches define the identical type (space of all possible instances). The difference is that, from a Python implementation perspective, the dynamic creation of a subclass is typically not done. Thus, there is no actual class object for `np.ndarray<DType[np.float64]>`, and the `type()` of such an instance is just `np.ndarray`.
The user is aware of this, and OK with it. The main result of it is who defines, and how, the methods on `np.ndarray<DType[np.float64]>`. Pandas, for example, induces/chooses a subclass based on the `DType`, i.e. it may define a class for `np.ndarray<DType[np.float64]>` during construction. This leads to the convenience of DType-specific methods on the container. From a purist point of view, the container should maybe not have `sum()`.
Quantity (DType) induced methods:
The main advantage of the array subclass solution is that it is straightforward to define `Quantity.to(new_unit)` or `Quantity.unit`, which would otherwise have to be accessed through the `dtype` attribute and may require explicit passing of the array. To some degree, NumPy could of course add support for array methods being induced by the existence of a dtype (arguably, `.imag` is, in a sense).
An alternative to "inducing" methods could be `arr.elem` returning a bound, dtype-provided object, which can add methods so that `arr.elem.method` works. It is always possible to create a subclass which only adds the new methods to the `ndarray`. But we would have to carefully weigh whether we wish to allow dtypes to provide e.g. MixIns to induce additional methods. On the up-side, a carefully designed MixIn could work with arbitrary array-like objects that implement only a minimal set of what NumPy does.
The purist point of view may be this, if methods are desired:
- The base `np.ndarray` should have almost no methods (e.g. no `.sum`).
- For each `DType` there should be a subclass of `np.ndarray` automatically created. The `DType` can then mix in `DType`-related methods, which do not need to know about the array's shape, etc.
- Methods that work on all values remain a problem, and one of two choices must be made:
  - The `DType` could be allowed to mix in array methods (such as `sum`, `conjugate`), assuming that they are backed by ufuncs (which are array multi-methods).
  - The array library chooses not to have such methods, i.e. you always must use a functional approach, such as `unit.to(ndarray, new_unit)` instead of `Quantity.to(new_unit)`.
Or of course, we see
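A hypothetical sketch of the "DType mixes in container methods" idea (nothing like this exists in NumPy; the class names, the `unit` handling, and the automatic subclass creation are illustrative only):

```python
import numpy as np


class UnitMethodsMixin:
    """Methods a hypothetical unit DType could mix into the container."""

    def to(self, new_unit):
        # Illustrative only: delegate to a functional implementation that
        # could work on any array-like; here a plain placeholder conversion.
        factor = {("m", "km"): 1e-3}[(self.unit, new_unit)]
        converted = np.multiply(self, factor).view(type(self))
        converted.unit = new_unit
        return converted


def make_array_class(mixin, unit):
    """Automatically create an ndarray subclass for a given 'DType' mixin."""
    cls = type("UnitArray", (mixin, np.ndarray), {})

    def construct(values):
        obj = np.asarray(values, dtype=np.float64).view(cls)
        obj.unit = unit
        return obj

    return construct


meters = make_array_class(UnitMethodsMixin, "m")
q = meters([1000.0, 2500.0])
print(q.to("km"), q.to("km").unit)   # [1.  2.5] km
```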
Generalization to Other Array-Likes (HomogeneousElementwiseContainer)
A subclass of `np.ndarray` can only hope to generalize to other array-likes if these array-likes are "templated". For example, it could largely work for Dask, which is a distributed array: `daskarray<np.ndarray>` could accept an `np.ndarray` subclass as template parameter in principle.
Other array-likes, such as pandas dataframes, or sparse arrays, however, have no chance of reusing a subclass.
On the other hand, the `DType` approach generalizes to other array-likes immediately. The only concern is whether the computational kernels provided are fast for the specific array-like container.
Array-programming methods
The methods defined on NumPy arrays and ufuncs, which we discuss above, are methods defined on the type `HomogeneousElementwiseContainer<DType>`. Since we have the distinction of container and DType, i.e. the instance can also be written as the `(np.ndarray, dtype)` tuple, methods in general need to work with the full type (including the array information).
The simple solution to this is elementwise computation: every container method uses the elementwise computation defined by the scalar type (the DType only describes the storage of the scalar type). In practice this may work with C++ templates in some cases, but in NumPy it is not feasible, since working only with single elements incurs too much performance overhead. Even in compiled languages it is not advisable in general, since e.g. SIMD operations may be a large performance gain if they know that they are working with more than a single element.
This leads to the implementation of Computational Kernels, which are wrapped inside the UFuncs.
The split into computational kernels is difficult, because it necessarily blurs container and scalar functions to some degree. Unlike a typical scalar function, the computational kernel used inside the ufunc (the ufunc loop) can operate on many, strided, elements. The array object can thus call this kernel (read: the ufunc loop implementation) for many elements at once, speeding up the computation while still hiding the full details of the array's memory layout.
If a computational kernel, such as currently in NumPy, can be called multiple times, it may need additional hooks before and after the full computation. These exist to prepare the kernel (e.g. allocate working memory), do error checking, or simply perform type-related work which only needs to be performed once.
Thus, the line between container and element is blurred: the kernel provided by the DType, to be used in an array method, must have partial knowledge of the array object's memory layout for fast computation. It does not need to know the exact memory layout, but it must anticipate it by providing a sufficiently optimized kernel function.
It may be helpful to remember the two extreme cases:
- The "kernel" works on a single element, so that it knows nothing about the array object itself. This is too slow in practice, but clearly correct and simple.
- The "kernel" knows the full layout and shape of the array. It thus can
perform the full workload (e.g.
datashape
ofxnd
does this) in one method call. In this organization, the array and dtype are a single unit of abstraction. For all practical purposes(ndarray, dtype)
is one object and the dtype is merely an implementation detail of where methods are stored. While it may work, generally it makes no sense to use a dtype without also usingnp.ndarray
in this context!
Currently, for many people, NumPy probably falls into category 2.: while internally it does use computational kernels on strided elements using DTypes, neither the DTypes nor the "kernels" (ufunc loops) are extensible. The organization into "inner-loops" is an implementation detail: the actual "kernel" is currently the full ufunc machinery minus the dispatching step! In NumPy, if you disregard dispatching by fixing the signature in a ufunc call, the kernel works exactly on ndarrays. In other words, while we may have a kernel, the input to this kernel is actually a full ndarray.
You can thus even argue: a NumPy array is right now both the user facing object
and the building block for computational kernels.
In that regard, a `Quantity` subclass does not have a datatype at all; it has a NumPy array as its data description (`datashape`) object.
Best of both worlds?:
My view of the above is that NumPy should embrace the kernel design more clearly and make it an integral and exposed API choice. As of now, it is largely an implementation detail: we write ufuncs by providing computational kernels (i.e. strided loops), but then wrap them up into the bigger computational kernel that is the full ufunc (with a specified DType). From a Python perspective this makes sense, since the kernel itself is useless unless used in conjunction with an array object. Maybe, as a help, the kernel design exposed to Python could look like this:
    float64_add_kernel = np.add.get_implementation([np.float64, np.float64])
    np.apply_kernel(float64_add_kernel, (arr1, arr2))
which could be wrapped a bit more nicely for the user. The main point then is that `float64_add_kernel` provides low-level callbacks to operate on many elements at the same time for a given (d)type in an efficient manner. A different array object could choose to use the same low-level functions without needing to wrap its data into a NumPy array first (which can be slow if the memory layout does not match the NumPy one).
I.e. the "kernel", in my opinion, should not be tied rely on a python object,
but only a specific signature, such as the strided one dimensional loop.
This also removes any direct chance of using the Pytohn object without the GIL.
We could also think of it as a collection of low-level callbacks:

    int.__add__
    int.__add__.contiguous_loop
    int.__add__.strided_loop

etc., except that we need to do the preparation and dispatching step before we could call an `int.__add__.strided_loop`.
The choice of doing this, instead of the full array operation (although of course there could be a kernel that does the full array one), is mainly due to the current organization into one-dimensional loops being useful.
However, there is one further huge advantage of this: it allows thinking about buffering in a much easier way. Buffering is the main reason for the low-level kernel design!
In the above discussion I forgot one important point: NumPy does casting, and casting needs to be buffered. This means that an actual operation is in fact a composition of multiple steps, potentially casting both inputs and outputs around the calculation step itself. Casting has to be buffered (or at the very least done in chunks) to be cache and memory friendly. Casting logic requires a fairly large machinery (at least it does in NumPy), and it also requires functionality (kernels) very much like the mathematical kernels. Since the actual calculation thus in general has many steps, the full array cannot be the basic unit of kernel execution.
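A conceptual sketch of why chunking forces this organization (the chunk size, kernel signatures, and composition below are illustrative, not NumPy's actual machinery):

```python
import numpy as np

CHUNK = 8192  # illustrative buffer size


def cast_kernel(src, out):
    # Stand-in for a casting kernel, e.g. int32 -> float64.
    out[...] = src


def add_kernel(a, b, out):
    # Stand-in for the actual computational (ufunc) kernel.
    np.add(a, b, out=out)


def buffered_add(x_int32, y_float64):
    """Compose cast + compute chunk by chunk (1-D arrays here),
    never materializing a full cast copy of the input."""
    out = np.empty(x_int32.shape, dtype=np.float64)
    buf = np.empty(CHUNK, dtype=np.float64)          # reused cast buffer
    for start in range(0, x_int32.size, CHUNK):
        stop = min(start + CHUNK, x_int32.size)
        chunk_buf = buf[: stop - start]
        cast_kernel(x_int32[start:stop], chunk_buf)                    # step 1: cast
        add_kernel(chunk_buf, y_float64[start:stop], out[start:stop])  # step 2: compute
    return out


x = np.arange(20000, dtype=np.int32)
y = np.linspace(0.0, 1.0, 20000)
np.testing.assert_allclose(buffered_add(x, y), x + y)
```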
Conclusion
I believe that, due to buffering, the only reasonable organization is into low-level C-style kernels which are not tied to any specific Python object or data structure (of course the kernel design itself could be targeted at NumPy arrays). But even that could be expanded: there is no special reason why NumPy should not provide kernels, or allow adding kernels, which it does not require itself. NumPy already effectively embraces that design in its implementation, and probably people smarter than me decided to do it this way exactly for these reasons. A further reason is that casting and operations are chained, but also multiple operations could in principle be chained.
The main point of this, for me, is to clarify that the array should not be seen as the input to, or host of, the computational kernel. From my perspective the computational kernel is tied solely to the datatype, while the array object uses those kernels to compose the full operation efficiently: the datatypes are the orchestra, the array is only the conductor (although in reality, since the kernels are limited, that is a lot of work – which is done in the iterator machinery).