@seberg
Last active February 15, 2019 19:29
Numpy Casting Rules

Numpy Promotion Table

This table includes the promotion rules for the basic numeric types. The only surprising rule is that 8-byte integers are considered to cast "safely" to f8 (and complex), which also shows up in these promotion rules (if there were an int128, it would be allowed to cast safely to float128!):

i1 u1 i2 u2 i4 u4 i8 u8 f2 f4 f8 f16 c8 c16 c32
i1 i1 i2 i2 i4 i4 i8 i8 f8 f2 f4 f8 f16 c8 c16 c32
u1 i2 u1 i2 u2 i4 u4 i8 u8 f2 f4 f8 f16 c8 c16 c32
i2 i2 i2 i2 i4 i4 i8 i8 f8 f4 f4 f8 f16 c8 c16 c32
u2 i4 u2 i4 u2 i4 u4 i8 u8 f4 f4 f8 f16 c8 c16 c32
i4 i4 i4 i4 i4 i4 i8 i8 f8 f8 f8 f8 f16 c16 c16 c32
u4 i8 u4 i8 u4 i8 u4 i8 u8 f8 f8 f8 f16 c16 c16 c32
i8 i8 i8 i8 i8 i8 i8 i8 f8 f8 f8 f8 f16 c16 c16 c32
u8 f8 u8 f8 u8 f8 u8 f8 u8 f8 f8 f8 f16 c16 c16 c32
f2 f2 f2 f4 f4 f8 f8 f8 f8 f2 f4 f8 f16 c8 c16 c32
f4 f4 f4 f4 f4 f8 f8 f8 f8 f4 f4 f8 f16 c8 c16 c32
f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f16 c16 c16 c32
f16 f16 f16 f16 f16 f16 f16 f16 f16 f16 f16 f16 f16 c32 c32 c32
c8 c8 c8 c8 c8 c16 c16 c16 c16 c8 c8 c16 c32 c8 c16 c32
c16 c16 c16 c16 c16 c16 c16 c16 c16 c16 c16 c16 c32 c16 c16 c32
c32 c32 c32 c32 c32 c32 c32 c32 c32 c32 c32 c32 c32 c32 c32 c32

Additional promotion rules:

  • Strings will promote to the longer string. Byte strings promote to unicode strings.
  • Strings mixed with other types create large enough strings to hold a string representation.
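The string-to-string rules can be checked with np.promote_types. This is a minimal sketch; the helper name promote_code is made up for illustration. Mixing strings with numeric types is left out on purpose, since that behaviour may differ between NumPy versions.

```python
import numpy as np

def promote_code(a, b):
    """Return the promoted dtype of a and b as a short code such as 'S5'.

    Hypothetical helper for illustration; it strips the byte-order prefix
    from dtype.str so the result is easier to read.
    """
    return np.promote_types(a, b).str.lstrip("|<>=")

# Two byte strings: the longer length wins.
print(promote_code("S3", "S5"))

# Byte string mixed with unicode: the result is unicode, long enough for both.
print(promote_code("S3", "U2"))
```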

The above table can be created with np.result_type; the scalar cases require passing in a 0-D array instead of the dtype.
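A rough sketch of how such a table could be regenerated follows. The names CODES, code, promote and promotion_table are illustrative and not part of NumPy. The long double types (f16, c32) are left out because their availability depends on the platform, and only the dtype/dtype case is covered, not the scalar (0-D array) case.

```python
import numpy as np

# Fixed-width type codes only; f16/c32 are omitted (platform dependent).
CODES = ["i1", "u1", "i2", "u2", "i4", "u4", "i8", "u8",
         "f2", "f4", "f8", "c8", "c16"]

def code(dt):
    """Short code for a numeric dtype: kind letter plus size in bytes."""
    return f"{dt.kind}{dt.itemsize}"

def promote(a, b):
    """Code of the dtype that np.result_type picks for dtypes a and b."""
    return code(np.result_type(np.dtype(a), np.dtype(b)))

def promotion_table(codes=CODES):
    """Map every ordered pair of codes to the code of the promoted dtype."""
    return {(a, b): promote(a, b) for a in codes for b in codes}

table = promotion_table()

# Print with a blank corner cell so row and column labels line up.
print("    " + " ".join(f"{c:<3}" for c in CODES))
for a in CODES:
    print(f"{a:<3} " + " ".join(f"{table[a, b]:<3}" for b in CODES))
```

Looking up table["u8", "i8"] shows the 8-byte integer case mentioned above, where the result is a float.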

Scalar Behaviour

Scalars (or 0-D arrays, for that matter) are "weak" when promotion occurs, and this is value based (if all operands are scalars, these rules are not applied).

  • Within their group (floating, complex, signed ints, unsigned ints), a scalar is represented as the smallest dtype capable of holding its value. For floating point this means a finite result in the new type.
    • However: the actual ranges for floating point add a bit of wiggle room, switching earlier than necessary to avoid inf. For example, float16 switches at 65000 instead of 65504.
    • Since they are cast within their group, the values at which upcasts occur are not identical between integers and floating point: np.array([1], np.float16) + 650 will be a float32 (because 650 is at least int16 and float16 + int16 → float32), whereas 650. is at least float16, so the result remains float16.
    • Bug: Non-finite values are always assigned the smallest floating point type, but this does not occur for non-finite (both real and imaginary) complex types.
  • Unsigned integers are allowed to go to a signed one if the value fits into its range.
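The reasoning in the float16 example can be traced with np.min_scalar_type and np.promote_types. This is only a sketch of the two steps described above, not the promotion code itself. It deliberately avoids evaluating the array + scalar expression, because scalar promotion may behave differently in NumPy releases newer than these notes. The variable names are illustrative.

```python
import numpy as np

# Step 1: smallest dtype within its kind that can hold each scalar value.
int_scalar = np.min_scalar_type(650)      # 650 does not fit into 8 bits
float_scalar = np.min_scalar_type(650.0)  # finite in half precision
big_float = np.min_scalar_type(70000.0)   # outside the float16 range

# Step 2: combine each candidate dtype with the float16 array dtype.
with_int = np.promote_types(np.float16, int_scalar)
with_float = np.promote_types(np.float16, float_scalar)

print("650     ->", int_scalar, "-> with float16:", with_int)
print("650.0   ->", float_scalar, "-> with float16:", with_float)
print("70000.0 ->", big_float)
```

np.min_scalar_type reports an unsigned 16-bit type for 650 rather than int16, but both promote with float16 to the same result.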

Datetimes:

The truth may be a bit more complex, but I expect many of the possible logic combinations cannot be hit in practice:

  • Casting to more precision is "safe". However, only if the units are multiples of each other (e.g. years to months, but not to weeks; but years to months is not allowed for timedelta). (see also datetime_metadata_divides)
  • Same kind casting is always allowed (unless one is generic)
  • All integers can "safely" cast to timedelta → This seems strange, and maybe would be better solved by simply adding the corresponding loops to the ufuncs (such as np.add allowing ml->m)? This is what can cause problems with additional timedelta loops being picked up for integers.
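These rules can be probed with np.can_cast. The snippet below is a small exploratory sketch: it prints what the installed NumPy reports for a few unit combinations and does not claim results for the less obvious cases (weeks, timedelta years to months, integer to timedelta). The dictionary name checks is illustrative.

```python
import numpy as np

def can(src, dst, casting):
    """Ask NumPy whether src can be cast to dst under the given rule."""
    return bool(np.can_cast(np.dtype(src), np.dtype(dst), casting=casting))

checks = {
    # Coarser to finer datetime unit, units are multiples of each other.
    "M8[Y] -> M8[M] (safe)": can("M8[Y]", "M8[M]", "safe"),
    # Finer to coarser loses precision.
    "M8[s] -> M8[Y] (safe)": can("M8[s]", "M8[Y]", "safe"),
    "M8[s] -> M8[Y] (same_kind)": can("M8[s]", "M8[Y]", "same_kind"),
    # Cases discussed in the list above; inspect the printed values.
    "M8[Y] -> M8[W] (safe)": can("M8[Y]", "M8[W]", "safe"),
    "m8[Y] -> m8[M] (safe)": can("m8[Y]", "m8[M]", "safe"),
    "i8 -> m8[s] (safe)": can("i8", "m8[s]", "safe"),
}

for label, ok in checks.items():
    print(f"{label}: {ok}")
```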

Usertypes:

There are probably a few quirks here, e.g. always allowing casts from boolean (which is not allowed for datetimes). Otherwise usertypes register casting functions (in either direction) and specify which casts are to be considered safe.

They can (rational and quaternions do not) additionally specify an exact promotion order for scalars within the kinds of bool, unsigned integer, signed integer, floating, complex, and object. This would allow them to do non-value-based downcasts for scalars, with the downcast not crossing kind boundaries. (To me this seems unnecessarily complex and an artifact of trying to have something that allows type promotion logic, exclusively for scalars, without actually providing a mechanism to define type promotions.)

Same kind casting

In some cases, numpy allows same kind casting. Same kind casting is grouped into the following ordered groups:

  1. boolean
  2. unsigned (integer)
  3. signed (integer)
  4. floating (inexact)
  5. complex (inexact)
  6. bytes/string
  7. unicode
  8. Void
  9. Object

Casting to a higher group is generally allowed (although void will be more difficult). I am surprised that this allows integer to inexact casts for same_kind!
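A short check of the ordering with np.can_cast makes the difference between "safe" and "same_kind" visible. This is a sketch for illustration; the helper name allowed and the chosen pairs are not from the original notes.

```python
import numpy as np

def allowed(src, dst, casting):
    """True if NumPy permits casting src to dst under the given rule."""
    return bool(np.can_cast(np.dtype(src), np.dtype(dst), casting=casting))

pairs = [
    ("i8", "f4"),   # integer to inexact: a higher group, but precision is lost
    ("f8", "f4"),   # within the floating group, downcast
    ("f8", "i8"),   # floating to integer: a lower group
    ("c16", "f8"),  # complex to floating: a lower group
    ("u1", "i1"),   # unsigned to signed: a higher group
]

for src, dst in pairs:
    print(f"{src} -> {dst}: safe={allowed(src, dst, 'safe')}, "
          f"same_kind={allowed(src, dst, 'same_kind')}")
```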

Ufuncs

Ufuncs will go up the list of registered loops, trying to find the first one that fits when allowing "same_kind" casting for the inputs.

  • This allows np.floor(np.arange(10), dtype="float32") to work, even though only inexact loops are defined for floor and the cast is not safe.
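The np.floor case can be tried directly. The sketch below lists the registered loops, forces a float32 loop with the default casting, and then repeats the call with casting="safe" to show that the input cast is what gets rejected. The variable names are illustrative, and the exact error message depends on the NumPy version.

```python
import numpy as np

ints = np.arange(10, dtype=np.int64)

# Registered loops, written as "input->output" type characters.
print(np.floor.types)

# Default ufunc casting is "same_kind", so the int64 input may be cast
# to float32 in order to use the float32 loop.
forced = np.floor(ints, dtype="float32")
print(forced.dtype)

# With casting="safe" the same request fails, since int64 -> float32
# is not a safe cast.
try:
    np.floor(ints, dtype="float32", casting="safe")
    safe_ok = True
except TypeError as exc:
    safe_ok = False
    print("rejected:", exc)
```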

There seems to be a bug here with datetimes and a forced dtype (which I think usually forces all types?):

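import numpy as np
dt = np.array([10, 20], dtype="timedelta64[m]")  # assumption: dt is not defined in the original
np.true_divide(dt, 2, dtype=np.dtype("timedelta64[m]"))
np.true_divide(dt, 2, dtype=np.dtype("D"), casting="safe")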

The dtype seems to be fully ignored (you can even set it to a string dtype). Or maybe this is specific to the true_divide type resolver. Passing a timedelta dtype for a typical loop will force a timedelta output loop.
