Investigating the integer types in the Chalice programming language

Low-level languages like C and Chalice need to have a selection of different integer types. In languages like Python you can mostly get away with just one integer type, but when you're working at a very low level, you really need to be able to be specific about what your integers mean and how much space they take up.

One little-known fact about C is that code can sometimes speed up significantly if signed types are used instead of unsigned types. The reason for this is that signed types have undefined behaviour on overflow, whereas unsigned types are required to wrap. This seems like a terrible design choice at first, and in some ways it is. But it means that, for example, if 32-bit signed integers are used to compute offsets into an array, the compiler is allowed to assume that they never overflow and use pointer-sized overflow semantics instead of 32-bit semantics, which is different on a 64-bit platform. This might seem like an abysmal thing to do - it changes the semantics of the program! But in all likelihood, a 'proper' overflow of that value would also be a bug, and in fact might be a worse bug.
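To make the point above concrete, here is a small C sketch with two otherwise identical loops that differ only in the signedness of the index. The function names are made up for illustration. Whether a particular compiler actually produces faster code for the signed version depends on the compiler, the flags and the target, so treat this as an illustration of what the compiler is permitted to assume rather than a benchmark.

```c
#include <stdint.h>

/* Signed 32-bit index over the inclusive range [first, last].
   Overflow of `i` is undefined behaviour, so the compiler may assume the
   loop runs exactly last - first + 1 times and may keep `i` in a 64-bit
   register on a 64-bit target. */
int64_t sum_signed(const int32_t *data, int32_t first, int32_t last) {
    int64_t total = 0;
    for (int32_t i = first; i <= last; i++) {
        total += data[i];
    }
    return total;
}

/* Unsigned 32-bit index over the same range.
   Wraparound is defined, so if last == UINT32_MAX the condition is always
   true and `i` must wrap to 0. The compiler has to preserve that 32-bit
   wrapping behaviour even though it is almost certainly not what the
   programmer meant. */
int64_t sum_unsigned(const int32_t *data, uint32_t first, uint32_t last) {
    int64_t total = 0;
    for (uint32_t i = first; i <= last; i++) {
        total += data[i];
    }
    return total;
}
```

For ordinary inputs the two functions return the same result. They only differ in what the compiler must guarantee at the edge of the index range.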

Sometimes we want wrapping behaviour and sometimes we want to give the compiler leeway to wrap however it likes. What we don't want is to tie this to whether the type is signed or unsigned. That's actually a totally orthogonal aspect of the type.

As such, Chalice has both wrapping and non-wrapping integer types. By making this distinction, Chalice allows the user to decide when they want wrapping behaviour and when they want the compiler to trap on overflow. This distinction actually makes code safer. Often you don't want even your unsigned types to overflow at all, but because that's how unsigned types always work in C and C++, the compiler can't know that it was unintentional. In Chalice, overflow of a non-wrapping type is by default a trap at runtime in debug mode and implementation-defined (not necessarily consistent across platforms nor even across instances of overflow) in release mode, i.e. with optimisations turned on.
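As a rough model of the non-wrapping behaviour in debug mode, the following C sketch detects overflow for both a signed and an unsigned 32-bit addition. It relies on `__builtin_add_overflow`, which is a GCC and Clang extension rather than standard C, and the function names are hypothetical. Where Chalice would trap, this sketch just returns false and leaves the decision to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true and stores a + b in *out if the sum fits in an int32_t;
   returns false if the addition overflowed. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *out) {
    return !__builtin_add_overflow(a, b, out);
}

/* The same check for an unsigned type. Plain C would silently wrap here,
   so an accidental overflow would go unnoticed. */
bool checked_add_u32(uint32_t a, uint32_t b, uint32_t *out) {
    return !__builtin_add_overflow(a, b, out);
}
```

The point of the sketch is that the overflow check looks the same for both signednesses, which is what you would expect if wrapping really is orthogonal to signedness.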

Having no specific size requirements for integer types in C was probably a mistake. It can be argued that leaving these sizes unspecified allowed C as a language to remain relevant on new platforms, as int is just the standard-sized integer type on the platform. Today, the rate at which architecture bit sizes are changing seems to have slowed down to the extent that we might never need 128-bit machines. As such, it seems sensible to just pick some sizes and stick with them. So Chalice has fixed and well-specified sizes for its integer types.

Chalice's arithmetic integer types are:

uint8, uint16, uint32, uint64: the standard unsigned integer types. These have implementation-defined overflow behaviour, but should cause a runtime error if they overflow without optimisations turned on.
int8, int16, int32, int64: the standard signed integer types. The same overflow rules as uintX apply to these types.
uwrap8, uwrap16, uwrap32, uwrap64: the wrapping unsigned integer types. These are guaranteed to wrap as you would expect unsigned integers to wrap, i.e. arithmetic on uwrapX is performed modulo 2^X.
wrap8, wrap16, wrap32, wrap64: the wrapping signed integer types. As with uwrapX, these are guaranteed to wrap according to the rules of two's complement arithmetic.
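The wrapping rules for the 8-bit types can be sketched in C as follows. The function names are hypothetical stand-ins for addition on uwrap8 and wrap8. Note one assumption: converting an out-of-range value to int8_t is implementation-defined in ISO C, and this sketch relies on the GCC and Clang behaviour of reducing the value modulo 2^8.

```c
#include <stdint.h>

/* Models uwrap8 addition: the operands are promoted to int, added, and the
   conversion back to uint8_t reduces the sum modulo 256. */
uint8_t uwrap8_add(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Models wrap8 addition: do the sum on the unsigned bit patterns, then
   reinterpret the low 8 bits as a two's complement value.
   ASSUMPTION: the final conversion to int8_t wraps modulo 2^8, as GCC and
   Clang document; ISO C leaves it implementation-defined. */
int8_t wrap8_add(int8_t a, int8_t b) {
    uint8_t bits = (uint8_t)((uint8_t)a + (uint8_t)b);
    return (int8_t)bits;
}
```

With these definitions, adding 1 to the largest uwrap8 value gives 0, and adding 1 to the largest wrap8 value gives the smallest wrap8 value.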

However, that's not the full story. There is also a selection of other integer types in Chalice, but they are not arithmetic types. That means that they aren't intended to be used for arithmetic. These types are intended for use with pointers, offsets, object sizes, etc. They are:

size: this is a distinct alias for uintX where X is the length of a pointer in bits. So on amd64 this is a distinct alias for uint64 while on x86 it is a distinct alias for uint32.
offset: this is a distinct alias for intX where X is the length of a pointer in bits.
err: this is a distinct alias for intX where X is the natural word size of the platform. In most cases this is the same as the length of a pointer. It is important that this is conceptually different from an integer.
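C has no direct equivalent of a distinct alias, since a typedef is freely interchangeable with its underlying type. One way to approximate the idea is to wrap each integer in its own single-member struct, as in the sketch below. All of the `ch_` names are invented for this example, and using intptr_t for the error type assumes the common case where the word size matches the pointer size.

```c
#include <stdint.h>

/* Each wrapper is a separate type, so passing a ch_size where a ch_offset
   is expected is a compile-time error rather than a silent conversion. */
typedef struct { uintptr_t value; } ch_size;    /* models `size`   */
typedef struct { intptr_t value; } ch_offset;   /* models `offset` */
typedef struct { intptr_t value; } ch_err;      /* models `err` (ASSUMPTION:
                                                   word size == pointer size) */

/* Both pointer-related types should be exactly as wide as a pointer. */
_Static_assert(sizeof(ch_size) == sizeof(void *), "size is pointer-width");
_Static_assert(sizeof(ch_offset) == sizeof(void *), "offset is pointer-width");

/* Distance in elements between two pointers into the same array, returned
   as the distinct offset type rather than a bare integer. */
ch_offset ch_distance(const int32_t *from, const int32_t *to) {
    ch_offset d = { (intptr_t)(to - from) };
    return d;
}
```

Because the wrappers expose no arithmetic operators, any arithmetic has to go through `.value` explicitly, which loosely mirrors the idea that these are not arithmetic types.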
