As C programmers, most of us think of pointer arithmetic for multi-dimensional arrays in a nested way:
The address for a 1-dimensional array is base + x.
The address for a 2-dimensional array is base + x + y*x_size for row-major layout and base + y + x*y_size for column-major layout.
The address for a 3-dimensional array is base + x + (y + z*y_size)*x_size for row-major layout.
And so on.
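For reference, here is a minimal sketch of the nested row-major calculation in C (the function and parameter names are illustrative, not from any standard API):

```c
#include <stddef.h>

// Nested row-major 3-dimensional indexing: z selects a plane, y a row
// within the plane, and x a column within the row.
float *at3d_nested(float *base, ptrdiff_t x, ptrdiff_t y, ptrdiff_t z,
                   ptrdiff_t x_size, ptrdiff_t y_size) {
    return base + x + (y + z * y_size) * x_size;
}
```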
A simple but powerful change in perspective is to stop thinking of the dimensions as nested and ordered relative to one another and to start thinking of them more symmetrically, on an even footing.
The only change required is to associate a stride with each dimension and compute offsets as a sum of products:
The address for a 1-dimensional array is now base + x*x_stride.
The address for a 2-dimensional array is now base + x*x_stride + y*y_stride.
The address for a 3-dimensional array is now base + x*x_stride + y*y_stride + z*z_stride.
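To make the stride-based scheme concrete, here is a minimal sketch of what such view descriptors might look like in C. The names (view1d, view2d, at1d, at2d) are illustrative rather than a standard API, and strides are measured in elements rather than bytes:

```c
#include <stddef.h>

// Hypothetical descriptors for strided views of float data.
typedef struct {
    float *base;
    ptrdiff_t x_size, x_stride;
} view1d;

typedef struct {
    float *base;
    ptrdiff_t x_size, x_stride;
    ptrdiff_t y_size, y_stride;
} view2d;

// Addressing is a sum of products, one term per dimension.
float *at1d(view1d v, ptrdiff_t x) {
    return v.base + x * v.x_stride;
}

float *at2d(view2d v, ptrdiff_t x, ptrdiff_t y) {
    return v.base + x * v.x_stride + y * v.y_stride;
}
```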
The change for 1-dimensional arrays is already helpful since it lets you create a strided view of an existing array. For example, you can skip every other element, so only even-indexed elements are visible, using x_stride *= 2 and x_size /= 2. If you first do base += x_stride, you also get a complementary strided view of the odd-indexed elements. The strides, sizes and bases for the new views are computed in terms of the strides, sizes and bases for the old views, so these transformations can be applied recursively. If you twice apply the recipe for getting every 2nd element, you get a view of every 4th element.
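In terms of the hypothetical view1d above, the even/odd recipes might look like this (note that the sizes need rounding when the original length is odd):

```c
// View of the even-indexed elements: every other element, starting at 0.
view1d evens(view1d v) {
    v.x_size = (v.x_size + 1) / 2;   // indices 0, 2, 4, ...
    v.x_stride *= 2;
    return v;
}

// Complementary view of the odd-indexed elements: step to index 1 first.
view1d odds(view1d v) {
    v.base += v.x_stride;            // offset the base before doubling
    v.x_size /= 2;                   // indices 1, 3, 5, ...
    v.x_stride *= 2;
    return v;
}
```

Applying evens twice to a view yields every 4th element, as described.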
Actually, we left out a degenerate case which turns out to be very useful. A 0-dimensional array is just a scalar lvalue.
Why is including this seemingly trivial case worthwhile? Well, we can create higher-dimensional views from data that
was originally intended for a lower-dimensional array. And one of the most important examples of that is broadcasting scalars.
To create a vector view of length n for a scalar, use the same base address, an x_size of n, and an x_stride of 0. This works for broadcasting scalars to matrices and higher-dimensional arrays as well: just add new dimensions with the desired sizes and set all the new strides to 0.
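In terms of the earlier sketch, a broadcast is just a descriptor whose new strides are 0:

```c
// Broadcast a scalar lvalue to a length-n vector view: with a stride of
// 0, every index maps back to the same underlying element.
view1d broadcast_scalar_1d(float *scalar, ptrdiff_t n) {
    return (view1d){ .base = scalar, .x_size = n, .x_stride = 0 };
}

// The same idea for a w-by-h matrix view of a scalar.
view2d broadcast_scalar_2d(float *scalar, ptrdiff_t w, ptrdiff_t h) {
    return (view2d){ .base = scalar,
                     .x_size = w, .x_stride = 0,
                     .y_size = h, .y_stride = 0 };
}
```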
The other important reason that scalar views matter is that they act like plain old pointers to individual scalars. That means you can point at an entry of a higher-dimensional array and write to it through the scalar view, and you will also see any changes to the underlying data as they happen.
This basic idea also works for broadcasting vectors to matrices. You can choose whether to treat the vector as a row vector or a column vector. In the former case, you set the y_stride to 0 and the y_size to the desired height of the matrix. If you want to treat the vector as a column vector, you do the same, but swap the x and y strides and sizes.
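Continuing the sketch from above, the two orientations might look like this:

```c
// Treat v as a row vector replicated down h rows: y_stride = 0 means
// every row aliases the same underlying elements.
view2d broadcast_as_rows(view1d v, ptrdiff_t h) {
    return (view2d){ .base = v.base,
                     .x_size = v.x_size, .x_stride = v.x_stride,
                     .y_size = h,        .y_stride = 0 };
}

// Treat v as a column vector replicated across w columns: the same
// recipe with the x and y strides and sizes swapped.
view2d broadcast_as_cols(view1d v, ptrdiff_t w) {
    return (view2d){ .base = v.base,
                     .x_size = w,        .x_stride = 0,
                     .y_size = v.x_size, .y_stride = v.x_stride };
}
```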
The row vector vs column vector example shows another great strength of this approach: you can transpose dimensions by just permuting the strides and sizes in the descriptor, without any underlying data movement. The performance cost of working with a transposed view is more indirect: it comes from how the transposition changes the memory access order. However, if you ever want to copy the data to conform to a given view, you can just call your generic array copy function, which lays out the copied data according to a default or specified convention, such as row-major. And this observation holds for all views, not just transpositions. For example, if you are taking very long strides in a 1-dimensional array and therefore making poor use of the cache, you can make a copy through that view, and now you have a tightly packed version of the data that still conforms to the same array shape.
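In the sketch types, a transposition is just a swap in the descriptor:

```c
// Transpose a matrix view by swapping the (size, stride) pairs of the
// two dimensions. No element is touched or moved.
view2d transpose(view2d v) {
    return (view2d){ v.base, v.y_size, v.y_stride, v.x_size, v.x_stride };
}
```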
As a final note on broadcasting, it's sometimes useful to think of it as two separate steps. The first is promoting a lower-dimensional array to a higher dimension without replicating the data. Thus a scalar becomes a length 1 vector and a length n vector becomes an nx1 or 1xn matrix. Broadcasting then extends a size 1 dimension up to a specified size. The inverse operation of promotion is demotion, where you collapse any dimension of size 1.
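Here is what the two steps might look like with the sketch types; since the stride of a size-1 dimension never contributes to an address, 0 is a safe choice for it:

```c
// Promotion: a vector becomes a matrix with a single row. No data is
// replicated and nothing is copied.
view2d promote(view1d v) {
    return (view2d){ v.base, v.x_size, v.x_stride, 1, 0 };
}

// Broadcasting proper: stretch a size-1 dimension to the requested
// size. The zero stride makes all h rows alias the same elements.
view2d stretch_y(view2d v, ptrdiff_t h) {
    // assumes v.y_size == 1
    v.y_size = h;
    v.y_stride = 0;
    return v;
}
```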
Another stride trick that you can see as an extension of the broadcasting idea is to use strides that generate overlapping but non-identical (unlike broadcasting) views of the elements. As an example, suppose you want to do signal processing or data analysis of a vector, which requires you to view the data with a sliding window of length n. You can create a matrix view where y_stride = x_stride, y_size = x_size - n + 1 and the new x_size is n. That is, assuming you want the step size for the overlapping windows to be 1; you can support a window step size of k using y_stride = k*x_stride, with y_size shrinking to (x_size - n)/k + 1 accordingly. You can combine sliding and slicing to produce a diagonal, subdiagonal, superdiagonal, etc, vector view of a matrix.
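A sketch of the step-size-1 sliding window and of the diagonal view, using the illustrative types from earlier:

```c
// Sliding windows of length n over a vector: x indexes within a window,
// y indexes which window. Windows overlap because y_stride == x_stride.
view2d sliding_window(view1d v, ptrdiff_t n) {
    return (view2d){ .base = v.base,
                     .x_size = n,                .x_stride = v.x_stride,
                     .y_size = v.x_size - n + 1, .y_stride = v.x_stride };
}

// Main diagonal of a matrix as a vector: each step advances one column
// and one row at the same time.
view1d diagonal(view2d m) {
    ptrdiff_t n = m.x_size < m.y_size ? m.x_size : m.y_size;
    return (view1d){ .base = m.base, .x_size = n,
                     .x_stride = m.x_stride + m.y_stride };
}
```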
You can also use negative strides, if you update the base accordingly. The primitive operation for that is reversing a dimension by negating its stride. You first have to offset the base to the last element along that dimension. As an exercise, show that using a combination of reversals and transpositions you can generate all 90-degree rotations of an image.
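The reversal primitive as a sketch:

```c
// Reverse a dimension: park the base on the last element along that
// dimension, then negate the stride.
view1d reversed(view1d v) {
    v.base += (v.x_size - 1) * v.x_stride;
    v.x_stride = -v.x_stride;
    return v;
}
```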
Creating reshaped views where the underlying data is presented in the same way, without duplication, overlap or omission, can also be very useful. For example, you can reshape a vector into a matrix or a matrix into a vector. You can take any higher-dimensional array and reshape it into a vector to see the underlying data in its memory layout order. The total number of elements stays the same when you reshape, but the factorization into dimension sizes differs. A surprising example, courtesy of Paul Khuong, is that if you reshape a 1-dimensional array of length 2^n into an n-dimensional array of length 2 along each dimension, transposing the dimensions corresponds to bit-permuted indexing. In particular, by reversing the order of the dimensions in the view descriptor, you can do bit-reversed indexing, used for the in-place FFT. The strides are the different powers of 2, and permuting the strides means permuting which of the index components they get multiplied by. Of course, this isn't a very efficient way to do the indexing compared to a specialized bit-reversal routine or CPU instruction, but it provides a non-obvious example of how reshaping can alter your view of the data in deep ways.
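A tiny demonstration of the bit-reversal claim, hardcoded for n = 3: a length-8 array viewed as 2x2x2 has strides {4, 2, 1}, and reversing the dimension order gives strides {1, 2, 4}:

```c
#include <stddef.h>
#include <stdio.h>

int main(void) {
    int data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    // Strides of the 2x2x2 view with its dimensions reversed.
    ptrdiff_t stride[3] = {1, 2, 4};
    for (int i = 0; i < 8; i++) {
        // Split index i into its bit components (b2, b1, b0)...
        ptrdiff_t b2 = (i >> 2) & 1, b1 = (i >> 1) & 1, b0 = i & 1;
        // ...and apply sum-of-products addressing with the reversed strides.
        ptrdiff_t off = b2 * stride[0] + b1 * stride[1] + b0 * stride[2];
        printf("%d -> %d\n", i, data[off]);  // prints the bit-reversal of i
    }
    return 0;
}
```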
When you get used to the power of this way of working with data, you'll want to extend it as far as possible. First, sometimes your "scalars" actually have internal structure, and it's fruitful to treat the whole thing as a single flat array of higher dimension. If you have a time series of (x, y) coordinates, treat it as a 2-dimensional array. If you have an RGB image, treat it as a 3-dimensional array. If you have a video of RGB images, treat it as a 4-dimensional array. Doing so lets you access the data in new, powerful ways, such as transposing dimensions without copying any data, or slicing along a dimension: you can take an image, viewed as a 3-dimensional array, and create a 2-dimensional view corresponding to just the R component.
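As a sketch, slicing the R component out of a w-by-h interleaved 8-bit RGB image might look like this, where byte_view2d is an illustrative byte-element analogue of the earlier float descriptor:

```c
typedef struct {
    unsigned char *base;
    ptrdiff_t x_size, x_stride;
    ptrdiff_t y_size, y_stride;
} byte_view2d;

// View the image as (channel, x, y) with channel stride 1, then fix the
// channel index at 0 (R). The remaining strides skip over G and B.
byte_view2d red_channel(unsigned char *rgb, ptrdiff_t w, ptrdiff_t h) {
    return (byte_view2d){ .base = rgb + 0,   // + 1 for G, + 2 for B
                          .x_size = w, .x_stride = 3,
                          .y_size = h, .y_stride = 3 * w };
}
```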
When your "scalars" are non-homogeneous, structured data, such as a struct consisting of a name and an age, you can't treat it as a single flat array of higher dimension, but you can still project to the different struct fields by offsetting the base address of the array. So you can have a 2-dimensional array of people and project it to a 2-dimensional array of just the names or just the ages. There's an extension of this general framework to "struct views" where a descriptor is a list of struct field names and corresponding offsets, but I won't cover that here. The big difference is that an "index" is now a field name rather than an integer. "Slicing" corresponds to projecting down to a subset of field names instead of an index range.
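A sketch of field projection; because different fields have different types, the view here is byte-addressed, and person and byte_view1d are illustrative names:

```c
#include <stddef.h>

typedef struct { char name[32]; int age; } person;

typedef struct {
    char *base;                  // byte-addressed base
    ptrdiff_t x_size, x_stride;  // stride in bytes
} byte_view1d;

// Project an array of people down to a view of just the ages: offset
// the base by the field offset; the stride stays sizeof(person).
byte_view1d project_age(person *p, ptrdiff_t n) {
    return (byte_view1d){ (char *)p + offsetof(person, age),
                          n, (ptrdiff_t)sizeof(person) };
}

// Element i's age is then *(int *)(v.base + i * v.x_stride).
```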
You can "transpose" struct dimensions with array dimensions, which yields the well-known transformation between struct-of-arrays (SoA) and array-of-structs (AoS). The difference is that this involves a little copying beyond just diddling with the descriptors: for the AoS to SoA transformation, you need to create a new struct containing the array views with the field offsets applied, but you don't need to copy the actual arrays. Ragged arrays fit into this framework as arrays of array views, in a similar way to structs of arrays. But unlike the SoA case, if you want to transpose an array of arrays, you're now looking at copying a variable-sized, probably large, array rather than a small, fixed-size struct, so transposition is less useful there.
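Sketching the AoS-to-SoA direction with the types from the previous example, the only thing built is a small struct of descriptors; the arrays themselves are never copied:

```c
// A struct of views, one per field of person.
typedef struct {
    byte_view1d name;
    byte_view1d age;
} person_soa;

person_soa aos_to_soa(person *p, ptrdiff_t n) {
    return (person_soa){
        .name = { (char *)p + offsetof(person, name), n, sizeof(person) },
        .age  = { (char *)p + offsetof(person, age),  n, sizeof(person) },
    };
}
```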
Multi-dimensional array views also provide a nice way of thinking about padding and alignment and logical vs physical data layout. You can start with a physical array where the sizes are aligned to the required physical boundaries and then slice it down to its logical dimensions. You see this kind of thing in graphics programming all the time for images and other GPU resources. In fact, the low-level notion of a "GPU resource descriptor" you see talked about in modern graphics APIs is a close kin of what we've been talking about, albeit in a specialized context.
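For example, a row-pitched image, where each row is padded out to an alignment boundary, is just a physical array sliced down to its logical width. A sketch reusing the byte_view2d type from earlier:

```c
// The physical row pitch (in bytes) is typically larger than the
// logical width; the logical view simply uses the pitch as its y stride.
byte_view2d logical_image(unsigned char *pixels, ptrdiff_t width,
                          ptrdiff_t height, ptrdiff_t row_pitch) {
    return (byte_view2d){ .base = pixels,
                          .x_size = width,  .x_stride = 1,
                          .y_size = height, .y_stride = row_pitch };
}
```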
A fun fact is that you can support unbounded dimensions with this approach very easily. With the traditional view in C of multi-dimensional arrays as arrays of arrays, only a single dimension can be unbounded, which is the outermost dimension in terms of memory ordering. For example, with a row-major matrix, you can support an unbounded number of rows, but the number of columns must be bounded and known for the address calculation base + x + y*x_size. Not only do multi-dimensional array views support unbounded dimensions, but the interpretation of which dimension is unbounded is flexible, and if you use broadcasting and similar tricks you can even have multiple unbounded dimensions.
For example, you can broadcast a scalar to an unbounded vector or matrix, or broadcast an unbounded vector (which could be backed by a physically unbounded array) to a matrix that is unbounded in either x or y, and you could broadcast that matrix in turn to a 3-dimensional array that is unbounded in the z direction. An interesting and potentially useful example is applying the sliding window transformation to a physically unbounded array, which generates a matrix view that is unbounded in both directions. Unbounded arrays are memory safe if they're generated via broadcasting or sliding windows. But if they're generated from physical memory, as in the traditional C case, it's your responsibility as the programmer to stay within the physical bounds, even though the array view doesn't need to know or care about them (the physical bounds could grow or shrink dynamically over time without the array view shape needing to change).
As great as multi-dimensional array views are for working with in-memory data, they have special synergy with memory-mapped data structures. The one caveat, as always with memory-mapped data, is that the premium paid for an incoherent, unpredictable access order is dramatically higher than when you're just missing the CPU cache and TLB. Unbounded array views are also a natural fit any time you have streaming data. Even without transparent OS-managed memory paging, you can pre-reserve virtual memory up to a conservative bound and gradually commit and fill in the underlying data as IO operations complete. The array view descriptor can stay invariant as data is streamed in because the unbounded dimension is not required for address calculations.
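As a POSIX-specific sketch of the reserve-then-commit scheme (the names and the reservation size are illustrative; on Windows you would use VirtualAlloc with MEM_RESERVE and MEM_COMMIT instead):

```c
#include <stddef.h>
#include <sys/mman.h>

#define RESERVE_BYTES ((size_t)1 << 36)  // 64 GiB of address space

// Reserve a large virtual range with no usable memory committed yet.
void *reserve(void) {
    void *p = mmap(NULL, RESERVE_BYTES, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

// As IO completes, make the first n bytes usable (n should be a
// multiple of the page size). The array view over this memory can stay
// unchanged the whole time.
int commit_prefix(void *base, size_t n) {
    return mprotect(base, n, PROT_READ | PROT_WRITE);
}
```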