An array is a structure mapping indices to data values, where the indices have a particular structure.
An array can be thought of as a pair of an index set and a data set.
Two arrays are equal if they map the same indices to the same values.
An array is iterable, and when iterated generates exactly the values of
the data set.
Therefore arrays support "lifting", i.e. given any value x
, you can
construct an array that generates exactly x
, once.
Put differently, arrays support map
.
This is a crucial characteristic, perhaps the primary feature that makes
arrays so powerful and intuitive.
If you understand "x", you can immediately understand "array of x".
The index set I
of an array A
is also iterable, and generates values i_n
such that A[i_n]
gives the data values of A
in the same order as iterating
A
itself. The generated index objects must be unique.
I
has an associated total order over its elements, order(I)
.
Given two elements of I
, i_a
and i_b
, order(I)(i_a,i_b)
is
true iff I
generates i_a
before i_b
.
I
is constructed from N
"dimensions", and each index i
generated by
I
has N
components.
Loosely speaking, a "dimension" is a vector-like object describing the
possible values of a component of an index.
(Given the syntax A[i_1,...,i_n]
, we imagine that the i
s are immediately
combined into a single index object in a manner dictated by A
.)
I
contains a mapping from dimension labels to dimensions.
To the extent the syntax A[i_1,...,i_n]
is supported by A
, 1:n
are
accepted as dimension labels, though other labels for the dimensions
might also exist.
In summary, an index set consists of:
- an integer N
- a mapping from N labels to N dimension objects
- a way of generating index objects from the dimension objects, such that each index has N components, and each component is described by its corresponding dimension object in some way.
- a total order on the index objects
Example for Core.Array:
- integer N: type parameter
- labels are integers 1:N, each dimension is a
OneTo(m)
- index set is the cartesian product of the dimensions
- order is colexicographic order on tuples
The index set object is implicit in Array
and not a separate
object. That's of course fine as long as the index set can be
obtained on demand.
Example for NDSparse:
- integer N:
length(a.index.columns)
, orndims(a)
- labels are
keys(a.index.columns)
- index set is the ordered zip of the dimensions
- order is lexicographic order on tuples
It is common to talk about the storage order of an array, but this is a bit confusing. In this model, the data is just there, and the ordering is controlled by the index object.
It is recommended that every aspect of an array except the contents of the data and dimensions be represented in the type system:
- the types of the dimension objects (implying N)
- the labels (keys) for the dimension objects
- the index order
- the type of the data representation
Example of what an array type and constructor might look like with all of these features exposed:
Array{T,N} <: AbstractArray{ColumnMajorProduct{NTuple{N,OneTo{Int}}}, Buffer{T}}
A = array(ColumnMajorProduct(x=-1:.1:1, y=-1:.1:1), repeated(0))
A isa AbstractArray{ColumnMajorProduct{NamedTuple{(:x,:y),NTuple{2,FloatRange{Float64}}}}, Repeated{Int}}
In theory, there could be just a single concrete array type parameterized
this way, with methods currently specialized for Array
instead specialized
for e.g. UrArray{ColumnMajorProduct{NTuple{N,OneTo{Int}}}}
.
indices
currently returns a tuple of dimensions, and the fact that the
actual index space is a product of these is implicit.
I think indices
should return an index iterator that exposes (1) the same
information as indices
currently does, but also (2) the order.
So it would return e.g. ColumnMajor{Tuple{OneTo{Int},OneTo{Int}}}
.
The current behavior of indices
could be called something like dimensions
,
idea being you just get the container of dimensions.
For plain arrays you'd get a tuple, and for arrays with named dimensions
you'd get a NamedTuple or dictionary.
We should have a function order(::(array or index set))
that returns the
total order predicate for indices.
We could define e.g. ColumnMajor:
abstract Order <: Function
immutable ColumnMajorOrder <: Order end
(::ColumnMajorOrder)(x::Tuple, y::Tuple) = isless(reverse(x), reverse(y))
Currently this is handled by having CartesianIndex
implement isless
.
This is valid; an index set can have its own element type that implements
isless
in the right way, and then order(I)
can just return isless
.
However it seems worthwhile to be able to reuse index types, e.g. Tuples,
for multiple kinds of arrays and let them instead specify a different
order predicate.
We should perhaps try to use tuples instead of CartesianIndex
and see if
the order(I)
interface works well enough.
This model also implies that dimension names or labels are not intrinsic to dimensions themselves, but are keys used to look them up. This is because the keys used to look up dimensions must be unique, and uniqueness can only be enforced by the container.
A SparseMatrixCSC has two important index sets: one for the matrix interface it implements,
and one for just the nonzeros.
For example you could efficiently copy just the nonzeros from a sparse array to another
array using dest[nonzero_indices(sp)] = nonzeros(sp)
.
nonzero_indices(sp)
would return an object with similar characteristics to
indices(::NDSparse)
(with a different implementation of course).
indices(A)
- get index setdimensions(A | indices(A))
- dimension containerkeys(dimensions(A))
- dimension labelsorder(indices(A))
- index order predicatendims(A) = length(dimensions(A))
ArrayType(IndexSet, Data)
- constructorstorage(A)
- get the data part ofA
, which implements some TBD storage API that might be slightly different from the array API. This function can returnA
itself ifA
implements that API and there is nothing else sensible to return.
- Lifting semantics: an array wraps data values unrelated to its indices. A dict is a set of key=>value pairs, not just values.
- Dict keys only need to support equality, not a total order. While ordered dicts exist, the ordering part is clearly optional and not central to what it means to be a dict.
- A dict can be constructed from a sequence of key=>value pairs. An array can't really do this, since you need the structure around the indices and not just the indices themselves.
Might want to email matt directly, iirc gist pings don't do anything.