I have an API for reading multi-dimensional variables that has a read method. For simplicity, I define these in a pseudo-Scala language here, but the question is really about the algorithm not the particular language. So:
trait Variable[A] {
def shape: List[Int]
def read(sections: List[Range]): List[A]
}
(in the concrete case the variable is ucar.nc2.Variable
, the range is ucar.ma2.Section
, and the data is ucar.ma2.Array
in UCAR's NetCDF-Java library). I have an input spec for sections, e.g.
val in: List[Range] = ???
The range here can be thought of as Scala's Range
type, with a start index (inclusive), stop index (exclusive), and a step size (possibly 1), i.e. start until stop by step
.
If I were to feed this into a single read
call, I would obtain a Data
object which has a linear iterator giving
val size: Long = in.map(_.size).product
elements. Now I have an instance that wants to read all these elements in chunks of size chunkSize
. So I want to define an API
trait ChunkReader[A] {
private var pos: Long = 0 // advanced by readNext
private def v: Variable[A]
private def in: List[Range]
def size: Long = in.map(_.size).product
def readNext(chunkSize: Int): List[A] = ???
}
Given the current state (pos
) and a call to ChunkReader#readNext
with a given chunkSize
, how can I implement this method using a minimal number of calls to Variable#read
and a minimum "overhead" of reading unused data that I would have to skip when generating resulting list?
A concrete example: A three-dimensional variable, an an example section being
List(
2 until 4, // size 2
3 until 6, // size 3
4 until 8 // size 4
)
So I want to piecewise read all the 2 * 3 * 4 = 24 elements specified by the section. Say I want to call ChunkReader#readNext
four times with chunkSize = 5
and finally with chunkSize = 4
. How would I calculate the corresponding sections to be handed to Variable#read
?
For example, I can manually think of the first call as giving
val first_four = v.read(List(
2 until 3,
3 until 4,,
4 until 8
))
val fifth = v.read(List(
2 until 3,
4 until 5,
4 until 5
))
return first_four ++ fifth
My intuition says, I can write this generally, always using between one and three calls to Variable#read
?
So here is a dummy implementation of Variable
to play with (I'm using Vec = scala.collection.immutable.Vector
here instead of List
):
type Vec[+A] = collection.immutable.Vector[A]
val Vec = collection.immutable.Vector
class Variable[A](data: Vec[A], shape: Vec[Int]) {
require(data.size == shape.product)
def read(sections: Vec[Range]): Vec[A] = {
require(sections.size == shape.size)
require(sections.zipWithIndex.forall { case (r, ri) =>
r.forall(i => i >= 0 && i < shape(ri)) })
val sz = if (sections.isEmpty) 0 else (1 /: sections)(_ * _.size)
val zip = (shape zip sections).reverse
Vec.tabulate(sz) { i =>
val (j, _, _) = ((0, 1, 1) /: zip) { case ((res, m, n), (dim, r)) =>
val add = r((i / n) % r.size) * m
(res + add, m * dim, n * r.size)
}
data(j)
}
}
}
Here is the example with a section of shape [2][3][4] and the variable for example having shape [8][8][8] and data being integers from 1 to (888 = 512):
val v = new Variable[Int](Vec(1 to 512: _*), Vec(8, 8, 8))
assert(v.read(Vec(0 until 0, 0 until 0, 0 until 0)) == Vec())
assert(v.read(Vec(0 to 0, 0 to 0, 0 to 2)) == Vec(1, 2, 3))
assert(v.read(Vec(1 to 1, 1 to 1, 0 to 2)) == Vec(73, 74, 75))
val selSome = Vec(
2 until 4, // size 2
3 until 6, // size 3
4 until 8 // size 4
)
assert(v.read(selSome) == Vec(
157, 158, 159, 160,
165, 166, 167, 168,
173, 174, 175, 176,
221, 222, 223, 224,
229, 230, 231, 232,
237, 238, 239, 240
))
Here is the task:
trait ChunkReader[A] {
def read(chunkSize: Int): Vec[A]
}
def test(c: ChunkReader[Int]): Unit =
assert((1 to 4).map(c.read(5)) ++ c.read(4) == Vec(
157, 158, 159, 160,
165, 166, 167, 168,
173, 174, 175, 176,
221, 222, 223, 224,
229, 230, 231, 232,
237, 238, 239, 240
))
// test(???)
Here is an auxiliary function that I think will help solve the riddle:
def calcInSection(pos: Int, sections: Vec[Range]): Vec[Int] = {
val sizes = sections.map(_.size)
val modsDivs = sizes zip sizes.scanRight(1)(_ * _).tail
modsDivs.map { case (mod, div) =>
(pos / div) % mod
}
}
This gives for a given linear read position the indices into the in
section. E.g.
(0 until 24).map(i => calcInSection(i, selSome)).foreach(println)
Vector(0, 0, 0)
Vector(0, 0, 1)
Vector(0, 0, 2)
Vector(0, 0, 3)
Vector(0, 1, 0)
Vector(0, 1, 1)
Vector(0, 1, 2)
Vector(0, 1, 3)
Vector(0, 2, 0)
Vector(0, 2, 1)
Vector(0, 2, 2)
Vector(0, 2, 3)
Vector(1, 0, 0)
Vector(1, 0, 1)
Vector(1, 0, 2)
Vector(1, 0, 3)
Vector(1, 1, 0)
Vector(1, 1, 1)
Vector(1, 1, 2)
Vector(1, 1, 3)
Vector(1, 2, 0)
Vector(1, 2, 1)
Vector(1, 2, 2)
Vector(1, 2, 3)
So for an individual readNext
, say the first five:
(0 to 4).map(i => calcInSection(i, selSome)).foreach(println)
// first sub-read
Vector(0, 0, 0)
Vector(0, 0, 1)
Vector(0, 0, 2)
Vector(0, 0, 3)
// second sub-read
Vector(0, 1, 0)
The next five elements:
(5 to 9).map(i => calcInSection(i, selSome)).foreach(println)
// first read
Vector(0, 1, 1)
Vector(0, 1, 2)
Vector(0, 1, 3)
// second read
Vector(0, 2, 0)
Vector(0, 2, 1)
etc.