Skip to content

Instantly share code, notes, and snippets.

@Sciss
Created January 25, 2017 16:18
Show Gist options
  • Save Sciss/743d69e26710cda3587e9ccaedf064f2 to your computer and use it in GitHub Desktop.
Save Sciss/743d69e26710cda3587e9ccaedf064f2 to your computer and use it in GitHub Desktop.

I have an API for reading multi-dimensional variables that has a read method. For simplicity, I define these in a pseudo-Scala language here, but the question is really about the algorithm not the particular language. So:

trait Variable[A] {
  def shape: List[Int]

  def read(sections: List[Range]): List[A]
}

(in the concrete case the variable is ucar.nc2.Variable, the range is ucar.ma2.Section, and the data is ucar.ma2.Array in UCAR's NetCDF-Java library). I have an input spec for sections, e.g.

val in: List[Range] = ???

The range here can be thought of as Scala's Range type, with a start index (inclusive), stop index (exclusive), and a step size (possibly 1), i.e. start until stop by step.

If I were to feed this into a single read call, I would obtain a Data object which has a linear iterator giving

val size: Long = in.map(_.size).product

elements. Now I have an instance that wants to read all these elements in chunks of size chunkSize. So I want to define an API

trait ChunkReader[A] {
  private var pos: Long = 0  // advanced by readNext

  private def v: Variable[A]
  private def in: List[Range]

  def size: Long = in.map(_.size).product

  def readNext(chunkSize: Int): List[A] = ???
}

Given the current state (pos) and a call to ChunkReader#readNext with a given chunkSize, how can I implement this method using a minimal number of calls to Variable#read and a minimum "overhead" of reading unused data that I would have to skip when generating resulting list?


A concrete example: A three-dimensional variable, an an example section being

List(
  2 until 4,  // size 2
  3 until 6,  // size 3
  4 until 8   // size 4
)

So I want to piecewise read all the 2 * 3 * 4 = 24 elements specified by the section. Say I want to call ChunkReader#readNext four times with chunkSize = 5 and finally with chunkSize = 4. How would I calculate the corresponding sections to be handed to Variable#read?

For example, I can manually think of the first call as giving

val first_four = v.read(List(
  2 until 3,
  3 until 4,,
  4 until 8
))
val fifth = v.read(List(
  2 until 3,
  4 until 5,
  4 until 5
))
return first_four ++ fifth

My intuition says, I can write this generally, always using between one and three calls to Variable#read?


So here is a dummy implementation of Variable to play with (I'm using Vec = scala.collection.immutable.Vector here instead of List):

type Vec[+A] = collection.immutable.Vector[A]
val  Vec     = collection.immutable.Vector

class Variable[A](data: Vec[A], shape: Vec[Int]) {
  require(data.size == shape.product)
  
  def read(sections: Vec[Range]): Vec[A] = {
    require(sections.size == shape.size)
    require(sections.zipWithIndex.forall { case (r, ri) => 
      r.forall(i => i >= 0 && i < shape(ri)) })
    
    val sz  = if (sections.isEmpty) 0 else (1 /: sections)(_ * _.size)
    val zip = (shape zip sections).reverse
    Vec.tabulate(sz) { i =>
      val (j, _, _) = ((0, 1, 1) /: zip) { case ((res, m, n), (dim, r)) =>
        val add = r((i / n) % r.size) * m
        (res + add, m * dim, n * r.size)
      }
      data(j)
    }
  }
}

Here is the example with a section of shape [2][3][4] and the variable for example having shape [8][8][8] and data being integers from 1 to (888 = 512):

val v = new Variable[Int](Vec(1 to 512: _*), Vec(8, 8, 8))
    
assert(v.read(Vec(0 until 0, 0 until 0, 0 until 0)) == Vec())
assert(v.read(Vec(0 to 0, 0 to 0, 0 to 2)) == Vec(1, 2, 3))
assert(v.read(Vec(1 to 1, 1 to 1, 0 to 2)) == Vec(73, 74, 75))

val selSome = Vec(
  2 until 4,  // size 2
  3 until 6,  // size 3
  4 until 8   // size 4
)

assert(v.read(selSome) == Vec(
  157, 158, 159, 160, 
  165, 166, 167, 168, 
  173, 174, 175, 176, 
  
  221, 222, 223, 224, 
  229, 230, 231, 232, 
  237, 238, 239, 240
))

Here is the task:

trait ChunkReader[A] {
  def read(chunkSize: Int): Vec[A]
}

def test(c: ChunkReader[Int]): Unit =
  assert((1 to 4).map(c.read(5)) ++ c.read(4) == Vec(
    157, 158, 159, 160, 
    165, 166, 167, 168, 
    173, 174, 175, 176, 
    
    221, 222, 223, 224, 
    229, 230, 231, 232, 
    237, 238, 239, 240
  ))

// test(???)

Here is an auxiliary function that I think will help solve the riddle:

def calcInSection(pos: Int, sections: Vec[Range]): Vec[Int] = {
  val sizes    = sections.map(_.size)
  val modsDivs = sizes zip sizes.scanRight(1)(_ * _).tail
  modsDivs.map { case (mod, div) =>
    (pos / div) % mod
  }
}

This gives for a given linear read position the indices into the in section. E.g.

(0 until 24).map(i => calcInSection(i, selSome)).foreach(println)

Vector(0, 0, 0)
Vector(0, 0, 1)
Vector(0, 0, 2)
Vector(0, 0, 3)
Vector(0, 1, 0)
Vector(0, 1, 1)
Vector(0, 1, 2)
Vector(0, 1, 3)
Vector(0, 2, 0)
Vector(0, 2, 1)
Vector(0, 2, 2)
Vector(0, 2, 3)
Vector(1, 0, 0)
Vector(1, 0, 1)
Vector(1, 0, 2)
Vector(1, 0, 3)
Vector(1, 1, 0)
Vector(1, 1, 1)
Vector(1, 1, 2)
Vector(1, 1, 3)
Vector(1, 2, 0)
Vector(1, 2, 1)
Vector(1, 2, 2)
Vector(1, 2, 3)

So for an individual readNext, say the first five:

(0 to 4).map(i => calcInSection(i, selSome)).foreach(println)

// first sub-read
Vector(0, 0, 0)
Vector(0, 0, 1)
Vector(0, 0, 2)
Vector(0, 0, 3)
// second sub-read
Vector(0, 1, 0)

The next five elements:

(5 to 9).map(i => calcInSection(i, selSome)).foreach(println)

// first read
Vector(0, 1, 1)
Vector(0, 1, 2)
Vector(0, 1, 3)

// second read
Vector(0, 2, 0)
Vector(0, 2, 1)

etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment