Physical types (PTypes) are the classes that create, or generate code that creates, the in-memory representations of Hail types (virtual types). Virtual types are interfaces; physical types are their implementations.
Where possible, physical type behavior should follow Python type behavior.
This proposal deals with the architectural goals of the PType implementation for 2020 Q1.
- Improve performance by building specialized memory representations for data (improve developer velocity / enable performance optimizations in the future).
- Abstract PType interfaces define code-generation and interpretation primitives (for example, PCanonicalArray concretely implements the PArray interface).
- Remove requiredness from virtual types
- Introduce the following invariant in the codebase: All region methods / Memory methods are used only in the ptypes hierarchy when dealing with values of Hail types.
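As a rough illustration of the interface/implementation split in these goals, the sketch below defines one abstract array interface and two memory layouts behind it. Everything here is hypothetical and is not Hail's API: `Int32ArrayType`, `LengthPrefixed`, `FixedLength`, and the `ByteBuffer` standing in for region memory are names invented for the example.

```scala
import java.nio.ByteBuffer

object PTypeLayoutSketch {
  // Hypothetical stand-in for region memory: an "address" is an offset into this buffer.
  val memory: ByteBuffer = ByteBuffer.allocate(64)

  // Abstract interface: the only way callers read an array of 32-bit ints.
  trait Int32ArrayType {
    def loadLength(arrayAddress: Long): Int
    def loadElement(arrayAddress: Long, index: Int): Int
  }

  // Layout 1 (hypothetical "canonical"): [length: 4 bytes][element 0][element 1]...
  object LengthPrefixed extends Int32ArrayType {
    def loadLength(arrayAddress: Long): Int = memory.getInt(arrayAddress.toInt)
    def loadElement(arrayAddress: Long, index: Int): Int =
      memory.getInt(arrayAddress.toInt + 4 + 4 * index)
  }

  // Layout 2 (hypothetical specialization): the length is fixed by the type,
  // so no header is stored in memory.
  final class FixedLength(length: Int) extends Int32ArrayType {
    def loadLength(arrayAddress: Long): Int = length
    def loadElement(arrayAddress: Long, index: Int): Int =
      memory.getInt(arrayAddress.toInt + 4 * index)
  }

  // Caller code depends only on the interface, never on the byte layout.
  def sum(t: Int32ArrayType, arrayAddress: Long): Int =
    (0 until t.loadLength(arrayAddress)).map(t.loadElement(arrayAddress, _)).sum

  // Demo setup only: write the values 1, 2, 3 in both layouts.
  memory.putInt(0, 3).putInt(4, 1).putInt(8, 2).putInt(12, 3) // length-prefixed, address 0
  memory.putInt(32, 1).putInt(36, 2).putInt(40, 3)            // fixed-length, address 32
}
```

The `sum` function is the point of the sketch: it contains no offsets or buffer reads, so swapping the layout does not change caller code.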
PNDArray
- Specialized implementations (canonical/non)
PCall
- Specialized implementations (canonical/non)
PInterval
- Specialized implementations (canonical/non)
PFloat32
PFloat64
PInt32
PInt64
PString
PBinary
PVoid
Utility methods
def store(destinationAddress: Long, destinationType: PType, value: Long, valueType: PType): Unit
def store(destinationAddress: Code[Long], destinationType: PType, value: Code[Long], valueType: PType): Unit
- WIP, modeled on: https://github.com/hail-is/hail/pull/7639/files#diff-5e71dd9f25b178ccf031784b2ffe232bR159
- The aim is to call the PType's (upcoming) store method, with signature:
def store(value: Code[Long], valueType: PType): Code[Unit]
An abstract class for immutable ordered collections where all elements are of a single type. Does not contain the value constructor (e.g. allocate).
(Each method has a staged version)
def loadLength(arrayAddress: Long): Long = ...
def loadLength(arrayAddress: Code[Long]): Code[Long] = ...
def isElementMissing(arrayAddress: Long, index: Int): Boolean = ...
def isElementMissing(arrayAddress: Code[Long], index: Code[Int]): Code[Boolean] = ...
def loadElementAddress(arrayAddress: Long, index: Int): Long = ...
def loadElementAddress(arrayAddress: Code[Long], index: Code[Int]): Code[Long] = ...
- Renamed from loadElement because this function returns only the address of the element, not the element itself.
- Does not take a region instance, because memory addresses are valid across regions. The current loadElement signatures that take a region instance do not use it. A region instance would be needed only if loadElement had to allocate memory off-heap, but that seems semantically inconsistent with loading (it would instead be a value construction, which happens in allocate).
- Do we want to allow a range of maximum array lengths (not just 32-bit)?
- Do we want a loadElement that returns the actual data stored at that address? Currently the caller always needs to perform a second step, at the cost of more allocations (and an address takes more bytes than any primitive other than Long and Double).
- Do we want a loadElements that returns an iterable (typically over non-missing elements)? This would save the caller boilerplate: currently they need to store the length of the array, maintain an index variable, manually construct a while loop, and check whether each element is missing.
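To make the boilerplate question concrete, here is a minimal sketch comparing the manual loop with a hypothetical `loadElements`. The heap model (a plain `Array[Int]`), the layout, and the names `ToyInt32Array` and `loadInt` are assumptions made for illustration only; they are not Hail's representation.

```scala
object LoadElementsSketch {
  // Toy stand-in for an array of Int32 values. Hypothetical layout: for an array
  // at address a, heap(a) is the length, followed by `length` missingness flags
  // (non-zero means missing), followed by `length` values.
  final class ToyInt32Array(heap: Array[Int]) {
    def loadLength(a: Long): Int = heap(a.toInt)
    def isElementMissing(a: Long, i: Int): Boolean = heap(a.toInt + 1 + i) != 0
    def loadElementAddress(a: Long, i: Int): Long = a + 1 + loadLength(a) + i
    def loadInt(elementAddress: Long): Int = heap(elementAddress.toInt)

    // Hypothetical loadElements: iterates over the non-missing elements only.
    def loadElements(a: Long): Iterator[Int] =
      Iterator.range(0, loadLength(a))
        .filterNot(i => isElementMissing(a, i))
        .map(i => loadInt(loadElementAddress(a, i)))
  }

  // What a caller writes with only the address-level primitives.
  def sumManually(t: ToyInt32Array, a: Long): Int = {
    val length = t.loadLength(a)
    var i = 0
    var sum = 0
    while (i < length) {
      if (!t.isElementMissing(a, i)) sum += t.loadInt(t.loadElementAddress(a, i))
      i += 1
    }
    sum
  }

  // The same computation through the hypothetical iterator.
  def sumWithIterator(t: ToyInt32Array, a: Long): Int = t.loadElements(a).sum

  // Array at address 0: length 3, flags (present, missing, present), values 10, 0, 32.
  val example = new ToyInt32Array(Array(3, 0, 1, 0, 10, 0, 32))
}
```

Both functions return the same sum for `example`; the iterator version hides the length, index, and missingness bookkeeping. Whether an iterator is practical in the staged (code-generating) versions is a separate question this sketch does not address.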
PCanonicalArray(elementType: PType, required: Boolean = false)
def allocate(region: Region, length: Long): Long = ...
def allocate(region: Code[Region], length: Code[Long]): Code[Long] = ...
- Allocates the value array (e.g. code-generates the allocation) and returns the memory address of (the start of) the array
def setElement(arrayAddress: Long, index: Int, value: Annotation): Unit = ...
def setElement(arrayAddress: Code[Long], index: Code[Int], value: Code[Annotation]): Code[Unit] = ...
- Sets the value at the given index. Assumes the array has already been allocated. Does not track whether the value has already been set.
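The allocate-then-set flow described above can be sketched with a toy bump allocator. This is an illustration under stated assumptions, not the proposed implementation: `ToyRegion` and `ToyInt64Array` are hypothetical, addresses are offsets into a `ByteBuffer`, and the region is passed to the type's constructor rather than to `allocate` so that the toy `setElement` can reach the buffer.

```scala
import java.nio.ByteBuffer

object AllocateSketch {
  // Hypothetical stand-in for a Region: a bump allocator over one buffer.
  final class ToyRegion(capacity: Int) {
    val bytes: ByteBuffer = ByteBuffer.allocate(capacity)
    private var next = 0
    def allocate(nBytes: Int): Long = {
      require(next + nBytes <= capacity, "region exhausted")
      val address = next
      next += nBytes
      address.toLong
    }
  }

  // Toy array of Int64 elements. Assumed layout: [length: 8 bytes][element 0: 8 bytes]...
  final class ToyInt64Array(region: ToyRegion) {
    // Reserves space, writes the length header, and returns the start address.
    def allocate(length: Long): Long = {
      val address = region.allocate(8 + 8 * length.toInt)
      region.bytes.putLong(address.toInt, length)
      address
    }
    def loadLength(arrayAddress: Long): Long = region.bytes.getLong(arrayAddress.toInt)
    def loadElementAddress(arrayAddress: Long, index: Int): Long =
      arrayAddress + 8 + 8L * index
    // Writes the slot unconditionally: there is no record of earlier writes.
    def setElement(arrayAddress: Long, index: Int, value: Long): Unit = {
      region.bytes.putLong(loadElementAddress(arrayAddress, index).toInt, value)
      ()
    }
    def loadElement(arrayAddress: Long, index: Int): Long =
      region.bytes.getLong(loadElementAddress(arrayAddress, index).toInt)
  }

  // Allocates a 2-element array and sets element 0 twice.
  def demo(): (Long, Long) = {
    val t = new ToyInt64Array(new ToyRegion(128))
    val a = t.allocate(2L)
    t.setElement(a, 0, 7L)
    t.setElement(a, 0, 9L) // silently overwrites the earlier value
    (t.loadLength(a), t.loadElement(a, 0))
  }
}
```

In `demo`, the second `setElement` replaces the first without any error, which is the "does not track whether the value has already been set" behavior.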
An abstract class for immutable (potentially unordered) collections of values where all values are unique and of one type. Does not contain the value constructor (e.g. allocate).
- TODO: Not sure what the intended semantics of our sets are, besides uniqueness. Should we be able to access them by index? Similar question about dictionary ptypes.
(Each method has a staged version)
def loadLength(arrayAddress: Long): Long = ...
def loadLength(arrayAddress: Code[Long]): Code[Long] = ...
- Returns the array length
def isElementMissing(arrayAddress: Long, index: Int): Boolean
def isElementMissing(arrayAddress: Code[Long], index: Code[Int]): Code[Boolean]
def loadElementAddress(arrayAddress: Long, index: Int): Long
def loadElementAddress(arrayAddress: Code[Long], index: Code[Int]): Code[Long]
- Why shouldn't loadElementAddress take a hashable value here? Code gen for figuring out the address of an unordered set by value seems like PType domain.
PSet(elementType: PType, required: Boolean = false)
def allocate(region: Region, length: Long): Long = ...
def allocate(region: Code[Region], length: Code[Long]): Code[Long] = ...
- Allocates the value array (e.g. code-generates the allocation) and returns the memory address of (the start of) the array
def setElement(arrayAddress: Long, index: Int, value: Annotation): Unit = ...
def setElement(arrayAddress: Code[Long], index: Code[Int], value: Code[Annotation]): Code[Unit] = ...
- Insert a value at the index
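One way to picture the question of lookup by value versus lookup by index is a set stored as a sorted array, where a value is located by binary search. The sorted-array representation is an assumption made for this sketch, and `ToySortedSet`, `indexOf`, and `fromValues` are hypothetical names.

```scala
object SetLookupSketch {
  // Assumption: the set is stored as a sorted array of distinct Int values.
  final class ToySortedSet(sortedElements: Array[Int]) {
    def loadLength: Int = sortedElements.length

    // Index-based access, as in the array-style interface.
    def loadElement(index: Int): Int = sortedElements(index)

    // Value-based lookup: binary search returns the index of `value`, if present.
    def indexOf(value: Int): Option[Int] = {
      var lo = 0
      var hi = sortedElements.length - 1
      while (lo <= hi) {
        val mid = lo + (hi - lo) / 2
        val v = sortedElements(mid)
        if (v == value) return Some(mid)
        else if (v < value) lo = mid + 1
        else hi = mid - 1
      }
      None
    }

    def contains(value: Int): Boolean = indexOf(value).isDefined
  }

  // Enforces uniqueness and ordering at construction time.
  def fromValues(values: Seq[Int]): ToySortedSet =
    new ToySortedSet(values.distinct.sorted.toArray)
}
```

With this layout both access styles coexist: `indexOf` answers the by-value question, and its result can be fed to an index-based accessor.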
An abstract class for immutable unordered collections of key:value pairs where keys are unique. Keys must all be of the same type, and values must all be of the same type (though it can differ from the key type). Does not contain the value constructor (e.g. allocate).
(Each method has a staged version)
def loadLength(arrayAddress: Long): Long = ...
def loadLength(arrayAddress: Code[Long]): Code[Long] = ...
def loadElementAddress(arrayAddress: Long, index: Int): Long = ...
def loadElementAddress(arrayAddress: Code[Long], index: Code[Int]): Code[Long] = ...
PCanonicalDict(keyType: PType, valueType: PType, required: Boolean = false)
def allocate(region: Region, length: Int): Long = ...
def allocate(region: Code[Region], length: Code[Int]): Code[Long] = ...
- Returns the address to the start of the dictionary
An abstract class for immutable ordered collections of values that may be of different types. Does not contain the value constructor (e.g. allocate).
(Each method has a staged version)
def loadLength(arrayAddress: Long): Long = ...
def loadLength(arrayAddress: Code[Long]): Code[Long] = ...
- Returns the array length
def loadElement(arrayAddress: Long, index: Long): Option[AnyVal]
def loadElement(arrayAddress: Code[Long], index: Long): Code[Optional[AnyVal]] = ...
- Same semantics as PArray
PCanonicalTuple(fields: IndexedSeq[PType], required: Boolean = false)
def allocate(region: Region, length: Long): Long = ...
def allocate(region: Code[Region], length: Code[Long]): Code[Long] = ...
An abstract class for immutable collections of (key, value) pairs of (potentially) different types. Keys are always strings. Values are looked up by key only. Does not contain the value constructor (e.g. allocate).
(Each method has a staged version)
def loadLength(arrayAddress: Long): Long = ...
def loadLength(arrayAddress: Code[Long]): Code[Long] = ...
- Returns the array length
def loadElement(arrayAddress: Long, fieldName: String): Option[AnyVal]
def loadElement(arrayAddress: Code[Long], fieldName: String): Code[Optional[AnyVal]] = ...
- Same return value semantics as PArray with regard to missingness
PCanonicalStruct(fields: Seq[(String, PType)], required: Boolean = false)
def allocate(region: Region, length: Long): Long = ...
def allocate(region: Code[Region], length: Code[Long]): Code[Long] = ...
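Lookup by field name can be sketched as a name-to-index map plus per-field byte offsets. This is a simplified illustration: `Field`, `ToyStructType`, `fieldOffset`, and `loadFieldAddress` are hypothetical, fields are assumed fixed-width, and alignment and missingness bits are ignored. Note that `None` below means "no such field", which is a different question from a field whose value is missing.

```scala
object StructLookupSketch {
  // Hypothetical field descriptor: a name plus the byte size of its fixed-width value.
  final case class Field(name: String, byteSize: Int)

  final class ToyStructType(fields: IndexedSeq[Field]) {
    // Each field's offset is the running sum of the sizes of the fields before it.
    private val offsets: IndexedSeq[Int] = fields.scanLeft(0)(_ + _.byteSize).init
    private val indexByName: Map[String, Int] = fields.map(_.name).zipWithIndex.toMap

    def loadLength: Int = fields.length

    // Resolves a field name to its byte offset within the struct.
    def fieldOffset(fieldName: String): Option[Int] =
      indexByName.get(fieldName).map(i => offsets(i))

    // Address of the named field for a struct value starting at structAddress.
    def loadFieldAddress(structAddress: Long, fieldName: String): Option[Long] =
      fieldOffset(fieldName).map(offset => structAddress + offset)
  }

  val example = new ToyStructType(
    IndexedSeq(Field("contig", 8), Field("position", 4), Field("qual", 8))
  )
}
```

Because the name-to-offset mapping is fixed by the type, it can be resolved once when the type is built rather than on every access.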
A representation of a chromosomal locus, encapsulating the reference genome, chromosome (called contig in our documentation), and position.
(Each method has a staged version)
def reference(arrayAddress: Long): String = ...
def reference(arrayAddress: Code[Long]): Code[String] = ...
def chromosome(arrayAddress: Long): String = ...
def chromosome(arrayAddress: Code[Long]): Code[String] = ...
def position(arrayAddress: Long): Long = ...
def position(arrayAddress: Code[Long]): Code[Long] = ...
PCanonicalLocus(reference: PString, chromosome: PString, position: PInt64)
Questions:
- Do we need to have a value constructor for PLocus? TODO: need some construction method.
An abstract class for immutable (potentially unordered) collections of values where all values are unique and of one type. Does not contain the value constructor (e.g. allocate).
- TODO: This is wrong I think. Not sure what the intended semantics of our sets is, besides uniqueness. Should we be able to access them by index?
(Each method has a staged version)
def loadLength(arrayAddress: Long): Long = ...
def loadLength(arrayAddress: Code[Long]): Code[Long] = ...
- Returns the array length
def loadElement(arrayAddress: Long, item: AnyVal): Option[AnyVal]
def loadElement(arrayAddress: Code[Long], item: Hashable): Code[Optional[AnyVal]] = ...
- The return semantics for PSet's loadElement instance method are identical to PCanonicalArray's loadElement instance method
PSet(elementType: PType, required: Boolean = false)
def allocate(region: Region, length: Long): Long = ...
def allocate(region: Code[Region], length: Code[Long]): Code[Long] = ...
- Constructs the value array (e.g. code-generates allocation and insertion) and returns the memory address of (the start of) the array
An abstract class for immutable unordered collections of key:value pairs where keys are unique. Keys must all be of the same type, and values must all be of the same type (though it can differ from the key type). Does not contain the value constructor (e.g. allocate).
(Each method has a staged version)
def loadLength(arrayAddress: Long): Long = ...
def loadLength(arrayAddress: Code[Long]): Code[Long] = ...
- Returns the array length
def loadElement(arrayAddress: Long, key: Hashable): Option[AnyVal]
def loadElement(arrayAddress: Code[Long], key: Hashable): Code[Optional[AnyVal]] = ...
- Returns the key's corresponding value, if present. In the interpreted version, uses Scala's Option, and requires matching on Some/None. In staged version, uses Java's Optional semantics, match on v.isNull, just like PArray's loadElement.
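The two return conventions mentioned here can be compared side by side. The sketch below uses an ordinary Scala `Map` as a stand-in for a dictionary value; `lookupScala`, `lookupJava`, and the `describe` helpers are hypothetical and only illustrate how a caller consumes `Option` versus `java.util.Optional`. It does not model staged code generation.

```scala
import java.util.Optional

object OptionSketch {
  // Stand-in for a dictionary value; not a PDict.
  val dict: Map[String, Int] = Map("a" -> 1)

  // Interpreted style: Scala's Option.
  def lookupScala(key: String): Option[Int] = dict.get(key)

  // Java-style result: the same lookup wrapped in java.util.Optional.
  def lookupJava(key: String): Optional[Integer] = dict.get(key) match {
    case Some(v) => Optional.of(Int.box(v))
    case None    => Optional.empty[Integer]()
  }

  // Consuming an Option: pattern match on Some/None.
  def describe(o: Option[Int]): String = o match {
    case Some(v) => s"present: $v"
    case None    => "absent"
  }

  // Consuming an Optional: check presence, then get.
  def describeJava(o: Optional[Integer]): String =
    if (o.isPresent()) s"present: ${o.get()}" else "absent"
}
```

The information carried is the same in both cases; the difference is only in how the caller branches on presence.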
PCanonicalDict(keyType: PType, valueType: PType, required: Boolean = false)
def allocate(region: Region, length: Long): Long = ...
def allocate(region: Code[Region], length: Code[Long]): Code[Long] = ...