Explicit count of the number of items.
For lazy or distributed data, will force a computation.
Return the first element.
Convert to Blocks, each representing a subdivision of the larger Images data.
- size : str or tuple of block size per dimension
  String interpreted as memory size (in megabytes, e.g. "64"). Tuple of ints interpreted as "pixels per dimension". Only valid in spark mode.
Converts this Images object to a TimeSeries object.
This method is equivalent to images.toblocks(size).toseries().totimeseries().
- size : str, optional, default = "150M"
  String interpreted as memory size (e.g. "64M").
- units : str, either "pixels" or "splits", default = "pixels"
  What units to use for a tuple size.
Converts this Images object to a Series object.
This method is equivalent to images.toblocks(size).toseries().
- size : str, optional, default = "150M"
  String interpreted as memory size (e.g. "64M").
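For example, a minimal sketch of both conversions (using the fromrandom constructor documented later in this reference):
    import thunder as td
    imgs = td.images.fromrandom(shape=(10, 50, 50))  # 10 images of 50 x 50
    series = imgs.toseries(size='64M')               # one record per pixel, length 10
    ts = imgs.totimeseries(size='64M')               # same data, as a TimeSeries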
Convert to local representation.
Convert to spark representation.
Execute a function on each image
Extract random sample of series.
- nsamples : int, optional, default = 100
  The number of data points to sample.
- seed : int, optional, default = None
  Random seed.
Map an array -> array function over each image
Filter images
Reduce over images
Compute the mean across images
Compute the variance across images
Compute the standard deviation across images
Compute the sum across images
Compute the max across images
Compute the min across images
Remove single-dimensional axes from images.
Compute maximum projections of images / volumes along the specified dimension.
- axis : int, optional, default = 2
  Which axis to compute projection along.
Compute maximum-minimum projections of images / volumes along the specified dimension. This computes the sum of the maximum and minimum values along the given dimension.
- axis : int, optional, default = 2
  Which axis to compute projection along.
Downsample an image volume by an integer factor
- sample_factor : positive int or tuple of positive ints
  Stride to use in subsampling. If a single int is passed, each dimension of the image will be downsampled by this same factor. If a tuple is passed, it must have the same dimensionality as the image, and the strides given in the tuple will be applied to each image dimension.
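A quick sketch of the behavior (the method name subsample is an assumption; it is not shown in this reference):
    import thunder as td
    imgs = td.images.fromrandom(shape=(4, 60, 60))
    half = imgs.subsample(2)        # both spatial dimensions strided by 2 -> 30 x 30
    aniso = imgs.subsample((2, 3))  # per-dimension strides; tuple must match image ndim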
Spatially smooth images with a gaussian filter.
Filtering will be applied to every image in the collection.
- sigma : scalar or sequence of scalars, default = 2
  Size of the filter as standard deviation in pixels. A sequence is interpreted as the standard deviation for each axis. A single scalar is applied equally to all axes.
- order : choice of 0 / 1 / 2 / 3, or sequence from the same set, optional, default = 0
  Order of the gaussian kernel; 0 is a gaussian, higher numbers correspond to derivatives of a gaussian.
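A minimal sketch (assuming the method is exposed as gaussian_filter):
    import thunder as td
    imgs = td.images.fromrandom(shape=(5, 64, 64))
    smooth = imgs.gaussian_filter(sigma=2)          # same sigma on every axis
    aniso = imgs.gaussian_filter(sigma=(1, 3))      # per-axis standard deviations
    deriv = imgs.gaussian_filter(sigma=2, order=1)  # first derivative of a gaussian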
Spatially filter images using a uniform filter.
Filtering will be applied to every image in the collection.
- size : int or sequence of ints, optional, default = 2
  Size of the filter neighborhood in pixels. A sequence is interpreted as the neighborhood size for each axis. A single scalar is applied equally to all axes.
Spatially filter images using a median filter.
Filtering will be applied to every image in the collection.
- size : int or sequence of ints, optional, default = 2
  Size of the filter neighborhood in pixels. A sequence is interpreted as the neighborhood size for each axis. A single scalar is applied equally to all axes.
Correlate every pixel to the average of its local neighborhood.
This algorithm computes, for every spatial record, the correlation coefficient between that record's series, and the average series of all records within a local neighborhood with a size defined by the neighborhood parameter. The neighborhood is currently required to be a single integer, which represents the neighborhood size in both x and y.
- neighborhood : int, optional, default = 2
  Size of the correlation neighborhood (in both the x and y directions), in pixels.
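As a sketch (assuming the method is exposed as localcorr), the result has one correlation coefficient per pixel:
    import thunder as td
    imgs = td.images.fromrandom(shape=(20, 32, 32))  # 20 timepoints of 32 x 32
    corr = imgs.localcorr(neighborhood=2)            # 32 x 32 map of local correlations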
Subtract a constant value or an image / volume from all images / volumes in the data set.
- val : int, float, or ndarray
  Value to subtract.
Write 2d or 3d images as PNG files.
Files will be written into a newly-created directory. Three-dimensional data will be treated as RGB channels.
- path : str
  Path to output directory, must be one level below an existing directory.
- prefix : str
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Write 2d or 3d images as TIF files.
Files will be written into a newly-created directory. Three-dimensional data will be treated as RGB channels.
- path : str
  Path to output directory, must be one level below an existing directory.
- prefix : str
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Write out images or volumes as flat binary files.
Files will be written into a newly-created directory.
- path : str
  Path to output directory, must be one level below an existing directory.
- prefix : str
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Efficiently apply a function to each time series
Applies a function to each time series without transforming all the way to a Series object, but using a Blocks object instead for increased efficiency in the transformation back to Images.
- func : function
  Function to apply to each time series. Should take a one-dimensional ndarray and return the transformed one-dimensional ndarray.
- value_size : int, optional, default = None
  Size of the one-dimensional ndarray resulting from application of func. If not supplied, will be automatically inferred at extra computational cost.
- block_size : str or tuple of block size per dimension
  String interpreted as memory size (in megabytes, e.g. "64"). Tuple of ints interpreted as "pixels per dimension".
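For example, a minimal sketch (assuming the method is exposed as map_as_series) that subtracts each pixel's temporal mean without a full round-trip through Series:
    import thunder as td
    imgs = td.images.fromrandom(shape=(10, 50, 50))  # 10 timepoints
    detrended = imgs.map_as_series(lambda ts: ts - ts.mean(), value_size=10)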
Reshape all dimensions but the last into a single dimension
Explicit count of the number of items.
For lazy or distributed data, will force a computation.
Return the first element.
Convert to local representation.
Convert to spark representation.
Extract random sample of series.
- nsamples : int, optional, default = 100
  The number of data points to sample.
- seed : int, optional, default = None
  Random seed.
Map an array -> array function over each series
Filter by applying a function to each series.
Reduce over series.
Compute the mean across series
Compute the variance across series
Compute the standard deviation across series
Compute the sum across series
Compute the max across series
Compute the min across series
Select subset of values within the given index range
Inclusive on the left; exclusive on the right.
- left : int
  Left-most index in the desired range.
- right : int
  Right-most index in the desired range.
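A sketch of the boundary convention (assuming the method is exposed as between):
    import numpy as np
    import thunder as td
    s = td.series.fromarray(np.arange(5).reshape(1, 5))  # index is (0, 1, 2, 3, 4)
    subset = s.between(1, 4)                             # keeps index values 1, 2, 3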
Select subset of values that match a given index criterion
- crit : function, list, str, or int
  Criterion function to map to indices, specific index value, or list of indices.
Center series data by subtracting the mean either within or across records
- axis : int, optional, default = 0
  Which axis to center along; within (1) or across (0) records.
Standardize series data by dividing by the standard deviation either within or across records
- axis : int, optional, default = 0
  Which axis to standardize along; within (1) or across (0) records.
Zscore series data by subtracting the mean and dividing by the standard deviation either within or across records
- axis : int, optional, default = 0
  Which axis to zscore along; within (1) or across (0) records.
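A sketch of the axis convention shared by center, standardize, and zscore (method names assumed from the descriptions above):
    import thunder as td
    s = td.series.fromrandom(shape=(100, 20))
    centered = s.center(axis=0)  # subtract the across-record mean from every record
    scored = s.zscore(axis=1)    # z-score each record against its own mean and std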
Set all records that do not exceed the given threshold to 0
- threshold : scalar
  Level below which to set records to zero.
Correlate series data against one or many one-dimensional arrays.
- signal : array or str
  Signal(s) to correlate against; can be a numpy array or a MAT file containing the signal as a variable.
Compute the value maximum of each record in a Series
Compute the value minimum of each record in a Series
Compute the value sum of each record in a Series
Compute the value mean of each record in a Series
Compute the value median of each record in a Series
Compute the value percentile of each record in a Series.
- q : scalar
  Floating point number between 0 and 100 inclusive, specifying the percentile.
Compute the value standard deviation of each record in a Series
series_stat(stat)
Compute a simple statistic for each record in a Series
- stat : str
  Which statistic to compute.
Compute many statistics for each record in a Series
Compute the mean across fixed-size panels of each record.
Splits each record into panels of the given length, and then computes the mean across panels. Panel length must subdivide record exactly.
- length : int
  Fixed length with which to subdivide.
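For example (assuming the method is exposed as mean_by_panel):
    import numpy as np
    import thunder as td
    s = td.series.fromarray(np.arange(12).reshape(1, 12))  # one record of length 12
    panels = s.mean_by_panel(4)  # three panels of length 4, averaged -> length 4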
Select or filter elements of the Series by index values (across levels, if multi-index).
The index is a property of a Series object that assigns a value to each position within the arrays stored in the records of the Series. This function returns a new Series where, within each record, only the elements indexed by the given value(s) are retained. An index where each value is a list of a fixed length is referred to as a 'multi-index', as it provides multiple labels for each index location. Each of the dimensions in these sublists is a 'level' of the multi-index. If the index of the Series is a multi-index, then the selection can proceed by first selecting one or more levels, and then selecting one or more values at each level.
- val : list of lists
  Specifies the selected index values. List must contain one list for each level of the multi-index used in the selection. Any singleton list may be replaced with just the integer.
- level : list of ints, optional, default = 0
  Specifies which levels in the multi-index to use when performing selection. If a single level is selected, the list can be replaced with an integer. Must be the same length as val.
- squeeze : bool, optional, default = False
  If True, the multi-index of the resulting Series will drop any levels that contain only a single value because of the selection. Useful if indices are used as unique identifiers.
- filter : bool, optional, default = False
  If True, the selection process is reversed and all index values EXCEPT those specified are selected.
- return_mask : bool, optional, default = False
  If True, return the mask used to implement the selection.
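A sketch for a simple (non-multi-) index, assuming the method is exposed as select_by_index:
    import thunder as td
    s = td.series.fromrandom(shape=(10, 6))
    s.index = [0, 0, 1, 1, 2, 2]              # label the six positions of each record
    ones = s.select_by_index(1)               # keep only the positions labeled 1
    rest = s.select_by_index(1, filter=True)  # keep everything EXCEPT label 1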
Aggregate data in each record, grouping by index values.
For each unique value of the index, applies a function to the group indexed by that value. Returns a Series indexed by those unique values. For the result to be a valid Series object, the aggregating function should return a simple numeric type. Also allows selection of levels within a multi-index. See select_by_index for more info on indices and multi-indices.
- function : function
  Aggregating function to map to Series values. Should take a list or ndarray as input and return a simple numeric value.
- level : list of ints, optional, default = 0
  Specifies the levels of the multi-index to use when determining unique index values. If only a single level is desired, can be an int.
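For example (assuming the method is exposed as aggregate_by_index):
    import thunder as td
    s = td.series.fromrandom(shape=(10, 4))
    s.index = [0, 0, 1, 1]
    totals = s.aggregate_by_index(lambda x: x.sum())  # result indexed by (0, 1)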
Compute the desired statistic for each unique index value (across levels, if multi-index)
- stat : str
  Statistic to be computed: sum, mean, median, stdev, max, min, count.
- level : list of ints, optional, default = 0
  Specifies the levels of the multi-index to use when determining unique index values. If only a single level is desired, can be an int.
Compute sums for each unique index value (across levels, if multi-index)
Compute means for each unique index value (across levels, if multi-index)
Compute medians for each unique index value (across levels, if multi-index)
Compute standard deviations for each unique index value (across levels, if multi-index)
Compute maximum values for each unique index value (across levels, if multi-index)
Compute minimum values for each unique index value (across levels, if multi-index)
Count the number of elements for each unique index value (across levels, if multi-index)
Compute covariance of a distributed matrix.
- axis : int, optional, default = None
  Axis for performing mean subtraction: None (no subtraction), 0 (rows), or 1 (columns).
Compute gramian of a distributed matrix.
The gramian is defined as the product of a matrix's transpose with the matrix itself, i.e. A^T * A.
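A sketch (assuming the method is exposed as gramian):
    import thunder as td
    mat = td.series.fromrandom(shape=(100, 10))  # 100 rows, 10 columns
    g = mat.gramian()                            # 10 x 10 result, equal to A^T * A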
Multiply a matrix by another one.
The other matrix must be a numpy array, a scalar, or (in local mode) another matrix.
- other : Matrix, scalar, or numpy array
  A matrix to multiply with.
Convert Series to TimeSeries, a subclass for time series computation.
Converts Series to Images.
Equivalent to calling series.toblocks(size).toimages().
- size : str, optional, default = "150M"
  String interpreted as memory size.
Write data to binary files.
- path : str, path or URI to directory to be created
  Output files will be written underneath path. Directory will be created as a result of this call.
- prefix : str, optional, default = 'series'
  String prefix for files.
- overwrite : bool
  If true, path and all its contents will be deleted and recreated as part of this call.
Load Images object from a Spark RDD.
Must be a collection of key-value pairs where keys are singleton tuples indexing images, and values are 2d or 3d ndarrays.
- rdd : SparkRDD
  An RDD containing images.
- dims : tuple or array, optional, default = None
  Image dimensions (if provided, will avoid check).
- nrecords : int, optional, default = None
  Number of images (if provided, will avoid check).
- dtype : str, optional, default = None
  Data numerical type (if provided, will avoid check).
Load Images object from a local array-like.
First dimension will be used to index images, so remaining dimensions after the first should be the dimensions of the images/volumes, e.g. (3, 100, 200) for 3 x (100, 200) images
- values : array-like
  The array of images.
- npartitions : int, optional, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
Load images from a list of items using the given accessor.
- accessor : function
  Apply to each item from the list to yield an image.
- keys : list, optional, default = None
  An optional list of keys.
- dims : tuple, optional, default = None
  Specify a known image dimension to avoid computation.
- npartitions : int
  Number of partitions for computational engine.
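For example, a hypothetical sketch in which each list item is a .npy filename and numpy's load serves as the accessor:
    import numpy as np
    import thunder as td
    paths = ['im0.npy', 'im1.npy', 'im2.npy']  # hypothetical files on disk
    imgs = td.images.fromlist(paths, accessor=np.load)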
frompath(path, accessor=None, ext=None, start=None, stop=None, recursive=False, npartitions=None, dims=None, dtype=None, recount=False, engine=None, credentials=None)
Load images from a path using the given accessor.
Supports both local and remote filesystems.
- accessor : function
  Apply to each item after loading to yield an image.
- ext : str, optional, default = None
  File extension.
- npartitions : int, optional, default = None
  Number of partitions for computational engine; if None, will use the default for the engine.
- dims : tuple, optional, default = None
  Dimensions of images.
- dtype : str, optional, default = None
  Numerical type of images.
- start, stop : nonnegative int, optional, default = None
  Indices of files to load, interpreted using Python slicing conventions.
- recursive : bool, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- recount : bool, optional, default = False
  Force subsequent record counting.
frombinary(path, shape=None, dtype=None, ext='bin', start=None, stop=None, recursive=False, nplanes=None, npartitions=None, conf='conf.json', order='C', engine=None, credentials=None)
Load images from flat binary files.
Assumes one image per file, each with the shape and ordering as given by the input arguments.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- shape : tuple of positive int
  Dimensions of input image data.
- ext : str, optional, default = "bin"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching 'path' and 'ext'. Interpreted using Python slice indexing conventions.
- recursive : bool, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- nplanes : positive int, optional, default = None
  If passed, will cause single files to be subdivided into nplanes separate images. Otherwise, each file is taken to represent one image.
- npartitions : int, optional, default = None
  Number of partitions for computational engine; if None, will use the default for the engine.
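A hedged sketch (the path is hypothetical; shape and dtype describe each flat file):
    import thunder as td
    imgs = td.images.frombinary('data/images', shape=(512, 512), dtype='int16')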
fromtif(path, ext='tif', start=None, stop=None, recursive=False, nplanes=None, npartitions=None, engine=None, credentials=None)
Loads images from single or multi-page TIF files.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- ext : str, optional, default = "tif"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching 'path' and 'ext'. Interpreted using Python slice indexing conventions.
- recursive : bool, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- nplanes : positive int, optional, default = None
  If passed, will cause single files to be subdivided into nplanes separate images. Otherwise, each file is taken to represent one image.
- npartitions : int, optional, default = None
  Number of partitions for computational engine; if None, will use the default for the engine.
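For example (the path is hypothetical):
    import thunder as td
    imgs = td.images.fromtif('data/tifs')              # one image per file
    vols = td.images.fromtif('data/tifs', nplanes=10)  # subdivide each multi-page file by nplanes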
frompng(path, ext='png', start=None, stop=None, recursive=False, npartitions=None, engine=None, credentials=None)
Load images from PNG files.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- ext : str, optional, default = "png"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching 'path' and 'ext'. Interpreted using Python slice indexing conventions.
- recursive : bool, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- npartitions : int, optional, default = None
  Number of partitions for computational engine; if None, will use the default for the engine.
Generate random image data.
- shape : tuple, optional, default = (10, 50, 50)
  Dimensions of images.
- npartitions : int, optional, default = 1
  Number of partitions.
- seed : int, optional, default = 42
  Random seed.
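For example:
    import thunder as td
    imgs = td.images.fromrandom(shape=(10, 50, 50), seed=42)
    imgs.count()  # -> 10 images, each 50 x 50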
Load example image data.
Data must be downloaded from S3, so this method requires an internet connection.
- name : str
  Name of dataset; if not specified, will print options.
Load Series object from a Spark RDD.
Assumes keys are tuples with increasing and unique indices, and values are 1d ndarrays. Will try to infer properties that are not explicitly provided.
- rdd : SparkRDD
  An RDD containing series data.
- shape : tuple or array, optional, default = None
  Total shape of data (if provided, will avoid check).
- nrecords : int, optional, default = None
  Number of records (if provided, will avoid check).
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ...).
- dtype : str, optional, default = None
  Data numerical type (if provided, will avoid check).
Load Series object from a local numpy array.
Assumes that all but final dimension index the records, and the size of the final dimension is the length of each record, e.g. a (2, 3, 4) array will be treated as 2 x 3 records of size (4,)
- values : array-like
  An array containing the data.
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ..., N) where N is the length of each record.
- npartitions : int, optional, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
Create a Series object from a list of items and optional accessor function.
Will call accessor function on each item from the list, providing a generic interface for data loading.
- items : list
  A list of items to load.
- accessor : function, optional, default = None
  A function to apply to each item in the list during loading.
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ..., N) where N is the length of each record.
- dtype : str, optional, default = None
  Data numerical type (if provided, will avoid check).
- npartitions : int, optional, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
fromtext(path, ext='txt', dtype='float64', skip=0, shape=None, index=None, npartitions=None, engine=None, credentials=None)
Loads Series data from text files.
Assumes data are formatted as rows, where each record is a row of numbers separated by spaces e.g. 'v v v v v'. You can optionally specify a fixed number of initial items per row to skip / discard.
- path : str
  Directory to load from; can be a URI string with scheme (e.g. "file://", "s3n://", or "gs://"), a single file, a directory, or a directory with a single wildcard character.
- ext : str, optional, default = 'txt'
  File extension.
- dtype : dtype or dtype specifier, optional, default = 'float64'
  Numerical type to use for data after converting from text.
- skip : int, optional, default = 0
  Number of items in each record to skip.
- shape : tuple or list, optional, default = None
  Shape of data if known; will be inferred otherwise.
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ...).
- npartitions : int, optional, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
- credentials : dict, optional, default = None
  Credentials for remote storage (e.g. S3) in the form {access: ***, secret: ***}.
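For example, a sketch with a hypothetical directory of text files:
    import thunder as td
    s = td.series.fromtext('data/series', skip=3)  # discard the first 3 items of each row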
frombinary(path, ext='bin', conf='conf.json', dtype=None, shape=None, skip=0, index=None, engine=None, credentials=None)
Load a Series object from flat binary files.
- path : str, URI or local filesystem path
  Directory to load from; can be a URI string with scheme (e.g. "file://", "s3n://", or "gs://"), a single file, a directory, or a directory with a single wildcard character.
- ext : str, optional, default = 'bin'
  Optional file extension specifier.
- conf : str, optional, default = 'conf.json'
  Name of conf file with type and size information.
- dtype : dtype or dtype specifier, optional, default = None
  Numerical type of the data; if not provided, will be taken from the conf file.
- shape : tuple or list, optional, default = None
  Shape of data if known; will be inferred otherwise.
- skip : int, optional, default = 0
  Number of items in each record to skip.
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ...).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
- credentials : dict, optional, default = None
  Credentials for remote storage (e.g. S3) in the form {access: ***, secret: ***}.
Generate gaussian random series data.
- shape : tuple
  Dimensions of data.
- npartitions : int
  Number of partitions with which to distribute data.
- seed : int
  Randomization seed.
Load example series data.
Data must be downloaded from S3, so this method requires an internet connection.
- name : str
  Name of dataset; options include 'iris' | 'mouse' | 'fish'. If not specified, will print options.