Explicit count of the number of items.
For lazy or distributed data, will force a computation.
Return the first element.
Convert to Blocks, each representing a subdivision of the larger Images data.
- size : str or tuple of block size per dimension
  String interpreted as memory size (in megabytes, e.g. "64"). Tuple of ints interpreted as "pixels per dimension". Only valid in spark mode.
Converts this Images object to a TimeSeries object.
This method is equivalent to images.toblocks(size).toseries().totimeseries().
- size : str, optional, default = "150M"
  String interpreted as memory size (e.g. "64M").
- units : str, either "pixels" or "splits", default = "pixels"
  What units to use for a tuple size.
Converts this Images object to a Series object.
This method is equivalent to images.toblocks(size).toseries().
- size : str, optional, default = "150M"
  String interpreted as memory size (e.g. "64M").
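For example, a minimal sketch of both conversions (using the fromrandom constructor documented later in this reference):
    import thunder as td
    imgs = td.images.fromrandom(shape=(10, 50, 50))  # 10 images of 50 x 50
    series = imgs.toseries(size='64M')               # one record per pixel, length 10
    ts = imgs.totimeseries(size='64M')               # same data, as a TimeSeries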
Convert to local representation.
Convert to spark representation.
Execute a function on each image
Extract random sample of series.
- nsamples : int, optional, default = 100
  The number of data points to sample.
- seed : int, optional, default = None
  Random seed.
Map an array -> array function over each image
Filter images
Reduce over images
Compute the mean across images
Compute the variance across images
Compute the standard deviation across images
Compute the sum across images
Compute the max across images
Compute the min across images
Remove single-dimensional axes from images.
Compute maximum projections of images / volumes along the specified dimension.
- axis : int, optional, default = 2
  Which axis to compute projection along.
Compute maximum-minimum projections of images / volumes along the specified dimension. This computes the sum of the maximum and minimum values along the given dimension.
- axis : int, optional, default = 2
  Which axis to compute projection along.
Downsample an image volume by an integer factor
- sample_factor : positive int or tuple of positive ints
  Stride to use in subsampling. If a single int is passed, each dimension of the image will be downsampled by this same factor. If a tuple is passed, it must have the same dimensionality as the image, and the strides given in the tuple will be applied to each image dimension.
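A quick sketch of the behavior (the method name subsample is an assumption; it is not shown in this reference):
    import thunder as td
    imgs = td.images.fromrandom(shape=(4, 60, 60))
    half = imgs.subsample(2)        # both spatial dimensions strided by 2 -> 30 x 30
    aniso = imgs.subsample((2, 3))  # per-dimension strides; tuple must match image ndim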
Spatially smooth images with a gaussian filter.
Filtering will be applied to every image in the collection.
- sigma : scalar or sequence of scalars, default = 2
  Size of the filter as standard deviation in pixels. A sequence is interpreted as the standard deviation for each axis. A single scalar is applied equally to all axes.
- order : choice of 0 / 1 / 2 / 3, or sequence from the same set, optional, default = 0
  Order of the gaussian kernel; 0 is a gaussian, higher numbers correspond to derivatives of a gaussian.
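A minimal sketch (assuming the method is exposed as gaussian_filter):
    import thunder as td
    imgs = td.images.fromrandom(shape=(5, 64, 64))
    smooth = imgs.gaussian_filter(sigma=2)          # same sigma on every axis
    aniso = imgs.gaussian_filter(sigma=(1, 3))      # per-axis standard deviations
    deriv = imgs.gaussian_filter(sigma=2, order=1)  # first derivative of a gaussian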
Spatially filter images using a uniform filter.
Filtering will be applied to every image in the collection.
- size : int or sequence of ints, optional, default = 2
  Size of the filter neighborhood in pixels. A sequence is interpreted as the neighborhood size for each axis. A single scalar is applied equally to all axes.
Spatially filter images using a median filter.
Filtering will be applied to every image in the collection.
- size : int or sequence of ints, optional, default = 2
  Size of the filter neighborhood in pixels. A sequence is interpreted as the neighborhood size for each axis. A single scalar is applied equally to all axes.
Correlate every pixel to the average of its local neighborhood.
This algorithm computes, for every spatial record, the correlation coefficient between that record's series, and the average series of all records within a local neighborhood with a size defined by the neighborhood parameter. The neighborhood is currently required to be a single integer, which represents the neighborhood size in both x and y.
- neighborhood : int, optional, default = 2
  Size of the correlation neighborhood (in both the x and y directions), in pixels.
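As a sketch (assuming the method is exposed as localcorr), the result has one correlation coefficient per pixel:
    import thunder as td
    imgs = td.images.fromrandom(shape=(20, 32, 32))  # 20 timepoints of 32 x 32
    corr = imgs.localcorr(neighborhood=2)            # 32 x 32 map of local correlations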
Subtract a constant value or an image / volume from all images / volumes in the data set.
- val : int, float, or ndarray
  Value to subtract.
Write 2d or 3d images as PNG files.
Files will be written into a newly-created directory. Three-dimensional data will be treated as RGB channels.
- path : str
  Path to output directory, must be one level below an existing directory.
- prefix : str
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Write 2d or 3d images as TIF files.
Files will be written into a newly-created directory. Three-dimensional data will be treated as RGB channels.
- path : str
  Path to output directory, must be one level below an existing directory.
- prefix : str
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Write out images or volumes as flat binary files.
Files will be written into a newly-created directory.
- path : str
  Path to output directory, must be one level below an existing directory.
- prefix : str
  String to prepend to filenames.
- overwrite : bool
  If true, the directory given by path will first be deleted if it exists.
Efficiently apply a function to each time series
Applies a function to each time series without transforming all the way to a Series object, but using a Blocks object instead for increased efficiency in the transformation back to Images.
- func : function
  Function to apply to each time series. Should take a one-dimensional ndarray and return the transformed one-dimensional ndarray.
- value_size : int, optional, default = None
  Size of the one-dimensional ndarray resulting from application of func. If not supplied, will be automatically inferred at extra computational cost.
- block_size : str or tuple of block size per dimension
  String interpreted as memory size (in megabytes, e.g. "64"). Tuple of ints interpreted as "pixels per dimension".
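For example, a minimal sketch (assuming the method is exposed as map_as_series) that subtracts each pixel's temporal mean without a full round-trip through Series:
    import thunder as td
    imgs = td.images.fromrandom(shape=(10, 50, 50))  # 10 timepoints
    detrended = imgs.map_as_series(lambda ts: ts - ts.mean(), value_size=10)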
Reshape all dimensions but the last into a single dimension
Explicit count of the number of items.
For lazy or distributed data, will force a computation.
Return the first element.
Convert to local representation.
Convert to spark representation.
Extract random sample of series.
- nsamples : int, optional, default = 100
  The number of data points to sample.
- seed : int, optional, default = None
  Random seed.
Map an array -> array function over each series
Filter by applying a function to each series.
Reduce over series.
Compute the mean across series
Compute the variance across series
Compute the standard deviation across series
Compute the sum across series
Compute the max across series
Compute the min across series
Select subset of values within the given index range
Inclusive on the left; exclusive on the right.
- left : int
  Left-most index in the desired range.
- right : int
  Right-most index in the desired range.
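A sketch of the boundary convention (assuming the method is exposed as between):
    import numpy as np
    import thunder as td
    s = td.series.fromarray(np.arange(5).reshape(1, 5))  # index is (0, 1, 2, 3, 4)
    subset = s.between(1, 4)                             # keeps index values 1, 2, 3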
Select subset of values that match a given index criterion
- crit : function, list, str, or int
  Criterion function to map to indices, specific index value, or list of indices.
Center series data by subtracting the mean either within or across records
- axis : int, optional, default = 0
  Which axis to center along; within (1) or across (0) records.
Standardize series data by dividing by the standard deviation either within or across records
- axis : int, optional, default = 0
  Which axis to standardize along; within (1) or across (0) records.
Zscore series data by subtracting the mean and dividing by the standard deviation either within or across records
- axis : int, optional, default = 0
  Which axis to zscore along; within (1) or across (0) records.
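A sketch of the axis convention shared by center, standardize, and zscore (method names assumed from the descriptions above):
    import thunder as td
    s = td.series.fromrandom(shape=(100, 20))
    centered = s.center(axis=0)  # subtract the across-record mean from every record
    scored = s.zscore(axis=1)    # z-score each record against its own mean and std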
Set all records that do not exceed the given threshold to 0
- threshold : scalar
  Level below which to set records to zero.
Correlate series data against one or many one-dimensional arrays.
- signal : array or str
  Signal(s) to correlate against; can be a numpy array or a MAT file containing the signal as a variable.
Compute the value maximum of each record in a Series
Compute the value minimum of each record in a Series
Compute the value sum of each record in a Series
Compute the value mean of each record in a Series
Compute the value median of each record in a Series
Compute the value percentile of each record in a Series.
- q : scalar
  Floating point number between 0 and 100 inclusive, specifying the percentile.
Compute the value standard deviation of each record in a Series
series_stat(stat)
Compute a simple statistic for each record in a Series
- stat : str
  Which statistic to compute.
Compute many statistics for each record in a Series
Compute the mean across fixed-size panels of each record.
Splits each record into panels of the given length, and then computes the mean across panels. Panel length must subdivide record exactly.
- length : int
  Fixed length with which to subdivide.
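For example (assuming the method is exposed as mean_by_panel):
    import numpy as np
    import thunder as td
    s = td.series.fromarray(np.arange(12).reshape(1, 12))  # one record of length 12
    panels = s.mean_by_panel(4)  # three panels of length 4, averaged -> length 4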
Select or filter elements of the Series by index values (across levels, if multi-index).
The index is a property of a Series object that assigns a value to each position within the arrays stored in the records of the Series. This function returns a new Series where, within each record, only the elements indexed by the given value(s) are retained. An index where each value is a list of a fixed length is referred to as a 'multi-index', as it provides multiple labels for each index location. Each of the dimensions in these sublists is a 'level' of the multi-index. If the index of the Series is a multi-index, then the selection can proceed by first selecting one or more levels, and then selecting one or more values at each level.
- val : list of lists
  Specifies the selected index values. List must contain one list for each level of the multi-index used in the selection. Any singleton list may be replaced with just the integer.
- level : list of ints, optional, default = 0
  Specifies which levels in the multi-index to use when performing selection. If a single level is selected, the list can be replaced with an integer. Must be the same length as val.
- squeeze : bool, optional, default = False
  If True, the multi-index of the resulting Series will drop any levels that contain only a single value because of the selection. Useful if indices are used as unique identifiers.
- filter : bool, optional, default = False
  If True, the selection process is reversed and all index values EXCEPT those specified are selected.
- return_mask : bool, optional, default = False
  If True, return the mask used to implement the selection.
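A sketch for a simple (non-multi-) index, assuming the method is exposed as select_by_index:
    import thunder as td
    s = td.series.fromrandom(shape=(10, 6))
    s.index = [0, 0, 1, 1, 2, 2]              # label the six positions of each record
    ones = s.select_by_index(1)               # keep only the positions labeled 1
    rest = s.select_by_index(1, filter=True)  # keep everything EXCEPT label 1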
Aggregate data in each record, grouping by index values.
For each unique value of the index, applies a function to the group indexed by that value. Returns a Series indexed by those unique values. For the result to be a valid Series object, the aggregating function should return a simple numeric type. Also allows selection of levels within a multi-index. See select_by_index for more info on indices and multi-indices.
- function : function
  Aggregating function to map to Series values. Should take a list or ndarray as input and return a simple numeric value.
- level : list of ints, optional, default = 0
  Specifies the levels of the multi-index to use when determining unique index values. If only a single level is desired, can be an int.
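For example (assuming the method is exposed as aggregate_by_index):
    import thunder as td
    s = td.series.fromrandom(shape=(10, 4))
    s.index = [0, 0, 1, 1]
    totals = s.aggregate_by_index(lambda x: x.sum())  # result indexed by (0, 1)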
Compute the desired statistic for each unique index value (across levels, if multi-index)
- stat : str
  Statistic to be computed: sum, mean, median, stdev, max, min, count.
- level : list of ints, optional, default = 0
  Specifies the levels of the multi-index to use when determining unique index values. If only a single level is desired, can be an int.
Compute sums for each unique index value (across levels, if multi-index)
Compute means for each unique index value (across levels, if multi-index)
Compute medians for each unique index value (across levels, if multi-index)
Compute standard deviations for each unique index value (across levels, if multi-index)
Compute maximum values for each unique index value (across levels, if multi-index)
Compute minimum values for each unique index value (across levels, if multi-index)
Count the number of elements for each unique index value (across levels, if multi-index)
Compute covariance of a distributed matrix.
- axis : int, optional, default = None
  Axis for performing mean subtraction: None (no subtraction), 0 (rows), or 1 (columns).
Compute gramian of a distributed matrix.
The gramian is defined as the product of a matrix's transpose with the matrix itself, i.e. A^T * A.
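A sketch (assuming the method is exposed as gramian):
    import thunder as td
    mat = td.series.fromrandom(shape=(100, 10))  # 100 rows, 10 columns
    g = mat.gramian()                            # 10 x 10 result, equal to A^T * A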
Multiply a matrix by another one.
The other matrix must be a numpy array, a scalar, or (in local mode) another matrix.
- other : Matrix, scalar, or numpy array
  A matrix to multiply with.
Convert Series to TimeSeries, a subclass for time series computation.
Converts Series to Images.
Equivalent to calling series.toblocks(size).toimages().
- size : str, optional, default = "150M"
  String interpreted as memory size.
Write data to binary files.
- path : str, path or URI to directory to be created
  Output files will be written underneath path. Directory will be created as a result of this call.
- prefix : str, optional, default = 'series'
  String prefix for files.
- overwrite : bool
  If true, path and all its contents will be deleted and recreated as part of this call.
Load Images object from a Spark RDD.
Must be a collection of key-value pairs where keys are singleton tuples indexing images, and values are 2d or 3d ndarrays.
- rdd : SparkRDD
  An RDD containing images.
- dims : tuple or array, optional, default = None
  Image dimensions (if provided, will avoid check).
- nrecords : int, optional, default = None
  Number of images (if provided, will avoid check).
- dtype : str, optional, default = None
  Data numerical type (if provided, will avoid check).
Load Images object from a local array-like.
First dimension will be used to index images, so remaining dimensions after the first should be the dimensions of the images/volumes, e.g. (3, 100, 200) for 3 x (100, 200) images
- values : array-like
  The array of images.
- npartitions : int, optional, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
Load images from a list of items using the given accessor.
- accessor : function
  Apply to each item from the list to yield an image.
- keys : list, optional, default = None
  An optional list of keys.
- dims : tuple, optional, default = None
  Specify a known image dimension to avoid computation.
- npartitions : int
  Number of partitions for computational engine.
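For example, a hypothetical sketch in which each list item is a .npy filename and numpy's load serves as the accessor:
    import numpy as np
    import thunder as td
    paths = ['im0.npy', 'im1.npy', 'im2.npy']  # hypothetical files on disk
    imgs = td.images.fromlist(paths, accessor=np.load)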
frompath(path, accessor=None, ext=None, start=None, stop=None, recursive=False, npartitions=None, dims=None, dtype=None, recount=False, engine=None, credentials=None)
Load images from a path using the given accessor.
Supports both local and remote filesystems.
- accessor : function
  Apply to each item after loading to yield an image.
- ext : str, optional, default = None
  File extension.
- npartitions : int, optional, default = None
  Number of partitions for computational engine; if None, will use the default for the engine.
- dims : tuple, optional, default = None
  Dimensions of images.
- dtype : str, optional, default = None
  Numerical type of images.
- start, stop : nonnegative int, optional, default = None
  Indices of files to load, interpreted using Python slicing conventions.
- recursive : bool, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- recount : bool, optional, default = False
  Force subsequent record counting.
frombinary(path, shape=None, dtype=None, ext='bin', start=None, stop=None, recursive=False, nplanes=None, npartitions=None, conf='conf.json', order='C', engine=None, credentials=None)
Load images from flat binary files.
Assumes one image per file, each with the shape and ordering as given by the input arguments.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- shape : tuple of positive int
  Dimensions of input image data.
- ext : str, optional, default = "bin"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching 'path' and 'ext'. Interpreted using Python slice indexing conventions.
- recursive : bool, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- nplanes : positive int, optional, default = None
  If passed, will cause single files to be subdivided into nplanes separate images. Otherwise, each file is taken to represent one image.
- npartitions : int, optional, default = None
  Number of partitions for computational engine; if None, will use the default for the engine.
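A hedged sketch (the path is hypothetical; shape and dtype describe each flat file):
    import thunder as td
    imgs = td.images.frombinary('data/images', shape=(512, 512), dtype='int16')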
fromtif(path, ext='tif', start=None, stop=None, recursive=False, nplanes=None, npartitions=None, engine=None, credentials=None)
Loads images from single or multi-page TIF files.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- ext : str, optional, default = "tif"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching 'path' and 'ext'. Interpreted using Python slice indexing conventions.
- recursive : bool, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- nplanes : positive int, optional, default = None
  If passed, will cause single files to be subdivided into nplanes separate images. Otherwise, each file is taken to represent one image.
- npartitions : int, optional, default = None
  Number of partitions for computational engine; if None, will use the default for the engine.
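For example (the path is hypothetical):
    import thunder as td
    imgs = td.images.fromtif('data/tifs')              # one image per file
    vols = td.images.fromtif('data/tifs', nplanes=10)  # subdivide each multi-page file by nplanes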
frompng(path, ext='png', start=None, stop=None, recursive=False, npartitions=None, engine=None, credentials=None)
Load images from PNG files.
- path : str
  Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.
- ext : str, optional, default = "png"
  Extension required on data files to be loaded.
- start, stop : nonnegative int, optional, default = None
  Indices of the first and last-plus-one file to load, relative to the sorted filenames matching 'path' and 'ext'. Interpreted using Python slice indexing conventions.
- recursive : bool, optional, default = False
  If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.
- npartitions : int, optional, default = None
  Number of partitions for computational engine; if None, will use the default for the engine.
Generate random image data.
- shape : tuple, optional, default = (10, 50, 50)
  Dimensions of images.
- npartitions : int, optional, default = 1
  Number of partitions.
- seed : int, optional, default = 42
  Random seed.
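For example:
    import thunder as td
    imgs = td.images.fromrandom(shape=(10, 50, 50), seed=42)
    imgs.count()  # -> 10 images, each 50 x 50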
Load example image data.
Data must be downloaded from S3, so this method requires an internet connection.
- name : str
  Name of dataset; if not specified, will print options.
Load Series object from a Spark RDD.
Assumes keys are tuples with increasing and unique indices, and values are 1d ndarrays. Will try to infer properties that are not explicitly provided.
- rdd : SparkRDD
  An RDD containing series data.
- shape : tuple or array, optional, default = None
  Total shape of data (if provided, will avoid check).
- nrecords : int, optional, default = None
  Number of records (if provided, will avoid check).
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ...).
- dtype : str, optional, default = None
  Data numerical type (if provided, will avoid check).
Load Series object from a local numpy array.
Assumes that all but final dimension index the records, and the size of the final dimension is the length of each record, e.g. a (2, 3, 4) array will be treated as 2 x 3 records of size (4,)
- values : array-like
  An array containing the data.
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ..., N) where N is the length of each record.
- npartitions : int, optional, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
Create a Series object from a list of items and optional accessor function.
Will call accessor function on each item from the list, providing a generic interface for data loading.
- items : list
  A list of items to load.
- accessor : function, optional, default = None
  A function to apply to each item in the list during loading.
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ..., N) where N is the length of each record.
- dtype : str, optional, default = None
  Data numerical type (if provided, will avoid check).
- npartitions : int, optional, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
fromtext(path, ext='txt', dtype='float64', skip=0, shape=None, index=None, npartitions=None, engine=None, credentials=None)
Loads Series data from text files.
Assumes data are formatted as rows, where each record is a row of numbers separated by spaces e.g. 'v v v v v'. You can optionally specify a fixed number of initial items per row to skip / discard.
- path : str
  Directory to load from; can be a URI string with scheme (e.g. "file://", "s3n://", or "gs://"), a single file, a directory, or a directory with a single wildcard character.
- ext : str, optional, default = 'txt'
  File extension.
- dtype : dtype or dtype specifier, optional, default = 'float64'
  Numerical type to use for data after converting from text.
- skip : int, optional, default = 0
  Number of items in each record to skip.
- shape : tuple or list, optional, default = None
  Shape of data if known; will be inferred otherwise.
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ...).
- npartitions : int, optional, default = None
  Number of partitions for parallelization (Spark only).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
- credentials : dict, optional, default = None
  Credentials for remote storage (e.g. S3) in the form {access: ***, secret: ***}.
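For example, a sketch with a hypothetical directory of text files:
    import thunder as td
    s = td.series.fromtext('data/series', skip=3)  # discard the first 3 items of each row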
frombinary(path, ext='bin', conf='conf.json', dtype=None, shape=None, skip=0, index=None, engine=None, credentials=None)
Load a Series object from flat binary files.
- path : str, URI or local filesystem path
  Directory to load from; can be a URI string with scheme (e.g. "file://", "s3n://", or "gs://"), a single file, a directory, or a directory with a single wildcard character.
- ext : str, optional, default = 'bin'
  Optional file extension specifier.
- conf : str, optional, default = 'conf.json'
  Name of conf file with type and size information.
- dtype : dtype or dtype specifier, optional, default = None
  Numerical type of the data; if not provided, will be taken from the conf file.
- shape : tuple or list, optional, default = None
  Shape of data if known; will be inferred otherwise.
- skip : int, optional, default = 0
  Number of items in each record to skip.
- index : array, optional, default = None
  Index for records; if not provided, will use (0, 1, ...).
- engine : object, optional, default = None
  Computational engine (e.g. a SparkContext for Spark).
- credentials : dict, optional, default = None
  Credentials for remote storage (e.g. S3) in the form {access: ***, secret: ***}.
Generate gaussian random series data.
- shape : tuple
  Dimensions of data.
- npartitions : int
  Number of partitions with which to distribute data.
- seed : int
  Randomization seed.
Load example series data.
Data must be downloaded from S3, so this method requires an internet connection.
- name : str
  Name of dataset; options include 'iris' | 'mouse' | 'fish'. If not specified, will print options.