Implements Bucket interface for T4.
Shorthand for deserialize(key)
Bucket.config(config_url='https://t4.quiltdata.com/config.json', quiet=False)
Updates this bucket’s search endpoint based on a federation config.
Deletes a key from the bucket.
Parameters: key (str) – key to delete
Returns: None
Raises: if delete fails
Deserializes object at key from bucket.
Parameters: key (str) – key in bucket to get
Returns: deserialized object
Raises:
- KeyError if key does not exist
- if deserialization fails
Fetches file (or files) at key to path.
If key ends in ‘/’, then all files with the prefix key will match and will be stored in a directory at path.
Otherwise, only one file will be fetched and it will be stored at path.
Parameters:
- key (str) – key in bucket to fetch
- path (str) – path in local filesystem to store file or files fetched
Returns: None
Raises:
- if path doesn’t exist
- if download fails
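The prefix rule above can be illustrated with a small, library-free sketch. The helper `plan_fetch` and the sample keys are hypothetical and are not part of T4; the sketch only models which bucket keys would match and where each one would land locally, under the assumption that the prefix is stripped from the local path.

```python
import posixpath


def plan_fetch(all_keys, key, path):
    """Map each matching bucket key to the local path it would be written to.

    Hypothetical helper: it models the matching rule only and downloads nothing.
    """
    if key.endswith("/"):
        # Prefix fetch: every key under the prefix lands inside the directory `path`.
        return {
            k: posixpath.join(path, k[len(key):])
            for k in all_keys
            if k.startswith(key) and k != key
        }
    # Single-object fetch: the one object is written to `path` itself.
    return {key: path}


keys = ["data/a.csv", "data/sub/b.csv", "readme.md"]
prefix_plan = plan_fetch(keys, "data/", "local")
single_plan = plan_fetch(keys, "readme.md", "local/readme.md")
```

Here `prefix_plan` covers both objects under `data/`, while `single_plan` contains a single entry.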
Gets the metadata associated with a key in bucket.
Parameters: key (str) – key in bucket to get meta for
Returns: dict of meta
Raises: if download fails
Lists all keys in the bucket.
Returns: list of strings
Stores obj at key in bucket, optionally with user-provided metadata.
Parameters:
- key (str) – key in bucket to put object to
- obj (serializable) – serializable object to store at key
- meta (dict) – optional user-provided metadata to store
Stores all files under directory under the prefix key.
Parameters:
- key (str) – prefix to store files under in bucket
- directory (str) – path to local directory to grab files from
Returns: None
Raises:
- if directory isn’t a valid local directory
- if writing to bucket fails
Stores file at path to key in bucket.
Parameters:
- key (str) – key in bucket to store file at
- path (str) – string representing local path to file
Returns: None
Raises:
- if no file exists at path
- if copy fails
Execute a search against the configured search endpoint.
Parameters: query – query string to search
Returns either the request object (in case of an error) or a list of objects with the following keys:
- key: key of the object
- version_id: version_id of object version
- operation: Create or Delete
- meta: metadata attached to object
- size: size of object in bytes
- text: indexed text of object
- source: source document for object (what is actually stored in Elasticsearch)
- time: timestamp for operation
Selects data from an S3 object.
Parameters:
- key (str) – key to query in bucket
- query (str) – query to execute (SQL by default)
- query_type (str) – other query type accepted by S3 service
- raw (bool) – return the raw (but parsed) response
Returns: pandas.DataFrame with results of query
Sets user metadata on key in bucket.
Parameters:
- key (str) – key in bucket to set meta for
- meta (dict) – value to set user metadata to
Returns: None
Raises: if put to bucket fails
In-memory representation of a package
Checks whether the package contains a specified logical_key.
Returns: True or False
Filters the package based on prefix, and returns either a new Package or a PackageEntry.
Parameters: prefix (str) – prefix to filter on
Returns: PackageEntry if prefix matches a logical_key exactly, otherwise Package
String representation of the Package.
Load a package into memory from a registry without making a local copy of the manifest.
Parameters:
- name (string) – name of package to load
- registry (string) – location of registry to load package from
- pkg_hash (string) – top hash of package version to load
Serializes this package to a registry.
Parameters:
- name – optional name for package
- registry – registry to build to; defaults to the local registry
- message – the commit message of the package
Returns: the top hash as a string
Returns the package with logical_key removed.
Returns: self
Raises: KeyError – when logical_key is not present to be deleted
Returns three lists – added, modified, deleted.
Added: present in other_pkg but not in self. Modified: present in both, but different. Deleted: present in self, but not other_pkg.
Parameters: other_pkg – Package to diff
Returns: added, modified, deleted (all lists of logical keys)
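As a rough illustration of these three lists, the sketch below applies the same definitions to two plain dicts that map logical keys to content hashes. The function and the sample data are hypothetical stand-ins and do not reflect how T4 stores entries internally.

```python
def diff(self_entries, other_entries):
    """Return (added, modified, deleted) lists of logical keys.

    Both arguments are plain dicts of logical key -> content hash
    (an assumption made for this sketch).
    """
    # Added: present in other, but not in self.
    added = sorted(k for k in other_entries if k not in self_entries)
    # Deleted: present in self, but not in other.
    deleted = sorted(k for k in self_entries if k not in other_entries)
    # Modified: present in both, but with different content.
    modified = sorted(
        k for k in self_entries
        if k in other_entries and self_entries[k] != other_entries[k]
    )
    return added, modified, deleted


mine = {"a.csv": "hash1", "b.csv": "hash2", "c.csv": "hash3"}
theirs = {"a.csv": "hash1", "b.csv": "changed", "d.csv": "hash4"}
added, modified, deleted = diff(mine, theirs)
```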
Serializes this package to a writable file-like object.
Parameters:
writable_file – file-like object to write serialized package.
Returns: None
Raises:
- fail to create file
- fail to finish write
Copies all descendants to dest. Descendants are written under their logical names relative to self. So if p[a] has two children, p[a][b] and p[a][c], then p[a].fetch(“mydir”) will produce the following:
mydir/
    b
    c
Parameters: dest – where to put the files (locally)
Returns: None
Gets object from logical_key and returns its physical path. Equivalent to self[logical_key].get().
Parameters: logical_key (string) – logical key of the object to get
Returns: Physical path as a string.
Raises:
- KeyError – when logical_key is not present in the package
- ValueError – if the logical_key points to a Package rather than PackageEntry
Returns user metadata for this Package.
Installs a named package to the local registry and downloads its files.
Parameters:
- name (str) – Name of package to install.
- registry (str) – Registry where package is located.
- pkg_hash (str) – Hash of package to install. Defaults to latest.
- dest (str) – Local path to download files to.
- dest_registry (str) – Registry to install package to. Defaults to local registry.
Returns: A new Package that points to files on your local machine.
Returns logical keys in the package.
Loads a package from a readable file-like object.
Parameters:
readable_file – readable file-like object to deserialize package from
Returns: a new Package object
Raises:
- file not found
- json decode error
- invalid package exception
Returns a generator of the dicts that make up the serialized package.
Copies objects to dest, then creates a new package that points to those objects.
Copies each object in this package to dest according to logical key structure, then adds to the registry a serialized version of this package with physical_keys that point to the new copies.
Parameters:
- name – name for package in registry
- dest – where to copy the objects in the package
- registry – registry where to create the new package
- message – the commit message for the new package
Returns: A new package that points to the copied objects
Returns self with the object at logical_key set to entry.
Parameters:
- logical_key (string) – logical key to update
- entry (PackageEntry OR string) – new entry to place at logical_key in the package. If entry is a string, it is treated as a URL, and an entry is created based on it.
- meta (dict) – user level metadata dict to attach to entry
Returns: self
Adds all files from path to the package.
Recursively enumerates every file in path, and adds them to the package according to their relative location to path.
Parameters:
- lkey (string) – prefix to add to every logical key, use ‘/’ for the root of the package.
- path (string) – path to scan for files to add to package.
Returns: self
Raises: when path doesn’t exist
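To make the "relative location" rule concrete, here is a minimal sketch that walks a directory and derives a logical key for each file. The function name `logical_keys_for_dir` is hypothetical, and `FileNotFoundError` is an assumed choice, since the reference does not name the exception raised for a missing path.

```python
import os
import tempfile


def logical_keys_for_dir(lkey, path):
    """Map logical key -> local file path for every file under `path`."""
    if not os.path.isdir(path):
        # Assumed exception type; the reference only says this case raises.
        raise FileNotFoundError(path)
    # '/' means the root of the package, so no prefix is added.
    prefix = "" if lkey == "/" else lkey.strip("/") + "/"
    mapping = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Logical keys always use forward slashes, whatever the OS separator.
            rel = os.path.relpath(full, path).replace(os.sep, "/")
            mapping[prefix + rel] = full
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("x")
    root_keys = sorted(logical_keys_for_dir("/", tmp))
    nested_keys = sorted(logical_keys_for_dir("data", tmp))
```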
Sets user metadata on this Package.
Returns the top hash of the package.
Note that physical keys are not hashed because the package has the same semantics regardless of where the bytes come from.
Returns: A string that represents the top hash of the package
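The idea that physical keys stay out of the hash can be sketched as follows. This is not T4's actual hashing scheme: the choice of SHA-256, the JSON encoding, and the entry fields (`hash`, `size`, `physical_key`) are all assumptions made for illustration.

```python
import hashlib
import json


def top_hash(meta, entries):
    """Hash package metadata plus each entry's logical key, content hash and size.

    `entries` maps logical key -> dict with 'physical_key', 'hash' and 'size'
    (a hypothetical layout for this sketch).
    """
    digest = hashlib.sha256()
    digest.update(json.dumps(meta, sort_keys=True).encode("utf-8"))
    for logical_key in sorted(entries):
        entry = entries[logical_key]
        # physical_key is deliberately left out of the hashed record.
        record = {
            "logical_key": logical_key,
            "hash": entry["hash"],
            "size": entry["size"],
        }
        digest.update(json.dumps(record, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


local = {"a.csv": {"physical_key": "file:///tmp/a.csv", "hash": "abc", "size": 3}}
remote = {"a.csv": {"physical_key": "s3://bucket/a.csv", "hash": "abc", "size": 3}}
edited = {"a.csv": {"physical_key": "file:///tmp/a.csv", "hash": "xyz", "size": 3}}

same_content = top_hash({}, local) == top_hash({}, remote)
changed_content = top_hash({}, local) != top_hash({}, edited)
```

Moving the bytes from a local file to S3 leaves the hash unchanged in this sketch, while changing the content hash of an entry changes it.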
Updates the package with the keys and values in new_keys_dict.
If a metadata dict is provided, it is attached to and overwrites metadata for all entries in new_keys_dict.
Parameters:
- new_dict (dict) – dict of logical keys to update.
- meta (dict) – metadata dict to attach to every input entry.
- prefix (string) – a prefix string to prepend to every logical key.
Returns: self
Verify that a package name is two alphanumeric strings separated by a slash.
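One way to express this check is a regular expression. The pattern below is a strict reading of "alphanumeric" and is an assumption; the library's real pattern may accept additional characters such as underscores or hyphens.

```python
import re

# Assumed pattern: exactly two non-empty alphanumeric parts around one slash.
NAME_PATTERN = re.compile(r"[A-Za-z0-9]+/[A-Za-z0-9]+")


def is_valid_name(name):
    """Return True if `name` looks like 'owner/package'."""
    return NAME_PATTERN.fullmatch(name) is not None


checks = {
    "user/pkg": is_valid_name("user/pkg"),
    "userpkg": is_valid_name("userpkg"),
    "a/b/c": is_valid_name("a/b/c"),
    "user/": is_valid_name("user/"),
}
```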
Generator that traverses all entries in the package tree and returns tuples of (key, entry), with keys in alphabetical order.
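A traversal of this kind might look like the sketch below, which models the package tree as nested dicts (an assumption; the real Package class is not a dict). Names are sorted at each level of the tree before descending.

```python
def walk(tree, prefix=""):
    """Yield (key, entry) tuples for every leaf in a nested-dict tree."""
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            # Sub-package: recurse, extending the logical key prefix.
            yield from walk(child, prefix + name + "/")
        else:
            yield prefix + name, child


tree = {"b": {"z.csv": "entry-z", "a.csv": "entry-a"}, "a.txt": "entry-txt"}
walked = list(walk(tree))
```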
Set or read the T4 configuration.
To retrieve the current config, call directly, without arguments:
```python
>>> import t4 as he
>>> he.config()
```
To trigger autoconfiguration, call with just the navigator URL:
```python
>>> he.config('https://example.com')
```
To set config values, call with one or more key=value pairs:
```python
>>> he.config(navigator_url='http://example.com',
...           elastic_search_url='http://example.com/queries')
```
When setting config values, unrecognized values are rejected. Acceptable config values can be found in t4.util.CONFIG_TEMPLATE.
Parameters:
- autoconfig_url – URL indicating a location to configure from
- **config_values – key=value pairs to set in the config
Returns: HeliumConfig object (an ordered Mapping)
Copies src object from T4 to dest.
Either of src and dest may be S3 paths (starting with s3://) or local file paths (starting with file:///).
Parameters:
- src (str) – a path to retrieve
- dest (str) – a path to write to
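The two path forms can be told apart by their URL scheme. The helper below is a hypothetical sketch built on the standard library's `urlparse`; it is not how T4 parses paths internally, and rejecting other schemes with `ValueError` is an assumption.

```python
from urllib.parse import urlparse


def classify(path):
    """Split a path into (kind, bucket, key_or_local_path)."""
    parsed = urlparse(path)
    if parsed.scheme == "s3":
        # s3://bucket/key -> the bucket is the netloc, the key is the path.
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme == "file":
        # file:///abs/path -> empty netloc, absolute local path.
        return ("file", "", parsed.path)
    raise ValueError("expected an s3:// or file:/// path, got %r" % path)


s3_parts = classify("s3://my-bucket/data/a.csv")
file_parts = classify("file:///tmp/a.csv")
```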
Delete an object.
Parameters: target (str) – URI of the object to delete
Delete a package. Deletes only the manifest entries and not the underlying files.
Parameters:
- name (str) – Name of the package
- registry (str) – The registry the package will be removed from
Retrieves src object from T4 and loads it into memory.
An optional version may be specified.
Parameters: src (str) – A URI specifying the object to retrieve
Returns: (data, metadata). Does not work on all objects.
Return type: tuple
Lists Packages in the registry.
Returns a list of all named packages in a registry. If the registry is None, default to the local registry.
Parameters:
registry (string) – location of registry to load package from.
Returns: A list of strings containing the names of the packages
List data from the specified path.
Parameters:
- target (str) – URI to list
- recursive (bool) – show subdirectories and their contents as well
Returns: Return value structure has not yet been permanently decided. Currently, it’s a tuple of list objects, containing the following:
- result[0]: directory info
- result[1]: file/object info
- result[2]: delete markers
Return type: list
Write an in-memory object to the specified T4 dest.
You may pass a dict to meta to store it with obj at dest.
See User Docs for more info on object Serialization and Metadata.
Parameters:
- obj – a serializable object
- dest (str) – A URI
- meta (dict) – Optional. metadata dict to store with obj at dest
Searches your bucket. query can contain plaintext, and can also contain clauses like $key:"$value" that search for exact matches on specific keys.
Returns either the request object (in case of an error) or a list of objects with the following keys:
- key: key of the object
- version_id: version_id of object version
- operation: Create or Delete
- meta: metadata attached to object
- size: size of object in bytes
- text: indexed text of object
- source: source document for object (what is actually stored in Elasticsearch)
- time: timestamp for operation
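Because the return value is either a list of hits or a request object, callers may want to check the type before iterating. The helper and the sample hit below are hypothetical; only the field names come from the list above.

```python
def summarize_search(result):
    """Return (key, version_id, size) tuples, or None if the search failed.

    Anything that is not a list is treated as the error case, in which the
    request object was returned instead of hits.
    """
    if not isinstance(result, list):
        return None
    return [(hit["key"], hit["version_id"], hit["size"]) for hit in result]


# Made-up hit using the documented field names.
fake_hits = [
    {"key": "data/a.csv", "version_id": "v1", "operation": "Create",
     "meta": {}, "size": 12, "text": "", "source": {}, "time": "unknown"},
]
summary = summarize_search(fake_hits)
failed = summarize_search(object())
```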