Q: Shouldn't we name it something like Cloud Agnostic Object Storage Interface?
Object Storage is one of the fundamental services provided by cloud providers. Objects are generally stored in a tree directory structure, similarly to a file system, and are consumed either via an API or, in most cases, via their unique HTTP URL or a cloud specific internal URI (see Appendix 1).
The initiative addresses a growing need to support hybrid cloud applications by allowing developers to transparently access and manipulate objects from within applications running in various cloud environments.
The goal of the object storage abstraction layer is to create an object storage interface that is independent of the object storage provider. The abstraction layer hides the provider's object storage specifics and allows the developer to interact with the object storage regardless of whether the target is any of the supported object storage services (AWS S3, Google Cloud Storage, Azure Blob Storage, Oracle Cloud Object Storage) or local storage.
The general use cases are related to the manipulation of objects on the object storage; a sketch mapping them onto the proposed abstraction follows the CLI examples below.
Upload objects to the object storage.
Example:
# https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 cp micronaut-buffer-netty-3.1.1.pom s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_upload
az storage blob upload --container-name micronaut-container --file micronaut-buffer-netty-3.1.1.pom --sas ...
# https://cloud.google.com/storage/docs/uploading-objects
gsutil cp OBJECT_LOCATION gs://DESTINATION_BUCKET_NAME/
# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/put.html
oci os object put --bucket-name $bucket_name --file $file
Retrieve an object from the object storage.
Example:
# https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 cp s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom micronaut-buffer-netty-3.1.1.pom
# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_download
az storage blob download --container-name <The container name> --file <Path of file to write out to> --name <The blob name>
# https://cloud.google.com/storage/docs/downloading-objects#gsutil
gsutil cp gs://BUCKET_NAME/OBJECT_NAME SAVE_TO_LOCATION
# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/get.html
oci os object get --bucket-name $bucket_name --file $file --name $name
Update an already existing object on the object storage. Based on the object storage policy, this operation may fail.
Example:
# https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 cp micronaut-buffer-netty-3.1.1.pom s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_upload
az storage blob upload --container-name micronaut-container --file micronaut-buffer-netty-3.1.1.pom --sas $sas
# https://cloud.google.com/storage/docs/uploading-objects
gsutil cp OBJECT_LOCATION gs://DESTINATION_BUCKET_NAME/
# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/put.html
oci os object put --bucket-name $bucket_name --file $file
Copy objects within one object storage or between object storages.
Example:
# https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
aws s3 cp s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom s3://micronaut-object-storage/io/micronaut-buffer-netty-3.1.1.pom
# https://docs.microsoft.com/en-us/cli/azure/storage/blob/copy?view=azure-cli-latest#az_storage_blob_copy_start
az storage blob copy start --account-name MyAccount --destination-blob MyDestinationBlob --destination-container MyDestinationContainer --sas-token $sas --source-uri https://storage.blob.core.windows.net/photos
# https://cloud.google.com/storage/docs/copying-renaming-moving-objects
gsutil cp gs://SOURCE_BUCKET_NAME/SOURCE_OBJECT_NAME gs://DESTINATION_BUCKET_NAME/NAME_OF_COPY
# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/copy.html
oci os object copy --bucket-name $bucket_name --destination-bucket $destination_bucket --source-object-name $source_object_name
Delete an object from the object storage. Based on the object storage policy, this operation may fail.
Example:
# https://docs.aws.amazon.com/cli/latest/reference/s3/rm.html
aws s3 rm s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_delete
az storage blob delete -c micronaut-container -n MyBlob --account-name mystorageaccount
# https://cloud.google.com/storage/docs/deleting-objects
gsutil rm gs://BUCKET_NAME/OBJECT_NAME
# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/delete.html
oci os object delete --bucket-name $bucket_name --object-name $object_name
List objects on the object storage.
Example:
# https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
aws s3 ls s3://micronaut-object-storage
# https://docs.microsoft.com/en-us/cli/azure/storage/blob/directory?view=azure-cli-latest#az_storage_blob_directory_list
az storage blob directory list -c MyContainer -d DestinationDirectoryPath --account-name MyStorageAccount
# https://cloud.google.com/storage/docs/listing-objects
gsutil ls -r gs://BUCKET_NAME/**
# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/list.html
oci os object list --bucket-name $bucket_name
Sync directories between object storages or with a local directory.
Example:
# https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
aws s3 sync local-dir s3://micronaut-object-storage
# https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_sync
az storage blob sync -c mycontainer --account-name mystorageccount --account-key 00000000 -s "path/to/directory"
# https://cloud.google.com/storage/docs/gsutil/commands/rsync
gsutil rsync data gs://mybucket/data
# https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/sync.html
oci os object sync --bn backup --src-dir .
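For comparison, here is a minimal sketch of how the same use cases might look through the abstraction proposed below. ObjectStorage, ObjectMeta and ObjectStorageException are the types defined in the design section; the file names and the empty metadata map are illustrative assumptions.

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;

// Sketch only: ObjectStorage and its methods are the ones proposed in the design section below.
class UseCasesSketch {
    void examples(ObjectStorage storage) throws Exception {
        // upload
        try (InputStream in = new FileInputStream("micronaut-buffer-netty-3.1.1.pom")) {
            storage.put("micronaut-buffer-netty-3.1.1.pom", in, Map.of());
        }
        // download
        InputStream content = storage.get("micronaut-buffer-netty-3.1.1.pom");
        // copy within the same object storage
        storage.copy("micronaut-buffer-netty-3.1.1.pom", "io/micronaut-buffer-netty-3.1.1.pom");
        // list
        Iterable<String> paths = storage.listPaths("io");
        // delete
        storage.delete("io/micronaut-buffer-netty-3.1.1.pom");
        // sync a local directory to the object storage
        storage.sync("file://local-dir", "backup", true);
    }
}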
The design consists of three parts:
- The low-level API - the core of the abstraction layer
- The injection - how to work with the abstraction layer
- The extensions - various extensions to ease the interaction with the object storage abstraction layer
This section describes the main building blocks of the Object Storage abstraction layer.
Every object is uniquely identified by its path within the bucket. Additionally, objects have metadata associated with them. The metadata represents various object properties (like content type, tags, cache control, ...) and is provided when creating or updating the object on the object storage. See Appendix 3 for the list of object metadata generally provided by cloud providers.
For the sake of the object-oriented approach, the object on the object storage has its own representation:
interface ObjectStorageObject {
/**
* The object name. For example {@code picture.jpg}
* @return object name
*/
String getName();
/**
* The object path on object storage. For example {@code /path/to}
*
* @return object path or empty string if the object is placed at the root of the bucket
*/
String getPath();
/**
* The object absolute path. For example {@code /path/to/picture.jpg}
* @return absolute path
*/
String getAbsolutePath();
/**
* The object metadata.
*
* @return map of object metadata
*/
Map<ObjectMeta, String> getMetadata();
/**
* The object content.
*
* @return object content.
*/
InputStream getInputStream();
}
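To make the contract concrete, here is a minimal sketch of an immutable implementation; the class name and the byte-array backing are illustrative assumptions, not part of the proposal.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Map;

// Illustrative sketch: a simple immutable ObjectStorageObject backed by a byte array.
final class SimpleObjectStorageObject implements ObjectStorageObject {

    private final String path;      // e.g. "/path/to", empty string for the bucket root
    private final String name;      // e.g. "picture.jpg"
    private final Map<ObjectMeta, String> metadata;
    private final byte[] content;

    SimpleObjectStorageObject(String path, String name, Map<ObjectMeta, String> metadata, byte[] content) {
        this.path = path;
        this.name = name;
        this.metadata = Map.copyOf(metadata);
        this.content = content.clone();
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public String getPath() {
        return path;
    }

    @Override
    public String getAbsolutePath() {
        // assumption: the absolute path is simply path + "/" + name
        return path.isEmpty() ? "/" + name : path + "/" + name;
    }

    @Override
    public Map<ObjectMeta, String> getMetadata() {
        return metadata;
    }

    @Override
    public InputStream getInputStream() {
        return new ByteArrayInputStream(content);
    }
}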
Note: In general, ObjectStorage is understood as the service rather than the storage itself. Every object storage service has its own internal architecture. The most common name for the place where objects are stored is the bucket, which is used by AWS, Google Cloud and Oracle Cloud. However, how the bucket fits into the internal architecture differs: for AWS and Google Cloud the bucket name is globally unique, while for Oracle Cloud it is unique within a namespace (tenant). Azure, on the other hand, uses a container instead of a bucket, and its uniqueness is scoped to the storage account. Because of that, the noun ObjectStorage is IMHO the best representation of the main interaction with the service itself, as it abstracts all the cloud provider specifics and is not tied to any cloud provider nomenclature.
The ObjectStorage interface provides the common operations on objects in the logical storage. Such an abstraction allows specific ObjectStorage configuration like authentication, ACL policies, tags or hooks to be handled by the adapter implemented for the given provider. For example, if the application has two object storages configured, two ObjectStorage beans are created.
The location of an object in the logical storage is represented as a file path in a directory structure, without any object storage provider specifics. For example, the object represented on AWS S3 by the S3 URI s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom is represented as micronaut-buffer-netty-3.1.1.pom. This approach lets the developer work with objects independently, leaving the object storage specifics to the respective implementation of the ObjectStorage interface.
The API of ObjectStorage mixes Java native type methods with an object-oriented approach; the Java native type methods are there for quick and easy interaction.
interface ObjectStorage {
    /**
     * Upload the object to the object storage.
     *
     * @param objectPath the object path
     * @param inputStream the object content
     * @param metadata the object metadata
     * @throws ObjectStorageException if there was a failure to store the object
     */
    void put(String objectPath, InputStream inputStream, Map<ObjectMeta, String> metadata) throws ObjectStorageException;

    /**
     * Upload the object to the object storage.
     *
     * @param object the object
     * @throws ObjectStorageException if there was a failure to store the object
     */
    void put(ObjectStorageObject object) throws ObjectStorageException;

    /**
     * Get the object content from the object storage.
     *
     * @param objectPath the object path
     * @return the object content as an input stream or {@code null} if the object does not exist
     * @throws ObjectStorageException if there was a failure to retrieve the object
     */
    InputStream get(String objectPath) throws ObjectStorageException;

    /**
     * Get the object from the object storage.
     *
     * @param objectPath the object path
     * @return the object or {@code null} if the object does not exist
     * @throws ObjectStorageException if there was a failure to retrieve the object
     */
    ObjectStorageObject getObject(String objectPath) throws ObjectStorageException;

    /**
     * Copy the object within the scope of the object storage.
     *
     * @param objectSourcePath the object source path
     * @param objectTargetPath the object target path
     */
    void copy(String objectSourcePath, String objectTargetPath) throws ObjectStorageException;

    /**
     * Delete the object.
     *
     * @param objectName the object name in the format {@code /foo/bar/file}
     */
    void delete(String objectName) throws ObjectStorageException;

    /**
     * List the objects filtered by {@code path}.
     *
     * @param path the path prefix to filter by
     * @return the objects under the given path
     * @implNote the implementation uses paging if possible
     * @apiNote this call may lead to 1 + N requests
     */
    Iterable<ObjectStorageObject> list(String path) throws ObjectStorageException;

    /**
     * List the object absolute paths filtered by {@code path}. The list contains paths in the format {@code /foo/bar/file}.
     *
     * @param path the path prefix to filter by
     * @return the object paths under the given path
     * @implNote the implementation uses paging if possible
     */
    Iterable<String> listPaths(String path) throws ObjectStorageException;

    /**
     * Sync the objects from {@code sourcePath} to {@code targetPath}. A path can be a local path or a path on the object storage; a local path is prefixed with {@code file://}.
     *
     * @param sourcePath the source path
     * @param targetPath the target path
     * @param recursive whether to recursively iterate over subdirectories in {@code sourcePath}
     */
    void sync(String sourcePath, String targetPath, boolean recursive) throws ObjectStorageException;
}
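To illustrate how a provider adapter might satisfy this contract, here is a minimal sketch of a local-filesystem-backed adapter covering only put, get and delete; the class name, the root-directory constructor, the ObjectStorageException(String, Throwable) constructor and the decision to ignore metadata are all assumptions of this sketch, not part of the proposal.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Sketch of a local-filesystem adapter; only put, get and delete are shown, the class is left abstract.
abstract class LocalObjectStorage implements ObjectStorage {

    private final Path root; // the directory acting as the "bucket"

    protected LocalObjectStorage(Path root) {
        this.root = root;
    }

    @Override
    public void put(String objectPath, InputStream inputStream, Map<ObjectMeta, String> metadata) throws ObjectStorageException {
        try {
            Path target = root.resolve(stripLeadingSlash(objectPath));
            Files.createDirectories(target.getParent());
            Files.copy(inputStream, target); // metadata is ignored in this sketch
        } catch (IOException e) {
            throw new ObjectStorageException("Failed to store " + objectPath, e);
        }
    }

    @Override
    public InputStream get(String objectPath) throws ObjectStorageException {
        try {
            Path source = root.resolve(stripLeadingSlash(objectPath));
            return Files.exists(source) ? Files.newInputStream(source) : null;
        } catch (IOException e) {
            throw new ObjectStorageException("Failed to read " + objectPath, e);
        }
    }

    @Override
    public void delete(String objectName) throws ObjectStorageException {
        try {
            Files.deleteIfExists(root.resolve(stripLeadingSlash(objectName)));
        } catch (IOException e) {
            throw new ObjectStorageException("Failed to delete " + objectName, e);
        }
    }

    private static String stripLeadingSlash(String path) {
        return path.startsWith("/") ? path.substring(1) : path;
    }

    // the remaining operations (put(ObjectStorageObject), getObject, copy, list, listPaths, sync) are omitted
}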
Since Micronaut is primarily an asynchronously oriented framework, here is the reactive version:
interface ObjectStorageReactive {
    /**
     * Upload the object to the object storage.
     *
     * @param objectPath the object path
     * @param inputStream the object content
     * @param metadata the object metadata
     * @return a publisher emitting {@code true} if the object was stored
     * @throws ObjectStorageException if there was a failure to store the object
     */
    Publisher<Boolean> put(String objectPath, InputStream inputStream, Map<ObjectMeta, String> metadata) throws ObjectStorageException;

    /**
     * Upload the object to the object storage.
     *
     * @param object the object
     * @return a publisher emitting {@code true} if the object was stored
     * @throws ObjectStorageException if there was a failure to store the object
     */
    Publisher<Boolean> put(ObjectStorageObject object) throws ObjectStorageException;

    /**
     * Get the object content from the object storage.
     *
     * @param objectPath the object path
     * @return a publisher emitting the object content, or completing empty if the object does not exist
     * @throws ObjectStorageException if there was a failure to retrieve the object
     */
    Publisher<InputStream> get(String objectPath) throws ObjectStorageException;

    /**
     * Get the object from the object storage.
     *
     * @param objectPath the object path
     * @return a publisher emitting the object, or completing empty if the object does not exist
     * @throws ObjectStorageException if there was a failure to retrieve the object
     */
    Publisher<ObjectStorageObject> getObject(String objectPath) throws ObjectStorageException;

    /**
     * Copy the object within the scope of the object storage.
     *
     * @param objectSourcePath the object source path
     * @param objectTargetPath the object target path
     * @return a publisher emitting {@code true} if the object was copied
     */
    Publisher<Boolean> copy(String objectSourcePath, String objectTargetPath) throws ObjectStorageException;

    /**
     * Delete the object.
     *
     * @param objectName the object name in the format {@code /foo/bar/file}
     * @return a publisher emitting {@code true} if the object was deleted
     */
    Publisher<Boolean> delete(String objectName) throws ObjectStorageException;

    /**
     * List the objects filtered by {@code path}.
     *
     * @param path the path prefix to filter by
     * @return a publisher emitting the objects under the given path
     * @implNote the implementation uses paging if possible
     * @apiNote this call may lead to 1 + N requests
     */
    Publisher<ObjectStorageObject> list(String path) throws ObjectStorageException;

    /**
     * List the object absolute paths filtered by {@code path}. The publisher emits paths in the format {@code /foo/bar/file}.
     *
     * @param path the path prefix to filter by
     * @return a publisher emitting the object paths under the given path
     * @implNote the implementation uses paging if possible
     */
    Publisher<String> listPaths(String path) throws ObjectStorageException;

    /**
     * Sync the objects from {@code sourcePath} to {@code targetPath}. A path can be a local path or a path on the object storage; a local path is prefixed with {@code file://}.
     *
     * @param sourcePath the source path
     * @param targetPath the target path
     * @param recursive whether to recursively iterate over subdirectories in {@code sourcePath}
     * @return a publisher emitting {@code true} if the sync completed
     */
    Publisher<Boolean> sync(String sourcePath, String targetPath, boolean recursive) throws ObjectStorageException;
}
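To relate the blocking and the reactive contracts, here is a sketch of a bridge that wraps a blocking ObjectStorage with Project Reactor and offloads the calls to a bounded elastic scheduler; the bridge class is an assumption of this sketch and only put and get are shown.

import java.io.InputStream;
import java.util.Map;

import org.reactivestreams.Publisher;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

// Sketch: bridges a blocking ObjectStorage to the reactive contract; the remaining methods can be bridged the same way.
abstract class BridgingObjectStorageReactive implements ObjectStorageReactive {

    private final ObjectStorage delegate;

    protected BridgingObjectStorageReactive(ObjectStorage delegate) {
        this.delegate = delegate;
    }

    @Override
    public Publisher<Boolean> put(String objectPath, InputStream inputStream, Map<ObjectMeta, String> metadata) {
        return Mono.fromCallable(() -> {
                    delegate.put(objectPath, inputStream, metadata);
                    return true;
                })
                .subscribeOn(Schedulers.boundedElastic());
    }

    @Override
    public Publisher<InputStream> get(String objectPath) {
        // an empty Mono is emitted when the blocking call returns null (object not found)
        return Mono.fromCallable(() -> delegate.get(objectPath))
                .subscribeOn(Schedulers.boundedElastic());
    }
}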
Q: Since the object locator is a path without object storage specifics, using String as the locator was the first call. However, java.nio.file.Path could be a better-suited option. This would also mean that ObjectStorageObject#getPath and ObjectStorageObject#getAbsolutePath reflect that. See the extensions where this is discussed further.
ObjectStorage objectStorage = ...;
Iterable<String> objects = objectStorage.listPaths("/public/www");
boolean iconExists = StreamSupport.stream(objects.spliterator(), false)
        .anyMatch("/public/www/icon.png"::equals);
if (iconExists) {
    InputStream is = new FileInputStream("src/main/resources/sample.txt");
    objectStorage.put("/public/www/icon.png", is, Map.of());
}
The common configuration interface contains just the name of the object storage, e.g. for s3://micronaut-object-storage/ it is micronaut-object-storage. The rest of the properties are cloud provider specific:
public interface ObjectStorageConfiguration {
/**
* The name of the object storage.
* @return object storage name
*/
String getName();
}
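A sketch of how a provider-specific configuration might extend this interface using Micronaut's @EachProperty binding; the class name and the AWS-specific properties are illustrative assumptions matching the configuration DSL below.

import io.micronaut.context.annotation.EachProperty;
import io.micronaut.context.annotation.Parameter;

// Sketch: binds every entry under "object-storage.aws" to its own configuration bean.
@EachProperty("object-storage.aws")
class AwsObjectStorageConfiguration implements ObjectStorageConfiguration {

    private final String name;           // the configuration key, e.g. "public-images"
    private String accessKeyId;          // maps to access-key-id
    private String secretAccessKey;      // maps to secret-access-key
    private String bucketName;           // optional, defaults to the name

    AwsObjectStorageConfiguration(@Parameter String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }

    // setters are required for the property binding
    public void setAccessKeyId(String accessKeyId) { this.accessKeyId = accessKeyId; }
    public void setSecretAccessKey(String secretAccessKey) { this.secretAccessKey = secretAccessKey; }
    public void setBucketName(String bucketName) { this.bucketName = bucketName; }
}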
The "DSL" of configuration:
object-storage:
<provider-name>:
<object-storage-bean-name/object-storage-name>:
<implementation details configuration>
Example for a hybrid cloud application. Note the object storage name is the same across all providers.
application-ec2.yml:
object-storage:
aws:
public-images:
access-key-id: xxx
secret-access-key: xxx
application-oraclecloud.yml:
object-storage:
oracle-cloud:
public-images:
application-azure.yml:
object-storage:
azure:
public-images:
storage-account: xxx
application-gcp.yml:
object-storage:
gcp:
public-images:
Example for using multiple cloud providers:
application.yml:
object-storage:
oracle-cloud:
public-images-on-aws:
bucket-name: public-images
access-key-id: xxx
secret-access-key: xxx
aws:
public-images-on-oracle-cloud:
bucket-name: public-images
azure:
public-images-on-azure:
container-name: public-images
storage-account: xxx
gcp:
public-images-on-gcp:
bucket-name: public-images
The configuration can be merged into a flat structure, with the object storage implementation driven by the mandatory field provider.
The "DSL" of configuration:
object-storage:
<object-storage-name>:
provider: [azure|gcp|aws|oracle-cloud]
<implementation details configuration>
Example for using multiple cloud providers:
object-storage:
public-images-on-aws:
bucket-name: public-images
access-key-id: xxx
secret-access-key: xxx
provider: oracle-cloud
public-images-on-oracle-cloud:
bucket-name: public-images
provider: aws
public-images-on-azure:
container-name: public-images
storage-account: xxx
provider: azure
public-images-on-gcp:
bucket-name: public-images
provider: gcp
The ObjectStorage qualifier is evaluated from the ObjectStorageConfiguration#getName property. The name is derived from the configuration in this order:
- the name property:
object-storage:
  gcp:
    public-images-on-gcp:
      name: micronaut-object-storage
- the bean qualifier (the configuration key), here micronaut-object-storage:
object-storage:
  gcp:
    micronaut-object-storage:
The reason for this way of qualifier evaluation lies in the hybrid cloud use case, where it is not always possible to have a unified bucket name across cloud providers. This is the case for AWS S3 and Google Cloud Storage, where bucket names are globally unique.
For the configuration:
object-storage:
gcp:
micronaut-object-storage:
the injection using the @Named annotation looks like:
import jakarta.inject.Named;
public class ImageService {
public ImageService(@Named("micronaut-object-storage") ObjectStorage objectStorage){
//..
}
}
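Under the hood, such named beans might be produced one per configuration entry using Micronaut's @Factory and @EachBean. This is a sketch reusing the hypothetical AwsObjectStorageConfiguration from the earlier sketch; AwsS3ObjectStorage is likewise a hypothetical S3-backed implementation of ObjectStorage.

import io.micronaut.context.annotation.EachBean;
import io.micronaut.context.annotation.Factory;

// Sketch: creates one ObjectStorage bean per AwsObjectStorageConfiguration entry;
// the bean inherits the configuration name as its qualifier.
@Factory
class AwsObjectStorageFactory {

    @EachBean(AwsObjectStorageConfiguration.class)
    ObjectStorage objectStorage(AwsObjectStorageConfiguration configuration) {
        return new AwsS3ObjectStorage(configuration); // hypothetical S3-backed implementation
    }
}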
Extension 1: ObjectStorage beans based on configured SDK authentication
Allows creating ObjectStorage beans using qualifiers if either:
- the cloud provider SDK is configured, or
- the configuration can be automatically deduced (e.g. from ~/.aws/ or ~/.oci/).
Then, for example, if the OCI SDK is present and the credentials were automatically deduced:
import jakarta.inject.Named;
public class ImageService {
public ImageService(@Named("micronaut-object-storage") ObjectStorage objectStorage){
//..
}
}
This creates an ObjectStorage for Oracle Cloud Object Storage for the bucket micronaut-object-storage, using the namespace and region evaluated from the OCI SDK.
The advantage is that there is no need to configure the object storage in application.yml.
Note that this is only possible when a single cloud provider ObjectStorage library is present on the classpath. If two supported ObjectStorage implementations were present, the internals would have no way to determine which cloud provider to use.
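A sketch of how that single-provider detection might be expressed with Micronaut's @Requires condition. The factory, the OracleCloudObjectStorage adapter and the use of the OCI SDK's com.oracle.bmc.objectstorage.ObjectStorage interface as the marker class are assumptions of this sketch.

import io.micronaut.context.annotation.Factory;
import io.micronaut.context.annotation.Requires;
import jakarta.inject.Named;
import jakarta.inject.Singleton;

// Sketch: only create the deduced ObjectStorage when the OCI Object Storage SDK is on the classpath.
@Factory
@Requires(classes = com.oracle.bmc.objectstorage.ObjectStorage.class)
class DeducedOciObjectStorageFactory {

    @Singleton
    @Named("micronaut-object-storage")
    ObjectStorage objectStorage() {
        // OracleCloudObjectStorage is a hypothetical adapter that reads the namespace,
        // region and credentials from the OCI SDK's default configuration (~/.oci/)
        return new OracleCloudObjectStorage("micronaut-object-storage");
    }
}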
Extension 2: ObjectStorage beans based on configured SDK authentication using specialised bean qualifiers
This allows leveraging the automatic SDK evaluation even when there are multiple ObjectStorage implementations on the classpath, by using specialised qualifiers such as @AwsObjectStorage or @OciObjectStorage.
Then:
public class ImageService {
    public ImageService(
            @AwsObjectStorage("micronaut-object-storage") ObjectStorage awsObjectStorage,
            @OciObjectStorage("micronaut-object-storage") ObjectStorage ociObjectStorage
    ) {
        //...
    }
}
The parameter annotated with @OciObjectStorage is bound to an ObjectStorage for Oracle Cloud Object Storage for the bucket micronaut-object-storage, using the namespace and region evaluated from the OCI SDK. Similarly for @AwsObjectStorage.
Note that using the cloud provider specific annotations breaks the cloud agnostic approach.
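For completeness, a sketch of how such a specialised qualifier might be declared; the exact annotation body is an assumption (Micronaut resolves custom qualifiers through the jakarta.inject.Qualifier meta-annotation).

import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

import jakarta.inject.Qualifier;

// Sketch: a provider-specific qualifier carrying the object storage name.
@Qualifier
@Documented
@Retention(RetentionPolicy.RUNTIME)
public @interface AwsObjectStorage {

    /**
     * @return the object storage name the qualifier refers to
     */
    String value();
}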
Implements the ResourceLoader in order to get objects using a shorter URI in the common format:
<object-storage-name>://path/to/file
where <object-storage-name> is the ObjectStorage#getName.
Then for the configuration:
object-storage:
aws:
public-images:
access-key-id: xxx
secret-access-key: xxx
The locator would be: public-images://path/to/file
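A sketch of consuming such a locator through Micronaut's ResourceResolver, which delegates to the registered ResourceLoader implementations; the service class and the locator value are illustrative.

import java.io.InputStream;
import java.util.Optional;

import io.micronaut.core.io.ResourceResolver;
import jakarta.inject.Singleton;

// Sketch: resolves an object through the object-storage-backed ResourceLoader.
@Singleton
class IconService {

    private final ResourceResolver resourceResolver;

    IconService(ResourceResolver resourceResolver) {
        this.resourceResolver = resourceResolver;
    }

    Optional<InputStream> icon() {
        // "public-images" is the object storage name from the configuration above
        return resourceResolver.getResourceAsStream("public-images://path/to/file");
    }
}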
Implements the ResourceLoader in order to get objects using a cloud provider specific URI in the common format:
[s3|os|gs|azb]:([cloud-provider-specifics]:)*//<storage-name>/path/to/file
Where for the cloud providers:
- AWS
  - format: s3://<bucket-name>/path/to/file
  - example: s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
- Azure
  - format: azb:<storage-account-name>://<container>/path/to/file
  - example: azb:micronautpgressatest://micronaut-object-storate/micronaut-buffer-netty-3.1.1.pom
- Google Cloud
  - format: gs://<bucket-name>/path/to/file
  - example: gs://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
- Oracle Cloud
  - format: os:<region>:<namespace>://<bucket-name>/path/to/file
  - example: os:us-ashburn-1:cloudnative-devrel://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
The idea is to implement a cloud agnostic StreamingFileUpload where the transferTo(String) method would accept a cloud agnostic locator:
<object-storage-name>://path/to/file
public Publisher<HttpResponse<String>> upload(ObjectStorageStreamingFileUpload upload) {
    Publisher<Boolean> uploadPublisher = upload.transferTo("public-images://www/uploads/" + upload.getFilename());
    return Mono.from(uploadPublisher)
.map(success -> {
if (success) {
return HttpResponse.ok("Uploaded");
} else {
return HttpResponse.<String>status(CONFLICT)
.body("Upload Failed");
}
});
}
The same as above, but instead of using the Micronaut object storage locator, the cloud specific URI would be used.
The idea is to implement the https://docs.oracle.com/javase/7/docs/api/java/nio/file/FileSystem.html for the given cloud providers, leveraging already existing beans etc.
There are projects that implement the java.nio API to some extent (a usage sketch follows the list below):
- google cloud - https://github.com/googleapis/java-storage-nio
- azure - https://devblogs.microsoft.com/azure-sdk/java-nio-filesystem-apis-and-the-new-azure-sdks/
- aws s3 - aws/aws-sdk-java-v2#1388, where https://github.com/Upplication/Amazon-S3-FileSystem-NIO2 is mentioned along with its spin-off https://github.com/nextflow-io/nextflow-s3fs; there is also another project, https://github.com/carlspring/s3fs-nio/
- oracle cloud - N/A
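For illustration, a sketch of what object access through java.nio could look like once such a FileSystemProvider is on the classpath; the gs:// URI assumes the google java-storage-nio provider listed above, and the bucket name is illustrative.

import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch: with a cloud FileSystemProvider registered (here assumed to be google java-storage-nio),
// objects can be addressed as ordinary java.nio paths.
class NioObjectCopySketch {

    public static void main(String[] args) throws IOException {
        // the gs:// scheme is resolved by the provider discovered on the classpath
        Path remote = Paths.get(URI.create("gs://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom"));
        Path local = Paths.get("micronaut-buffer-netty-3.1.1.pom");
        Files.copy(remote, local, StandardCopyOption.REPLACE_EXISTING);
    }
}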
AWS:

Locator | Format | Example
---|---|---
Object URL | https://<bucket-name>.s3.<region>.amazonaws.com/<object-name> | https://micronaut-object-storage.s3.eu-west-1.amazonaws.com/micronaut-buffer-netty-3.1.1.pom
S3 URI | s3://<bucket-name>/<object-name> | s3://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
ARN | arn:aws:s3:::<bucket-name>/<object-name> | arn:aws:s3:::micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
Azure:

Locator | Format | Example
---|---|---
Object URL | https://<storage-account-name>.blob.core.windows.net/<container>/<blob-name> | https://micronautpgressatest.blob.core.windows.net/micronaut-object-storate/micronaut-buffer-netty-3.1.1.pom
Google Cloud:

Locator | Format | Example
---|---|---
Object URL | https://storage.cloud.google.com/<bucket-name>/<object-name> | https://storage.cloud.google.com/micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
gsutil URI | gs://<bucket-name>/<object-name> | gs://micronaut-object-storage/micronaut-buffer-netty-3.1.1.pom
Oracle Cloud:

Locator | Format | Example
---|---|---
Object URL | https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket-name>/o/<object-name> | https://objectstorage.us-ashburn-1.oraclecloud.com/n/cloudnative-devrel/b/micronaut-object-storage/o/micronaut-buffer-netty-3.1.1.pom
Even though the object storage use case is file manipulation, the internal complexity varies among the cloud providers. For example, the logical nesting of an object in the service:
- Azure: Storage account -> Container -> <Object>
- AWS: Bucket -> <Object>
- Google Cloud: Bucket -> <Object>
- Oracle Cloud: Namespace (Tenant) -> Bucket -> <Object>
Name | Description
---|---
cache-control | Specifies caching behavior along the request/reply chain.
content-type | Specify an explicit content type for this operation. This value overrides any guessed mime types.
content-language | The language the content is in.
content-encoding | Specifies what content encodings have been applied to the object and thus what decoding mechanisms must be applied to obtain the media-type referenced by the Content-Type header field.
meta | A map of metadata to store with the objects.
expires | The date and time at which the object is no longer cacheable.
- https://cloud.google.com/storage/docs/metadata
- https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
- https://docs.microsoft.com/en-us/cli/azure/storage/blob?view=azure-cli-latest#az_storage_blob_upload
- https://docs.oracle.com/en-us/iaas/tools/oci-cli/3.2.0/oci_cli_docs/cmdref/os/object/put.html