@icewind1991 · Last active November 6, 2015
Storage API

The storage API consists of two separate sets of interfaces:

  • Storage Implementation Interface: the main interface to implement when creating a custom storage backend.
  • Storage Adapter Interface: the interface that upper layers, such as end user applications, use to access the storage backend.

Three supported storage types

  1. Storage backend fully handles all metadata (CERN EOS)
  2. Storage backend provides metadata which is cached (Local, most external storages)
  3. Storage backend doesn't provide any metadata, only blob storage (object storage like S3)

Storage Implementation Interfaces

The responsibilities of this interface are split across five interfaces to allow reuse between storage implementations.

Splitting the interface makes it possible to separate the various parts of the storage interface, and allows an implementation to decide to either reuse parts (such as metadata storage in the database) or provide its own (e.g. read metadata from a custom backend).

Storage\Data

Responsible for storing file data; has no knowledge of any metadata besides the file path

  • readStream(string $path): resource
  • writeStream(string $path, resource $data)
  • delete(string $path)
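
A minimal PHP sketch of this interface (the namespace and exact signatures are assumptions drawn from the method list above; PHP has no `resource` type declaration, so docblocks are used):

```php
<?php
namespace Storage;

// Sketch only: stores raw file data, addressed purely by path.
interface Data {
	/** @return resource a readable stream for the file at $path */
	public function readStream(string $path);

	/** @param resource $data a stream whose contents become the file at $path */
	public function writeStream(string $path, $data);

	public function delete(string $path);
}
```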

Storage\Tree

Handles directories and file tree operations (list content, rename)

  • exists(string $path): bool
  • newFolder(string $path)
  • deleteFolder(string $path)
  • listFolderContents(string $path): string[]
  • move(string $source, string $target)
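
Sketched the same way (signatures again inferred from the list above):

```php
<?php
namespace Storage;

// Sketch only: directory and tree operations, no metadata involved.
interface Tree {
	public function exists(string $path): bool;

	public function newFolder(string $path);

	public function deleteFolder(string $path);

	/** @return string[] the names of the entries inside $path */
	public function listFolderContents(string $path): array;

	public function move(string $source, string $target);
}
```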

Storage\MetaRead

Provides read access to metadata

  • getMeta(string $path): MetaData
  • getFolderContentsMeta(string $path): MetaData[]
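
The MetaData type itself is not defined in this gist; a plausible minimal shape, with fields borrowed from what ownCloud's filecache commonly tracks (every field here is an assumption):

```php
<?php
namespace Storage;

// Hypothetical value object; not part of the proposal itself.
class MetaData {
	/** @var int stable file id */
	public $id;
	/** @var string path relative to the storage root */
	public $path;
	/** @var int size in bytes */
	public $size;
	/** @var int modification time as a unix timestamp */
	public $mtime;
	/** @var string */
	public $mimetype;
	/** @var string */
	public $etag;
}
```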

Storage\MetaWrite

Provides write access to metadata

  • setMeta(string $path, array $data)
  • move(string $source, string $target)
  • remove(string $path)

Storage\MetaTree

Provides id-based access to metadata and tree traversal

  • getMetaById(int $id): MetaData
  • getFolderContentsMetaById(int $id): MetaData[]
  • getParentsById(int $id): MetaData[]
  • traverse(string $path): Traversable<MetaData>
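
In the same sketch style (the `Traversable<MetaData>` notation is flattened to plain `\Traversable`, since PHP has no generics):

```php
<?php
namespace Storage;

// Sketch only: id-based metadata access and tree traversal.
interface MetaTree {
	public function getMetaById(int $id): MetaData;

	/** @return MetaData[] */
	public function getFolderContentsMetaById(int $id): array;

	/** @return MetaData[] metadata of all parent folders */
	public function getParentsById(int $id): array;

	/** @return \Traversable yields a MetaData for every entry below $path */
	public function traverse(string $path): \Traversable;
}
```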

Storage Adapter

The storage adapter takes care of hiding the differences between storage implementation types from the consumer of the storage interface.

The adapter takes one or more objects which implement the various implementation interfaces.

Different implementations of the storage adapter can be used to add functionality such as metadata caching or spreading data over multiple data stores.
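
A sketch of what that composition could look like for the plain adapter (the constructor shape is an assumption; the gist only lists which instances are required):

```php
<?php
namespace Adapter;

use Storage\Data;
use Storage\MetaData;
use Storage\MetaRead;
use Storage\MetaTree;
use Storage\Tree;

// Sketch only: composes the implementation interfaces behind one facade.
class Adapter {
	private $data;
	private $tree;
	private $metaRead;
	private $metaTree;

	public function __construct(Data $data, Tree $tree, MetaRead $metaRead, MetaTree $metaTree) {
		$this->data = $data;
		$this->tree = $tree;
		$this->metaRead = $metaRead;
		$this->metaTree = $metaTree;
	}

	public function getMeta(string $path): MetaData {
		// the plain adapter just delegates to the implementation
		return $this->metaRead->getMeta($path);
	}

	/** @return resource */
	public function readStream(string $path) {
		return $this->data->readStream($path);
	}
}
```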

Adapter\Adapter

Adapter for storage implementations that fully manage their own metadata

Requires one Data, Tree, MetaRead and MetaTree instance

  • readStream(string $path): resource
  • writeStream(string $path, resource $data)
  • newFolder(string $path)
  • delete(string $path)
  • rename(string $source, string $target)
  • exists(string $path): bool
  • getMeta(string $path): MetaData
  • getMetaById(int $id): MetaData
  • getFolderContents(string $path): MetaData[]
  • getFolderContentsById(int $id): MetaData[]
  • traverse(string $path): Traversable<MetaData>
  • getParentsById(int $id): MetaData[]
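
Hypothetical usage, taking the EOS case from "Example cases" below, where a single object implements all four interfaces:

```php
<?php
// $eos implements Data, Tree, MetaRead and MetaTree (hypothetical instance)
$adapter = new Adapter\Adapter($eos, $eos, $eos, $eos);

$meta = $adapter->getMeta('photos/cat.jpg');
echo $meta->size . " bytes\n";

// stream the file contents to the client
$stream = $adapter->readStream('photos/cat.jpg');
stream_copy_to_stream($stream, fopen('php://output', 'w'));
fclose($stream);
```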

Adapter\UpdateMeta extends Adapter\Adapter

Adapter for storage implementations where the metadata needs to be updated manually after write operations

Requires one Data, Tree, MetaRead, MetaWrite and MetaTree instance

Adapter\Caching extends Adapter\UpdateMeta

Adapter for storage implementations where metadata should be cached.

Requires two MetaRead and two Tree instances, plus one Data, one MetaWrite and one MetaTree instance

Adapter\FullMeta extends Adapter\Adapter

Adapter for storage implementations where all metadata needs to be managed manually

Requires one Data, Tree, MetaRead and MetaWrite instance

Scanner

Takes one Tree and one MetaRead instance as the source and synchronizes them with a target MetaRead and MetaWrite instance
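
A rough sketch of one scanner pass under these interfaces (the directory mimetype convention is ownCloud's; the diffing strategy is left as a comment since the gist does not specify it):

```php
<?php
// Sketch only: copy metadata from the real storage into the cache.
function scanFolder(
	Storage\Tree $sourceTree,
	Storage\MetaRead $sourceMeta,
	Storage\MetaRead $cacheMeta,
	Storage\MetaWrite $cacheWrite,
	string $path
) {
	foreach ($sourceTree->listFolderContents($path) as $name) {
		$child = rtrim($path, '/') . '/' . $name;
		$meta = $sourceMeta->getMeta($child);
		// a real scanner would first consult $cacheMeta and skip entries
		// whose size/mtime/etag are unchanged
		$cacheWrite->setMeta($child, [
			'size'     => $meta->size,
			'mtime'    => $meta->mtime,
			'mimetype' => $meta->mimetype,
			'etag'     => $meta->etag,
		]);
		if ($meta->mimetype === 'httpd/unix-directory') { // ownCloud folder convention
			scanFolder($sourceTree, $sourceMeta, $cacheMeta, $cacheWrite, $child);
		}
	}
}
```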

Example cases

Local (and most external storages)

Local implements Data, Tree and MetaRead, with all metadata read from the underlying filesystem. DBCache implements Tree, MetaRead, MetaWrite and MetaTree, with all data stored in the database.

Adapter\Caching reads its metadata from the DBCache, updates the DBCache when needed, and reads and writes the files through Local.
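
Hypothetical wiring for this case (the constructor argument order is made up; the gist only states which instances Adapter\Caching requires):

```php
<?php
$local = new Local('/srv/owncloud/data'); // Data, Tree, MetaRead
$cache = new DBCache($db);                // Tree, MetaRead, MetaWrite, MetaTree

$storage = new Adapter\Caching(
	$local,          // Data
	$local, $cache,  // the two Tree instances
	$local, $cache,  // the two MetaRead instances
	$cache,          // MetaWrite
	$cache           // MetaTree
);
```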

ObjectStore

ObjectStore implements Data and only reads and writes blobs from the object store. DBCache implements Tree, MetaRead, MetaWrite and MetaTree, with all data stored in the database.

Adapter\FullMeta handles maintaining all metadata in the DBCache

EOS (CERN's storage implementation with full metadata)

EOS implements Data, Tree, MetaRead, MetaWrite and MetaTree and handles all metadata operations itself

Adapter\Adapter only has to pass all operations down to EOS

@icewind1991 (Author)

How about the cross-storage move/copy which we had in the old storage implementations?

It probably makes the most sense to handle that at the adapter level; the adapter can transform a moveFromStorage into a regular move for the implementation.
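
Roughly like this (a sketch, not part of the proposal; the function name is made up):

```php
<?php
// Sketch: cross-storage move handled at the adapter level.
function moveBetweenStorages($sourceAdapter, $targetAdapter, string $sourcePath, string $targetPath) {
	if ($sourceAdapter === $targetAdapter) {
		// same backend: degrade to a regular, cheap rename
		$sourceAdapter->rename($sourcePath, $targetPath);
		return;
	}
	// different backends: stream the data across, then remove the original
	$targetAdapter->writeStream($targetPath, $sourceAdapter->readStream($sourcePath));
	$sourceAdapter->delete($sourcePath);
}
```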

Will the SharedStorage eventually also implement these for shared folders on the recipient side?

Yes

How do these interfaces and classes interact with the existing storage interfaces we have?

Ideally the View/Node will be adjusted to use this API; the old storage classes could be reused by adding an adapter for them, to save us from having to re-implement every storage for now.

Do we want/need a concept of extended metadata, for example tags?

Maybe have a plugin system for adapters which can add additional metadata

One idea would be to do it like WebDAV and provide an optional list of properties to be requested.

Sounds good

Will there also be storage wrappers that implement this interface?

Yes; implementing them at the adapter level probably makes the most sense.

@icewind1991 (Author)

From https://gist.github.com/labkode/a84a8f66920a6cb9355c:

getParentsById(int $id): MetaData[] WHY IS THIS NEEDED ? IT LOOKS LIKE A SQL METADATA STORE IMPLEMENTATION DETAIL

There are several places in oC where we need the metadata of all parent folders.

WARNING: the resource object SHOULD NOT be the resource obtained when doing a local open. It SHOULD be an abstract class that represents the operations that can be made on a resource independently of the storage backend. This is needed to avoid passing storage implementation objects to upper layers through the storage interface like it is now with OC.

Not sure what you mean with "This is needed to avoid passing storage implementation objects to upper layers"

@schiessle

Will there also be storage wrappers that implement this interface?

Yes; implementing them at the adapter level probably makes the most sense.

Will this work with existing storage wrappers like the quota wrapper, trashbin, encryption, etc., or do we need to adjust them? Who will take care of it if we need to adjust all existing storage wrappers (I hope this will not be necessary)?

@schiessle

We need to consider further extension points for these interfaces (e.g. sharing, as currently being discussed with @schiesbn and @rullzer, and tagging).

cc @rullzer maybe you also want to keep an eye on this discussion to make sure all of this will fit together with sharing 2.0.

@icewind1991 with respect to sharing, please also follow issue owncloud/core#19331 to make sure that all of this fits together in the end.

@DeepDiver1975

We need a minimally invasive change to our storage interfaces.
If we have to touch all wrappers and storage implementations this will be a nightmare.

@labkode please add your comments and concerns to this doc - thx
(In addition UPPERCASE comments feel so rude 🙊 )

@labkode commented Nov 4, 2015

@icewind1991

From https://gist.github.com/labkode/a84a8f66920a6cb9355c:

getParentsById(int $id): MetaData[] WHY IS THIS NEEDED ? IT LOOKS LIKE A SQL METADATA STORE IMPLEMENTATION DETAIL
There are several places in oC where we need the metadata of all parent folders.

This should be achievable with getMetaById plus a loop, to keep the interfaces primitive.
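
Something like this (a sketch; it assumes MetaData carries a parent id, which the gist does not specify):

```php
<?php
// Sketch: derive getParentsById from getMetaById plus a loop.
function getParentsById(Storage\MetaTree $tree, int $id): array {
	$parents = [];
	$meta = $tree->getMetaById($id);
	while ($meta->parentId !== null) { // hypothetical parentId field
		$meta = $tree->getMetaById($meta->parentId);
		$parents[] = $meta;
	}
	return $parents;
}
```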

WARNING: the resource object SHOULD NOT be the resource obtained when doing a local open. It SHOULD be an abstract class that represents the operations that can be made on a resource independently of the storage backend. This is needed to avoid passing storage implementation objects to upper layers through the storage interface like it is now with OC.
Not sure what you mean with "This is needed to avoid passing storage implementation objects to upper layers"

I think the resource returned should not have readdir or truncate capabilities, which are specific to the local storage implementation.

@PVince81 commented Nov 6, 2015

@icewind1991 can we move this to a discussion ticket in the core repo? GitHub doesn't send notifications for this.
Thanks.

@labkode commented Nov 6, 2015

👍
