Moved to: https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220
Heavily inluenced by the corresponding parquet-mr design: https://docs.google.com/document/d/1bEu903840yb95k9q2X-BlsYKuXoygE4VnMDl9xz_zhk
Initial PR to port the parquet-mr design to c++: apache/arrow#8023
# name_of_file
- class in file
- class in file
<detail code block>
comments
- EncryptionConfiguration
- DecryptionConfiguration
- PropertiesDrivenCryptoFactory
class PropertiesDrivenCryptoFactory {
public:
explicit PropertiesDrivenCryptoFactory(
std::shared_ptr<KmsClientFactory> kms_client_factory);
std::shared_ptr<FileEncryptionProperties> GetFileEncryptionProperties(
const KmsConnectionConfig& kms_connection_config,
std::shared_ptr<EncryptionConfiguration> encryption_config);
std::shared_ptr<FileDecryptionProperties> GetFileDecryptionProperties(
const KmsConnectionConfig& kms_connection_config,
std::shared_ptr<DecryptionConfiguration> decryption_config);
void RemoveCacheEntriesForToken(const std::string& access_token);
void RemoveCacheEntriesForAllTokens();
};
PropertiesDrivenCryptoFactory contains exactly one KmsClientFactory, so it's more idiomatic to require this on construction than have a post-construction setter method and runtime asserts.
KeyMaterialStore will eventually reside in this header and be added as a field
of EncryptionConfiguration
, but it should be omitted from the initial PR.
KeyToolkit, FileKeyWrapper, FileKeyUnwrapper should be moved to
properties_driven_crypto_factory.cc and not appear in any header since they are
internal implemenation details of PropertiesDrivenCryptoFactory
. In fact I
recommend using the private implemenation pattern in
PropertiesDrivenCryptoFactory
and inlining the other classes into that private
implemenation.
To maintain consistency with the style of parquet-cpp
, the above structures
should not be explicitly synchronized with individual mutexes. In the case of a
parquet::arrow::FileReader
, the request to read a given selection of row groups
and columns is issued from a single main thread. Properties (including
FileDecryptionProperties) are immutable after construction and thus can be
safely shared between threads with no further synchronization. Since
PropertiesDrivenCryptoFactory
is a factory for FileDecryptionProperties
, it
will not be referenced from multiple threads and internal synchronization is
unnecessary.
- KmsConnectionConfig
- KmsClient
- KmsClientFactory
struct KmsConnectionConfig {
std::string kms_instance_id;
std::string kms_instance_url;
std::function<std::string()> access_token;
};
Defaults for fields of KmsConnectionConfig are not meaningful; it defers recognition and handling of a "known default value" to implemenations of KmsClientFactory without clarifying what expected behavior when given those defaults might be, so don't provide any.
access_token
is now wrapped in std::function
; invoking that function returns
the current access token. This will be sufficient to capture arbitrary refresh
semantics, including synchronization if that's necessary for an implementer.
Moreover it relieves the burden on users of KmsConnectionConfig to detect that
refresh is necessary.
custom_kms_conf
can be introduced in a follow up when demand arises, but it
need not be present in the initial PR.
class KmsClientFactory {
public:
virtual ~KmsClientFactory() = default;
virtual std::shared_ptr<KmsClient> CreateKmsClient(
KmsConnectionConfig kms_connection_config) = 0;
};
Whether or not KmsClients produced by a factory use local wrapping need not be
exposed in the KmsClientFactory interface since an implementer of the interface
will already be aware of what wrapping methods are used. In a follow up it may
become worthwhile to add virtual bool KmsClientFactory::local_wrapping() const
and similar properties so that implementers can report characteristics of their
classes in code as well as documentation, but it need not be present in the
initial PR.
class KmsClient {
public:
virtual ~KmsClient() = default;
virtual ::arrow::util::Future<std::shared_ptr<::arrow::Buffer>> WrapKey(
const ::arrow::Buffer& key,
const std::string& master_key_identifier) = 0;
virtual ::arrow::util::Future<std::shared_ptr<::arrow::Buffer>> UnwrapKey(
const ::arrow::Buffer& wrapped_key,
const std::string& master_key_identifier) = 0;
};
For blobs of arbitrary bytes arrow::Buffer
is preferred over std::string
.
Interaction with a KMS may be slow and this asynchronicity should
be reflected in the KmsClient
interface. Returning a Future
will allow
multiple requests to be issued (for example, for all keys necessary to read
selected columns of a given file) then collected.
- LocalWrapKmsClientFactory
class LocalWrapKmsClientFactory : public KmsClientFactory {
public:
LocalWrapKmsClientFactory() = default;
std::shared_ptr<KmsClient> CreateKmsClient(
KmsConnectionConfig kms_connection_config) override;
};
The primary point of extension in this API is KmsClientFactory, so that is the interface which should be implemented. LocalKeyWrap and LocalWrapKmsClient do not need to appear in a header file.