Current C++ standard dictates impementation of user-defined types hashing through specialization of std::hash
type.
This causes tight coupling between hashing method for conctete type and overall algorithm of turning type's significant bytes
to hash value. One of consequences is constant reinvention of hash values combination in user code. Other consequence
is that client code cannot use custom hashing algorithm without rewriting std::hash
implementation for all components
of user-defined type, down to primitive ones.
This document describes way to define and substitute hashing algorithm separate from hashing method for concrete type. Design is inspired by Rust's Hasher trait.
Please note that any kind of automated generation of hashing functions for types is out of scope of this proposal. It does not involve reflection or compiler-generated methods. Although, this approch should be compatible with future addition of such automation.
See comparison with each one in "Comparison with alternatives" section below
Document proposes library-only additions and changes. No changes to core language are required
Newly introduced contepts:
- HashFunc - implementation of concrete hash algorithm.
- Hasher - wraps HashFunc, provides additional ergonomic methods and operator overloads, keeps intermediate hash value
- Hashable - a type which defines set of its internal components used to compute hash value, without defining concrete hashing algorithm.
compute_hash_value
function which combines HashFunc, Hasher and Hashable and produces hashing result.
Object which implements hash value computation by "appending" values to it. New object is copy-constructed each time new hash value is computed, so copy of just constructed instance must be relatively cheap. Most basic version must allow addition of span of raw bytes.
HashFunc also doesn't restrict implementor to std::size_t
as hash value type. This allows implementation and usage of MD5, SHA512 and other hash algorithms with digests bigger than std::size_t
.
template<typename T>
concept HashFunc =
std::copy_constructible
&& requires(T& hashFunc, std::span<const std::byte> bytes)
{
// `append` method should modify current state by combining it with provided value
hashFunc.append(bytes);
// `result` produces final hash value based on hash function internal state.
// Function isn't constant because some hash function implementations
// may perform additional computations when producing final value.
{ hashFunc.result() } -> !std::same_as<void>;
};
Properties of proposed design:
- Hash value can have type other than
std::size_t
; this way, other hash algorithms like MD5 or SHA512 can be implemented and used - Since hash function object is stateful, it may use complex state
result
is usually called once per hash function instance, so it may be relatively complex if neededstd::copy_constructible
requirement allows hash function to have non-default constructors, so user is able to provide custom seeds when he constructs first instance- Hash function type can implement other overloads of
append
method, to support direct hashing of some types
Default hash function is provided as std::hash_function
type.
FIXME: Rename
HashFunc
toHashAlgo
orHashAlgorithm
?
// Possible simple std::hash_function implementation
namespace std
{
struct hash_function
{
std::size_t result()
{
return hash_result;
}
void append(std::span<std::byte> bytes)
{
for (auto b : bytes)
hash_result = std::rotl(hash_result ^ XorConstant, 1) ^ static_cast<std::size_t>(b);
}
private:
static constexpr std::size_t XorConstant = /* some impl-defined type */;
std::size_t hash_result = 0;
};
}
Implementation-defined type which provides following interface:
namespace std
{
struct hasher
{
// Delegates directly to hash function
void operator()(std::span<const std::byte> bytes)
{
// hash_func is implementation-defined member reference to hash function
hash_func.append(bytes);
}
// Picks hash_value implementation
template<Hashable T>
void operator()(const T& value)
{
if constexpr (HashableDirectly<T, std::decay_t<decltype(hash_func)>>)
{
// hash_func is implementation-defined member reference to hash function
hash_func.append(value);
}
else if (HashableUsingMemberFunc<T>)
{
value.hash_value(*this);
}
else if (HashableUsingFreeFunc<T>)
{
// support hash_value implementations for fundamental types from `std` namespace
using std::hash_value;
hash_value(value, *this);
}
else
{
static_assert(false, "Type is not hashable");
}
}
};
}
Helper concepts:
template<typename T, typename HashFuncT>
concept HashableDirectly = requires(const T& value, HashFuncT& hash_func) {
hash_func.append(value);
};
template<typename T>
concept HashableUsingMemberFunc = requires(const T& value, std::hasher& hasher) {
value.hash_value(hasher);
};
template<typename T>
concept HashableUsingFreeFunc = requires(const T& value, std::hasher& hasher) {
hash_value(value, hasher);
};
template<typename T>
concept Hashable = HashableUsingMemberFunc<T> || HashableUsingFreeFunc<T>;
NOTE: HashableDirectly
is not part of Hashable
- it checks for presence of customization point from HashFunc's perspective. Hashable
implementations must not depend on it.
- Make
operator()
return reference to hasher, to allow code likehasher(value1)(value2);
- Make
operator()
variadic, to allow code likehasher(value1, value2);
Hashable
is a type which conforms to either HashableUsingMemberFunc
or HashableUsingFreeFunc
concept:
// Picked first
namespace std
{
template<typename T>
concept Hashable = HashableUsingMemberFunc<T> || HashableUsingFreeFunc<T>;
}
Member function is preferred over free function. Use of two concepts allows specifying hashing method for external types which don't have hashing method.
All existing std::hash
implementations should be doubled with hash_value
implementations.
namespace std
{
// Possible hash_value implementation for std::pair
template<typename T, typename U>
void hash_value(const std::pair<T, U>& value, std::hasher& hasher)
{
hasher(value.first);
hasher(value.second);
}
// Possible hash_value implementation for std::int32_t
inline void hash_value(std::int32_t value, std::hasher& hasher)
{
hasher(std::span<std::byte>(
reinterpret_cast<std::byte*>(&value),
sizeof(value)
));
}
}
Function which accepts hash function, hashable value and produces final hash value. Used wherever hash value computation is required
namespace std
{
// Normal implementation, supports user-defined hash functions
// Please note that hash function is passed by value - it needs to be copied anyway
decltype(auto) compute_hash_value(HashFunc auto hash_func, Hashable const auto& value)
{
std::hasher hasher(hash_func); // Construct hasher, make it use hash function copy
hasher(value); // Pass value to hasher
return hash_func.result(); // Retrieve hash result
}
}
Usage of hashing infrastructure outside of exising containers is generally very simple:
auto md5_result = std::compute_hash_value(md5_hash{}, value);
However main aim of this proposal is the use of new hashing infrastructure
with standard containers like std::unordered_set
, std::unordered_map
.
Unfortunately these containers use std::hash<TKey>
as their default hashing algorithm.
To resolve this particular issue and allow usage of both new hash function
and existing std::hash
user-defined specializations,
containers must support both new hash functions and legacy hasher objects.
Hash function should be used through compute_hash_value
, and legacy hasher should be used directly.
Second issue with std::hash
is the concept of enabledness of its specializations.
In short, any normal explicit specialization is considered "enabled" and can be used just fine.
However, any missing implementation isn't just considered "disabled" or not usable as hash function.
It also implies following values to be false:
std::is_default_constructible<std::hash<Key>>::value
std::is_copy_constructible<std::hash<Key>>::value
std::is_move_constructible<std::hash<Key>>::value
std::is_copy_assignable<std::hash<Key>>::value
std::is_move_assignable<std::hash<Key>>::value
To support new hashing method properly, legacy containers must somehow detect "disabled" version of std::hash
and switch to default std::hash_function
. The most plausible and backwards-compatible variant is to make std::hash
default specialization use std::compute_hash_value
internally.
- Make
std::hash
umbrella implementation use new hashing logic under the hood, with default hash function; make it disabled if typeT
is not hashablenamespace std { template<typename T> struct hash { std::size_t operator()(const T& value) const { return compute_hash_value(std::hash_function{}, value); } private: /* impl-defined: make specialization enabled if T is Hashable, disabled otherwise */ }; }
- Make standard containers accept both
std::hash
specializations and types whch conform toHashFunc
. To do so, they may use following compatibility function to compute hash of key values:template<typename T, typename ValueT> concept LegacyHasher = requires (const T& legacy_hasher, const ValueT& value) { { legacy_hasher(value) } -> std::same_as<std::size_t>; }; template<typename T, typename ResultT> concept HashFuncResultIs = requires (T& hash_func) { { hash_func.result() } -> std::same_as<ResultT>; }; template<typename T, typename HasherT> std::size_t compute_hash_value_compat(const HasherT& any_hasher, const T& value) { if constexpr (LegacyHasher<HasherT, T>) { return any_hasher(value); } else if (HashFunc<HasherT>) { static_assert( HashFuncResultIs<HasherT, std::size_t>, "Hash function must return values of type std::size_t" ); return compute_hash_value(any_hasher, value); } else { static_assert( false, "Provided hasher is neither compatible with std::hash specialization nor a hash function" ); } }
TODO
- If container (like
std::unordered_map
) is specialized with new hash function, it should be ABI-compatible because such container didn't exist prior to proposed change. - If container is specialized with previously disabled specialization of
std::hash
, it should also be ABI-compatible because such containers weren't compilable prior to proposed change. - Lastly, container specialized with enabled specialization of
std::hash
should be also ABI-compatible since there are no layout changes to container types, and such container will use its legacy implementation.
We can in principle use additional fallback for hashable types, which is their std::hash
specialization. The downside is that such types will then use hardcoded std hashing algorithm. It's preferrable to provide simple hash_value
implementations for all stdlib
types which have std::hash
specialized.