Separate type hashing method from hash algorithm

Abstract

Current C++ standard dictates impementation of user-defined types hashing through specialization of std::hash type. This causes tight coupling between hashing method for conctete type and overall algorithm of turning type's significant bytes to hash value. One of consequences is constant reinvention of hash values combination in user code. Other consequence is that client code cannot use custom hashing algorithm without rewriting std::hash implementation for all components of user-defined type, down to primitive ones.

This document describes way to define and substitute hashing algorithm separate from hashing method for concrete type. Design is inspired by Rust's Hasher trait.

Out of scope

Please note that any kind of automated generation of hashing functions for types is out of scope of this proposal. It does not involve reflection or compiler-generated methods. Although, this approch should be compatible with future addition of such automation.

List of prior attempts

P0814R2 (currently in review)
P0029R0
N3980
N3876
N3333

See comparison with each one in "Comparison with alternatives" section below

Suggested additions and changes

Document proposes library-only additions and changes. No changes to core language are required

Newly introduced contepts:

HashFunc - implementation of concrete hash algorithm.
Hasher - wraps HashFunc, provides additional ergonomic methods and operator overloads, keeps intermediate hash value
Hashable - a type which defines set of its internal components used to compute hash value, without defining concrete hashing algorithm.
compute_hash_value function which combines HashFunc, Hasher and Hashable and produces hashing result.

HashFunc

Object which implements hash value computation by "appending" values to it. New object is copy-constructed each time new hash value is computed, so copy of just constructed instance must be relatively cheap. Most basic version must allow addition of span of raw bytes.

HashFunc also doesn't restrict implementor to std::size_t as hash value type. This allows implementation and usage of MD5, SHA512 and other hash algorithms with digests bigger than std::size_t.

template<typename T>
concept HashFunc =
    std::copy_constructible
    && requires(T& hashFunc, std::span<const std::byte> bytes)
    {
        // `append` method should modify current state by combining it with provided value
        hashFunc.append(bytes);
        // `result` produces final hash value based on hash function internal state.
        // Function isn't constant because some hash function implementations
        // may perform additional computations when producing final value.
        { hashFunc.result() } -> !std::same_as<void>;
    };

Properties of proposed design:

Hash value can have type other than std::size_t; this way, other hash algorithms like MD5 or SHA512 can be implemented and used
Since hash function object is stateful, it may use complex state
result is usually called once per hash function instance, so it may be relatively complex if needed
std::copy_constructible requirement allows hash function to have non-default constructors, so user is able to provide custom seeds when he constructs first instance
Hash function type can implement other overloads of append method, to support direct hashing of some types

Default hash function is provided as std::hash_function type.

FIXME: Rename HashFunc to HashAlgo or HashAlgorithm?

Example

// Possible simple std::hash_function implementation
namespace std
{
    struct hash_function
    {
        std::size_t result()
        {
            return hash_result;
        }
        
        void append(std::span<std::byte> bytes)
        {
            for (auto b : bytes)
                hash_result = std::rotl(hash_result ^ XorConstant, 1) ^ static_cast<std::size_t>(b);
        }
    
    private:
        static constexpr std::size_t XorConstant = /* some impl-defined type */;
        std::size_t hash_result = 0;
    };
}

Hasher type

Implementation-defined type which provides following interface:

namespace std
{
    struct hasher
    {
        // Delegates directly to hash function
        void operator()(std::span<const std::byte> bytes)
        {
            // hash_func is implementation-defined member reference to hash function
            hash_func.append(bytes);
        }
        // Picks hash_value implementation
        template<Hashable T>
        void operator()(const T& value)
        {
            if constexpr (HashableDirectly<T, std::decay_t<decltype(hash_func)>>)
            {
                // hash_func is implementation-defined member reference to hash function
                hash_func.append(value);
            }
            else if (HashableUsingMemberFunc<T>)
            {
                value.hash_value(*this);
            }
            else if (HashableUsingFreeFunc<T>)
            {
                // support hash_value implementations for fundamental types from `std` namespace
                using std::hash_value;
                hash_value(value, *this);
            }
            else
            {
                static_assert(false, "Type is not hashable");
            }
        }
    };
}

Helper concepts:

template<typename T, typename HashFuncT>
concept HashableDirectly = requires(const T& value, HashFuncT& hash_func) {
    hash_func.append(value);
};

template<typename T>
concept HashableUsingMemberFunc = requires(const T& value, std::hasher& hasher) {
    value.hash_value(hasher);
};

template<typename T>
concept HashableUsingFreeFunc = requires(const T& value, std::hasher& hasher) {
    hash_value(value, hasher);
};

template<typename T>
concept Hashable = HashableUsingMemberFunc<T> || HashableUsingFreeFunc<T>;

NOTE: HashableDirectly is not part of Hashable - it checks for presence of customization point from HashFunc's perspective. Hashable implementations must not depend on it.

Possible extensions

Make operator() return reference to hasher, to allow code like hasher(value1)(value2);
Make operator() variadic, to allow code like hasher(value1, value2);

Hashable type

Hashable is a type which conforms to either HashableUsingMemberFunc or HashableUsingFreeFunc concept:

// Picked first
namespace std
{
    template<typename T>
    concept Hashable = HashableUsingMemberFunc<T> || HashableUsingFreeFunc<T>;
}

Member function is preferred over free function. Use of two concepts allows specifying hashing method for external types which don't have hashing method.

Examples

All existing std::hash implementations should be doubled with hash_value implementations.

namespace std
{
    // Possible hash_value implementation for std::pair
    template<typename T, typename U>
    void hash_value(const std::pair<T, U>& value, std::hasher& hasher)
    {
        hasher(value.first);
        hasher(value.second);
    }
    // Possible hash_value implementation for std::int32_t
    inline void hash_value(std::int32_t value, std::hasher& hasher)
    {
        hasher(std::span<std::byte>(
            reinterpret_cast<std::byte*>(&value),
            sizeof(value)
        ));
    }
}

`compute_hash_value`

Function which accepts hash function, hashable value and produces final hash value. Used wherever hash value computation is required

namespace std
{
    // Normal implementation, supports user-defined hash functions
    // Please note that hash function is passed by value - it needs to be copied anyway
    decltype(auto) compute_hash_value(HashFunc auto hash_func, Hashable const auto& value)
    {
        std::hasher hasher(hash_func);  // Construct hasher, make it use hash function copy
        hasher(value);                  // Pass value to hasher
        return hash_func.result();      // Retrieve hash result
    }
}

Integration with existing STL containers

Usage of hashing infrastructure outside of exising containers is generally very simple:

    auto md5_result = std::compute_hash_value(md5_hash{}, value);

However main aim of this proposal is the use of new hashing infrastructure with standard containers like std::unordered_set, std::unordered_map. Unfortunately these containers use std::hash<TKey> as their default hashing algorithm. To resolve this particular issue and allow usage of both new hash function and existing std::hash user-defined specializations, containers must support both new hash functions and legacy hasher objects. Hash function should be used through compute_hash_value, and legacy hasher should be used directly.

Second issue with std::hash is the concept of enabledness of its specializations. In short, any normal explicit specialization is considered "enabled" and can be used just fine. However, any missing implementation isn't just considered "disabled" or not usable as hash function. It also implies following values to be false:

std::is_default_constructible<std::hash<Key>>::value
std::is_copy_constructible<std::hash<Key>>::value
std::is_move_constructible<std::hash<Key>>::value
std::is_copy_assignable<std::hash<Key>>::value
std::is_move_assignable<std::hash<Key>>::value

To support new hashing method properly, legacy containers must somehow detect "disabled" version of std::hash and switch to default std::hash_function. The most plausible and backwards-compatible variant is to make std::hash default specialization use std::compute_hash_value internally.

Changes needed for integration

Make std::hash umbrella implementation use new hashing logic under the hood, with default hash function; make it disabled if type T is not hashable

namespace std
{
    template<typename T>
    struct hash {
        std::size_t operator()(const T& value) const
        {
            return compute_hash_value(std::hash_function{}, value);
        }
    
    private:
        /* impl-defined: make specialization enabled if T is Hashable, disabled otherwise */
    };
}

Make standard containers accept both std::hash specializations and types whch conform to HashFunc. To do so, they may use following compatibility function to compute hash of key values:

template<typename T, typename ValueT>
concept LegacyHasher = requires (const T& legacy_hasher, const ValueT& value) {
    { legacy_hasher(value) } -> std::same_as<std::size_t>;
};

template<typename T, typename ResultT>
concept HashFuncResultIs = requires (T& hash_func) {
    { hash_func.result() } -> std::same_as<ResultT>;
};

template<typename T, typename HasherT>
std::size_t compute_hash_value_compat(const HasherT& any_hasher, const T& value)
{
    if constexpr (LegacyHasher<HasherT, T>)
    {
        return any_hasher(value);
    }
    else if (HashFunc<HasherT>)
    {
        static_assert(
            HashFuncResultIs<HasherT, std::size_t>,
            "Hash function must return values of type std::size_t"
        );
        return compute_hash_value(any_hasher, value);
    }
    else
    {
        static_assert(
            false,
            "Provided hasher is neither compatible with std::hash specialization nor a hash function"
        );
    }
}

Comparison with alternatives

TODO

Other considerations

ABI compatibility

If container (like std::unordered_map) is specialized with new hash function, it should be ABI-compatible because such container didn't exist prior to proposed change.
If container is specialized with previously disabled specialization of std::hash, it should also be ABI-compatible because such containers weren't compilable prior to proposed change.
Lastly, container specialized with enabled specialization of std::hash should be also ABI-compatible since there are no layout changes to container types, and such container will use its legacy implementation.

Tighter integration with existing `std::hash` model

We can in principle use additional fallback for hashable types, which is their std::hash specialization. The downside is that such types will then use hardcoded std hashing algorithm. It's preferrable to provide simple hash_value implementations for all stdlib types which have std::hash specialized.

target-san/hashable-hasher-separation.md