- Proposal: SE-NNNN
- Authors: Vincent Esche
- Review Manager: TBD
- Status: Awaiting review
Replace the Hashable
protocol by two new procotols (Hasher
and HashVisitable
) to improve safety, versatility and learnability.
Swift-evolution thread: Discussion thread topic for that proposal
Implementing Hashable
is difficult and the consequences if not done well have dire performance and safety repercussions.
The documentation of Hashable
lists a sample implementation of var hashValue
:
/// A point in an x-y coordinate system.
struct GridPoint {
var x: Int
var y: Int
}
extension GridPoint: Hashable {
var hashValue: Int {
return x.hashValue ^ y.hashValue
}
static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
return lhs.x == rhs.x && lhs.y == rhs.y
}
}
Calculating the hashes of all GridPoints
(given the above implementation) on a 1000 × 1000
grid …
let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
for y in 0..<height {
hashes.insert(GridPoint(x: x, y: y).hashValue)
}
}
print("\(hashes.count) unique hashes out of a total of \(total).")
… results in just 1024
unique hash values for 1_000_000
unique values.
In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.
Out of those 1_000_000
values the median collision count was 976
with min and max being 976
and 1000
respectively.
The collision rate will have negative impact in algorithms which heavily use hashValue
like the ones in Dictionary
and Set
. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.
If even the official Swift documentation gets the implementation of hashValue
wrong, then who is to expect the average Swift programmer to do any better?
In contrast, running the same snippet using HashVisitable
and the semi-secure Fnv1aHash
(see below) results in zero collisions!
Finally, the design of the Hashable
protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.
Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.
The proposal deprecates the Hashable
protocol and introduces the following two:
protocol Hasher {
mutating func finish() -> Int
mutating func write(bytes: UnsafeRawBufferPointer)
}
protocol HashVisitable: Equatable {
func hash<H: Hasher>(_ hasher: inout H)
}
Hasher
is the protocol which represents a hashing algorithm, and HashVisitable
replaces Hashable
. For types entirely represented by their memory layout, the following protocol would provide a default implementation:
protocol ContiguouslyHashable: HashVisitable {}
extension ContiguouslyHashable {
func hash<H: Hasher>(_ hasher: inout H) {
var mutableSelf = self
try! Swift.withUnsafeBytes(of: &mutableSelf) {
hasher.write(bytes: $0)
}
}
}
extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}
The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:
struct Fnv1aHash {
fileprivate var state: UInt
init(seed: UInt) {
self.state = seed &+ 14695981039346656037
}
}
extension Fnv1aHash: Hasher {
mutating func write(bytes: UnsafeRawBufferPointer) {
for byte in bytes {
self.state = (self.state ^ UInt(byte)) &* 1099511628211
}
}
mutating func finish() -> Int {
return unsafeBitCast(self.state, to: Int.self)
}
}
Coming back to the sample code present in the Hashable
documentation, the new implementation would look like:
extension GridPoint: HashVisitable {
func hash<H: Hasher>(_ hasher: inout H) {
self.x.hash(&hasher)
self.y.hash(&hasher)
}
}
Making use of "extending protocols to conform to protocols":
extension Hashable: HashVisitable {
func hash<H: Hasher>(_ hasher: inout H) {
self.hashValue.hash(&hasher)
}
}
n/a
This feature should be possible to add/remove without breaking ABI.
n/a