extension Data {
    init?(hexString: String) {
        let count = hexString.count / 2
        var data = Data(capacity: count)
        var i = hexString.startIndex
        for _ in 0 ..< count {
            let j = hexString.index(after: i)
            if var byte = UInt8(hexString[i ... j], radix: 16) {
                data.append(&byte, count: 1)
            } else {
                return nil
            }
            i = hexString.index(after: j)
        }
        self = data
    }
}
Do you need indexes into the string at all? Can you just create an iterator and call next() twice?
I haven't performance tested this, and I don't like creating a new string for the UInt8 construction (there may be a better way to do that), but it is nice to get rid of the indexing.
extension Data {
    init?(hexString: String) {
        let count = hexString.count / 2
        var data = Data(capacity: count)
        var itr = hexString.makeIterator()
        for _ in 0 ..< count {
            if var byte = UInt8("\(itr.next()!)\(itr.next()!)", radix: 16) {
                data.append(&byte, count: 1)
            } else {
                return nil
            }
        }
        self = data
    }
}
@josephlord
I like your idea about the iterator; it looks more... erm, Swift-ish.
IMHO, the iterator adds another layer of complexity under the hood and presumably takes more memory than storing a single index (startIndex) in a local var.
Another concern is that you're force-unwrapping the result of the .next() call, so there might be a crash.
Of course, these are only my concerns. I don't pretend to judge; I'm just interested in a promising conversation here.
@fallback I don't think the iterator makes a copy; under the hood it is really very similar, with the iterator just managing the index.
Regarding the force unwraps, I think this preserves the original behaviour (indexing also crashes when out of bounds), but in both cases the count being hexString.count / 2 means that it is safe.
The even Swiftier approach would probably map the string into pairs of letters (itself a function taking an iterator), then zip that with the data indices and forEach over it. That does involve multiple passes, though, which I was avoiding (although it can probably be done lazily to get back to the same fundamental operations).
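One way the lazy pairing idea might look is sketched below. This is only an illustration, not code from the thread: the helper hexPairs and the hexPairsOf: label are hypothetical names, and the sketch skips the zip-with-indices step by appending as it goes. Because the pairs are produced lazily from a single iterator, it stays a single pass and needs no count up front, so it accepts any Sequence of Characters. A trailing unpaired character is ignored, matching the versions above.

```swift
import Foundation

// Hypothetical helper: lazily groups a sequence of Characters into
// (hi, lo) pairs by pulling two elements at a time from one iterator.
// A trailing unpaired character ends the sequence and is dropped.
func hexPairs<S: Sequence>(_ source: S) -> UnfoldSequence<(Character, Character), S.Iterator>
    where S.Element == Character {
    return sequence(state: source.makeIterator()) { itr -> (Character, Character)? in
        guard let hi = itr.next(), let lo = itr.next() else { return nil }
        return (hi, lo)
    }
}

extension Data {
    // `hexPairsOf:` is an illustrative label chosen to avoid clashing
    // with the other initializers in this thread.
    init?<S: Sequence>(hexPairsOf hexString: S) where S.Element == Character {
        var data = Data()
        for (hi, lo) in hexPairs(hexString) {
            // hexDigitValue is nil for anything outside 0-9, a-f, A-F.
            guard let h = hi.hexDigitValue, let l = lo.hexDigitValue else { return nil }
            data.append(UInt8(h << 4 | l))
        }
        self = data
    }
}
```

Since no count is known in advance, the Data cannot reserve capacity here, which may cost some reallocations on long inputs.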
embrace your inner Substring
import Foundation

extension Data {
    init?<Hex: StringProtocol>(hexString: Hex) {
        var hex = hexString[...]
        self.init(capacity: hex.count / 2)
        while !hex.isEmpty {
            guard
                let hi = hex.popFirst()?.hexDigitValue,
                let lo = hex.popFirst()?.hexDigitValue
            else { return nil }
            append(UInt8(hi << 4 | lo))
        }
    }
}
That works and shows me hexDigitValue, which I should have used in the iterator approach. Is there any advantage to the Substring over the iterator? I imagine it just has an additional end index, though I haven't checked.
Also, if we use an iterator it can be made generic over collections of Characters, so it should work on Strings, Substrings and arrays of characters. It could work on Sequences if we didn't need the count upfront.
extension Data {
    init?<S>(hexString: S) where S : Collection, S.Element == Character {
        let count = hexString.count / 2
        var data = Data(capacity: count)
        var itr = hexString.makeIterator()
        for _ in 0 ..< count {
            guard let hi = itr.next()?.hexDigitValue,
                  let lo = itr.next()?.hexDigitValue else { return nil }
            data.append(UInt8(hi << 4 | lo))
        }
        self = data
    }
}
From my own timings, popping a substring is slower than using index(after:). I'm not sure how it compares to the iterator: https://twitter.com/nicklockwood/status/1382142248247308292
Try using hexString.utf8.withContiguousStorageIfAvailable. You'll have to write your own hexDigitValue, though.
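A rough sketch of what that suggestion could look like follows. It is untested for performance, and the names hexValue and utf8HexString: are made up for illustration. It assumes falling back to an Array copy is acceptable when the string has no contiguous UTF-8 storage (for example, some bridged strings), and it ignores a trailing odd character like the earlier versions.

```swift
import Foundation

// Hypothetical stand-in for Character.hexDigitValue that works on a
// single ASCII byte; returns nil for anything that is not a hex digit.
func hexValue(_ ascii: UInt8) -> UInt8? {
    switch ascii {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return ascii - UInt8(ascii: "0")
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return ascii - UInt8(ascii: "a") + 10
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return ascii - UInt8(ascii: "A") + 10
    default: return nil
    }
}

extension Data {
    // `utf8HexString:` is an illustrative label, not from the thread.
    init?(utf8HexString hexString: String) {
        // Decodes two ASCII bytes per output byte straight from the buffer.
        func decode(_ utf8: UnsafeBufferPointer<UInt8>) -> Data? {
            var data = Data(capacity: utf8.count / 2)
            var i = 0
            while i + 1 < utf8.count {
                guard let hi = hexValue(utf8[i]),
                      let lo = hexValue(utf8[i + 1]) else { return nil }
                data.append(hi << 4 | lo)
                i += 2
            }
            return data
        }

        let decoded: Data?
        if let fast = hexString.utf8.withContiguousStorageIfAvailable(decode) {
            // Fast path: the string exposed contiguous UTF-8 storage.
            decoded = fast
        } else {
            // Assumed fallback: copy the bytes so there is a buffer to read.
            decoded = Array(hexString.utf8).withUnsafeBufferPointer(decode)
        }
        guard let data = decoded else { return nil }
        self = data
    }
}
```

Working on UTF-8 bytes also means any non-ASCII character fails the hexValue check, so the initializer returns nil rather than misreading multi-byte sequences.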
For anyone checking this out, it's interesting to see why Nick came up with it in the first place:
See Twitter thread