This is an attempt to implement a trie in Swift with a generic key. Instead of using just strings as keys, this trie can be keyed by any object that can generate its own prefixes. This is accomplished by using a protocol to define a trie key that can return a generator of Hashable
s. We need Hashable
s because the prefixes will be stored in a Dictionary
.
The TrieNode
class will be a private implementation detail. It helps to separate the implementation into TrieNode
and the wrapper for a few reasons:
- If we want to extend the trie to implement some common Swift collection protocols, many of those don't make sense for every node.
- We want references for nodes but the wrapper struct can help us simulate value semantics.
- This will become useful later when we implement
Indexable
below.
public protocol TrieKey: Equatable {
typealias PrefixType: Hashable
func generatePrefixes() -> AnyGenerator<PrefixType>
}
private final class TrieNode<Key: TrieKey, Value> {
private var nodes: [Key.PrefixType: TrieNode] = [:]
private var element: (Key, Value)?
}
The actual Trie
struct implements all the desired behavior of a trie. It contains the root node and some functions for mutating and searching the trie. The methods updateValue(forKey:)
and removeValueForKey()
have the same semantics as their eponymous Dictionary
counterparts. The comment docs are pretty much copied from Dictionary
.
Note that find()
is private. We could have made this method return a Value?
instead and make it public, and we'd be done. But later on we'd like to implement some collection type and other protocols, and having find()
return the node instead of just the value will be useful.
public struct Trie<Key: TrieKey, Value> {
private var rootNode = TrieNode<Key, Value>()
/// Update the value stored in the trie for the given key, or, if they
/// key does not exist, add a new key-value pair to the trie.
///
/// Returns the value that was replaced, or `nil` if a new key-value pair
/// was added.
public mutating func updateValue(value: Value, forKey key: Key) -> Value? {
var cur = self.rootNode
for c in key.generatePrefixes() {
if let next = cur.nodes[c] {
cur = next
} else {
let new = TrieNode<Key, Value>()
cur.nodes[c] = new
cur = new
}
}
defer { cur.element = (key, value) }
return cur.element?.1
}
/// Remove a given key and the associated value from the trie.
/// Returns the value that was removed, or `nil` if the key was not present
/// in the trie.
public mutating func removeValueForKey(key: Key) -> Value? {
guard let node = find(key), (_, value) = node.element else {
return nil
}
defer { node.element = nil }
return value
}
/// Returns the value for the given key, or `nil` if the key is not
/// present in the trie.
private func find(key: Key) -> TrieNode<Key, Value>? {
var cur = rootNode
for c in key.generatePrefixes() {
if let next = cur.nodes[c] {
cur = next
}
}
if let (k, _) = cur.element where k == key {
return cur
}
return nil
}
// This implements `DictionaryLiteralConvertible`. For some reason, putting this implementation in an extension crashes the Swift compiler.
public init(dictionaryLiteral elements: (Key, Value)...) {
for (k,v) in elements {
self.updateValue(v, forKey: k)
}
}
}
That's it! One last thing to do before we can use the trie, we need something that implements the TrieKey
protocol. Let's say we want to key off String
:
extension String: TrieKey {
public func generatePrefixes() -> AnyGenerator<Character> {
return anyGenerator(self.characters.generate())
}
}
Note the lines below that use find()
will only work when in the same source file as Trie
because of the private membership.
var trie1 = Trie<String, Int>()
assert( trie1.updateValue(1, forKey: "foo") == nil )
assert( trie1.updateValue(3, forKey: "bar") == nil )
trie1.find("foo")
trie1.find("bar")
assert( trie1.updateValue(2, forKey: "foo") == 1 )
assert( trie1.removeValueForKey("bar") == 3 )
assert( trie1.find("bar") == nil )
To really get the most out of a collection we'd like to implement some of Swift's built in collection protocols. This will give us subscripts, literals and all the good stuff that comes for free for collections. It's natural to make a trie indexable so we'll implement that protocol.
An index into a Trie
is an opaque object which hides a reference to a node as well as a queue of future nodes to visit. Think of TrieIndex
as an encapsulation of the state of a breadth-first search. Each TrieIndex
instance represents the state for a particular iteration of the search. For each node we visit we would add the child nodes to a queue, then dequeue and repeat. TrieIndex
carries the queue with it, along with a reference to the current node. Succeeding indexes are generated by creating a new index with the next state of the queue.
public struct TrieIndex<Key: TrieKey, Value> {
private let node: TrieNode<Key, Value>
private let queue: [TrieNode<Key, Value>]
}
private extension TrieIndex {
private init(node: TrieNode<Key, Value>) {
self.node = node
self.queue = node.nodes.map { $0.1 }
}
}
The end index is represented by an index that has no element and an empty queue of future nodes. We use identity to equate all other nodes.
extension TrieIndex: Equatable {}
public func == <Key, Value>(lhs: TrieIndex<Key, Value>, rhs: TrieIndex<Key, Value>) -> Bool {
if (lhs.node.element == nil && rhs.node.element == nil)
&& (lhs.queue.count == 0 && rhs.queue.count == 0) {
return true
}
return (lhs.node === rhs.node)
}
successorByNode()
is a private implementation detail of the index. When we implement successor()
, we want each index returned to be an index of a node with a key and value. But not all nodes in the prefix tree have values. successorByNode()
is similar to successor()
as defined by ForwardIndexType
, but it simply returns the index for the next node regardless of if it has a value.
private extension TrieIndex {
private func successorByNode() -> TrieIndex<Key, Value>? {
guard let next = queue.first else { return nil }
let car = queue.dropFirst()
let cdr = next.nodes.map { $0.1 }
return TrieIndex(node: next, queue: car + cdr)
}
}
We'll make TrieIndex
conform to ForwardIndexType
because that seems to be a natural fit for how we would traverse a tree (would it even make sense if it were RandomAccessIndexType
?).
successor()
recursively calls successor()
on the index returned from successorByNode()
until it finds an index with a value or there are no more nodes.
extension TrieIndex: ForwardIndexType {
public func successor() -> TrieIndex<Key, Value> {
guard let suc = self.successorByNode() else {
return TrieIndex(node: TrieNode(), queue: [])
}
if let _ = suc.node.element {
return suc
}
return suc.successor()
}
}
Note we haven't explicitly conformed to Indexable
, we've just created the type we need for that protocol. The next section we make Trie
conform to CollectionType
which extends Indexable
.
Now that we have something to represent an index we can conform to CollectionType
. We define the endIndex
and startIndex
by creating TrieIndex
instances with an empty node and root node respectively. Note the root node may not actually contain a value, in which case we get the successor()
which we have defined to return the index of the next node with a value.
subscript(position:)
contains the only occurrence of the force-unwrap operator throughout the implementation of this data structure. We're following the same pattern as Dictionary
here by not returning an optional for index subscripting and consider it a fatal error to use an invalid subscript.
extension Trie: CollectionType {
public typealias Index = TrieIndex<Key, Value>
public typealias Element = (Key, Value)
public var endIndex: Index { return TrieIndex(node: TrieNode()) }
public var startIndex: Index {
let index = TrieIndex(node: rootNode)
guard let _ = index.node.element else {
return index.successor()
}
return index
}
public subscript (position: Index) -> Element {
return position.node.element!
}
public subscript (key: Key) -> Value? {
get {
return find(key)?.element?.1
}
set(newVal) {
if let newVal = newVal {
updateValue(newVal, forKey: key)
} else {
removeValueForKey(key)
}
}
}
}
As mentioned previously, putting the implementation for DictionaryLiteralConvertible
in an extension crashes the Swift compiler so we left that in the struct definition itself.
extension Trie: DictionaryLiteralConvertible {}
Now we're ready to use the trie with richer syntax.
var trie2 = Trie<String, Int>()
assert( trie2["foo"] == nil )
trie2["foo"] = 1
assert( trie2["foo"] == 1 )
trie2["foo"] = nil
assert( trie2["foo"] == nil )
trie2["bar"] = 2
trie2["baz"] = 12
trie2["qux"] = 64
You can iterate over all the key and values in the trie.
for (key, value) in trie2 {
print("key: \(key), value: \(value)")
}
Using indexing:
for i in trie2.startIndex..<trie2.endIndex {
let (key, value) = trie2[i]
print("key: \(key), value: \(value)")
}
We can use dictionary literals to create an immutable trie.
let trie3 = ["foo": 1, "bar": 12, "baz": 64] as Trie
assert( trie3["foo"] == 1 )
assert( trie3["bar"] == 12 )
assert( trie3["baz"] == 64 )
So you can't change the values.
//trie3["foo"] = 2 // can't modify trie3, it's defined with `let`
But we didn't go through all this trouble to define a generic trie just to use it with strings! Let's use Int
as a key now. I'll cheat a little bit and consider the "prefix" of the Int
to be the least significant digit.
extension Int: TrieKey {
public func generatePrefixes() -> AnyGenerator<Int> {
var tail = self
return anyGenerator {
defer { tail /= 10 }
let dig = tail % 10
return (dig == 0) ? nil : dig
}
}
}
let trie = [1:"Foo", 42:"Bar"] as Trie
assert( trie[1] == "Foo" )
assert( trie[2] == nil )
assert( trie[42] == "Bar" )
Making the actual generic trie data structure was easy, most of the hard work was conforming to the protocols necessary for the nice syntax.
It should be noted that
removeValueForKey
is incomplete because it does not prune the tree if its children do not contain values, so this tree will always grow. Easy to fix, but this should serve as a warning to anyone stumbling across this code from a google search or something, don't use random code on the internet unless you understand exactly what it does. The main point of this gist was to play around with Swift's type system and built-in collection types using a well known data structure.