Skip to content

Instantly share code, notes, and snippets.

@Eibwen
Last active August 22, 2018 21:40
Show Gist options
  • Save Eibwen/241a4ee5f659fc5b76c2be7b828edbf4 to your computer and use it in GitHub Desktop.
Save Eibwen/241a4ee5f659fc5b76c2be7b828edbf4 to your computer and use it in GitHub Desktop.
Trie in C# (digital tree?)

Trie

This introduction is here in case you'd like to treat this as a programming excercise yourself.

Problem introduction

A trie (spoilers) is a fairly simple but powerful structure in computer science.

My goal here was simply to mimic a HashSet<string> with much better in-memory storage results (assuming that HashSet<string>, being generic must keep an instance of each string). NOT storing any data other than if the string exists or not.

It can allow you to store an extremely large number of strings with very small storage space and lookup operations are O(length-of-word), with no relation to how many items are being checked against.

So the problem solved here is to implement something with the interface:

public interface TrieSet
{
    void Add(string value);
    bool Contains(string value)
}

And be certain that you could never run out of memory as long as you limit the maximum length of strings being put into it.

Problem description

So to achieve the criteria, one way is a trie. Which is a graph structure where each node is a single character, and each node points to children nodes (upto 26 child pointers for the english alphabet)

(Having a hard time figuring out exactly what perspective to take in this writing, ending up being both interview questiony and sharing knowledge of how it works...)

Further

First further step would be making sure that substrings are not matched, so that if you insert "abcdefg", it will return false on a call of Contains("abc")

Even further would be to have it store generic data for each key, so mimicing a Dictionary<string, T>, although there are fewer if any benefits over using a HashTable as Dictionary does

/*
For fun implemented a very basic Trie, with only Add and Contains methods
Should be decently optimized for speed and memory usage (although haven't run any real tests), but could likely still be improved
(just wrote all this in LinqPad, so don't have namespaces or csproj or anything currently)
*/
void Main()
{
// Test substring
var trieSubString = new TrieSet_SubStrings(new []
{
"hi",
"hello",
"whatever",
"whatnow",
"browncow"
});
Debug.Assert(trieSubString.Contains("hello"), "Expected contain 'hello'");
Debug.Assert(trieSubString.Contains("hell"), "Expected contain 'hell'");
Debug.Assert(trieSubString.Contains("what"), "Expected contain 'what'");
Debug.Assert(!trieSubString.Contains("whatf"), "Expected NOT contain 'whatf'");
// Test exact
var trieExact = new TrieSet_ExactMatch(new[]
{
"hi",
"hello",
"whatever",
"whatnow",
"browncow"
});
Debug.Assert(trieExact.Contains("hello"), "Expected contain 'hello'");
Debug.Assert(!trieExact.Contains("hell"), "Expected NOT contain 'hell'");
Debug.Assert(!trieExact.Contains("what"), "Expected NOT contain 'what'");
Debug.Assert(!trieExact.Contains("whatf"), "Expected NOT contain 'whatf'");
}
///This will only match if the exact string was added
public class TrieSet_ExactMatch
{
public TrieSet_ExactMatch(IEnumerable<string> collection)
{
foreach (var value in collection)
{
Add(value);
}
}
TrieNode Root = new TrieNode();
public void Add(string value)
{
var node = Root;
foreach (var c in value)
{
node = node.GetOrAddChild(c);
}
node.IsWordEnd = true;
}
public bool Contains(string value)
{
var node = Root;
foreach (var c in value)
{
if (!node.TryGetChild(c, out var child))
{
return false;
}
node = child;
}
return node.IsWordEnd;
}
class TrieNode
{
public TrieNode()
{
}
public bool IsWordEnd { get; set; }
private Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
public bool TryGetChild(char value, out TrieNode child)
{
return Children.TryGetValue(value, out child);
}
public TrieNode GetOrAddChild(char value)
{
if (!Children.TryGetValue(value, out var child))
{
child = new TrieNode();
Children.Add(value, child);
}
return child;
}
}
}
//This could probably be generalized to be IEnumerable<IEnumerable<T>>
// And use GetHashCode as the "node values", using Dictionary for the top level
///This will match any "StartsWith substrings" of existing items
public class TrieSet_SubStrings
{
public TrieSet_SubStrings(IEnumerable<string> collection)
{
foreach (var value in collection)
{
Add(value);
}
}
TrieNode Root = new TrieNode();
public void Add(string value)
{
var node = Root;
foreach (var c in value)
{
node = node.GetOrAddChild(c);
}
}
public bool Contains(string value)
{
var node = Root;
foreach (var c in value)
{
if (!node.TryGetChild(c, out var child))
{
return false;
}
node = child;
}
return true;
}
class TrieNode
{
public TrieNode()
{
}
private Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
public bool TryGetChild(char value, out TrieNode child)
{
return Children.TryGetValue(value, out child);
}
public TrieNode GetOrAddChild(char value)
{
if (!Children.TryGetValue(value, out var child))
{
child = new TrieNode();
Children.Add(value, child);
}
return child;
}
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment