This introduction is here in case you'd like to treat this as a programming excercise yourself.
A trie (spoilers) is a fairly simple but powerful structure in computer science.
My goal here was simply to mimic a HashSet<string>
with much better in-memory storage results (assuming that HashSet<string>
, being generic must keep an instance of each string). NOT storing any data other than if the string exists or not.
It can allow you to store an extremely large number of strings with very small storage space and lookup operations are O(length-of-word), with no relation to how many items are being checked against.
So the problem solved here is to implement something with the interface:
public interface TrieSet
{
void Add(string value);
bool Contains(string value)
}
And be certain that you could never run out of memory as long as you limit the maximum length of strings being put into it.
So to achieve the criteria, one way is a trie. Which is a graph structure where each node is a single character, and each node points to children nodes (upto 26 child pointers for the english alphabet)
(Having a hard time figuring out exactly what perspective to take in this writing, ending up being both interview questiony and sharing knowledge of how it works...)
First further step would be making sure that substrings are not matched, so that if you insert "abcdefg", it will return false on a call of Contains("abc")
Even further would be to have it store generic data for each key, so mimicing a Dictionary<string, T>
, although there are fewer if any benefits over using a HashTable as Dictionary does