The goal of the exercise is to find an efficient way to take an existing word and lookup all words in a dictionary that differ from the original by only a single character. This is the kernel function for larger algorithmic problems like the Word Ladder, where you try to find the shortest route between two words, by changing one character at a time, and where each intermediate step must be in an intial dictionary of words.
The simplest algorithm possible will take the input word and then compare it against every word in the dictionary, character by character, and count the differences. If the character differences between the word are just 1, then we add it to our list, otherwise we reject it. JavaScript has the Array.filter method which does precisely what we want and we just supply a function which returns true or false depending on if our criterion are met.
It is possible to optimize differsByOne
further to get more benefits, but given the sample inputs this already demonstrates this method is the fastest approach given our current input data. It gets increasingly slower the more base words there are in the dictionary. It loses to the hash-lookup function if the setup phase is precomputed and not included. Basically, the hash-lookup scales the best of all of the approaches assuming that you'll reuse the dictionary and that the word counts will be very large.
function bruteLookup(initialWord, words) {
function differsByOne(initialWord, newWord) {
var differs = 0;
for (var i = 0; i < initialWord.length; i++) {
if (initialWord[i] != newWord[i]) {
if(differs++) return false;
}
}
return differs === 1;
}
return words.filter(function (word) { return differsByOne(initialWord, word); });
}
Per Twitter - @FremyCompany - https://twitter.com/FremyCompany/status/615165514864816128
Build regular expressions where a single letter is replaced with a period '.' and is allowed to match any character. Build an alternation group of all variations of this single character replacement. Filter the final dictionary against this list. This has the benefit of being JIT'ed in most JavaScript environments. This actually surprised me at how fast it was.
function regexAndFilter(initialWord, dictionary) {
var chars = [].slice.call(initialWord);
var patterns = [];
for (var offset = 0; offset < chars.length; offset++) {
var original = chars[offset];
chars[offset] = ".";
patterns.push(chars.join(''));
chars[offset] = original;
}
var re = new RegExp(patterns.join("|"));
return dictionary.filter(function (word) { return word.match(re) && word !== initialWord; });
}
The following methods work on hash of the words supplied rather than working on the original array. The overhead of building the hash may be significant if the number of words is very large. However, building the hash is fairly efficient and will pay off if multiple queries are involved.
When timing the following methods the time spent in ArrayToDict is the majority of the operation of the functions so when trying to figure out which algorithms are faster it can be useful to prebuild the dictionary and change the function signatures to accept it in place of the array of words.
Remember, that your given inputs and how your algorithm will be used are critical in choosing the most optimal data structures. If we are going to constantly throw away the hash or we'll only ever do one lookup, then the filtering approaches that iterate the entire dictionary of words tend to win. Even with the overhead of the regular expression alternation group and matching.
This is a basic method which uses arrays of characters and Array.join to build new words. This is efficient because it entails a constant number of swaps and lookups per character in the string. Even long strings will have lookups in the low hundreds.
One downside is that it does require an dictionary so there is setup overhead in ArrayToDict. Newer ES 6 structures, such as a Set, might be useful here as well since it should lessen the overhead of the setup phase.
Initially I used String.fromCharCode to build the "replacement" character string, but this was pretty inefficient. Simply plucking it out of an array of prebuilt single character strings proved much better.
The overhead of this method is associated with the creation of hundreds of temporary strings. All other operations are dwarfed by this process which you can examine by commenting out portions of the algorithm and noting the time differences.
function permuteAndLookup(initialWord, dictionary) {
function ArrayToDict(dict, word) {
dict[word] = 1;
return dict;
}
dictionary = dictionary.reduce(ArrayToDict, {});
var words = [];
var baseChars = [].slice.call("abcdefghijklmnopqrstuvwxyz");
var chars = [].slice.call(initialWord);
for (var offset = 0; offset < chars.length; offset++) {
var original = chars[offset];
for (var replace = 0; replace < 26; replace++) {
chars[offset] = baseChars[replace];
var newword = chars.join('');
if (dictionary.hasOwnProperty(newword) && newword !== initialWord) {
words.push(newword);
}
}
chars[offset] = original;
}
return words;
}
This follows the same basic principles of using the hash-lookup with all 1 character change permutations of the original string. The difference is in the string construction for new words. Here we tried to reduce the overhead of the Array.join when building the string by supplying pre + post fix strings and using concatentation. For a long time, Array.join has been superior to concatentation but most engines now optimize for these cases heavily. We find this to be the case here as well, but the concatentation is still slightly slower than the Array.join case depending on the engine you run the code in.
You can vary this example between slice/substr with no changes to the code. You can also try subtituting substring but you do have to change the parameters for that.
function permuteAndLookup2(initialWord, dictionary) {
function ArrayToDict(dict, word) {
dict[word] = 1;
return dict;
}
dictionary = dictionary.reduce(ArrayToDict, {});
var words = [];
var baseChars = [].slice.call("abcdefghijklmnopqrstuvwxyz");
var chars = [].slice.call(initialWord);
for (var offset = 0; offset < chars.length; offset++) {
var pre = initialWord.slice(0, offset);
var post = initialWord.slice(offset+1);
for (var replace = 0; replace < 26; replace++) {
var newword = pre + baseChars[replace] + post;
if (dictionary.hasOwnProperty(newword) && newword !== initialWord) {
words.push(newword);
}
}
}
return words;
}
Per Twitter - @FremyCompany - https://twitter.com/FremyCompany/status/615293052853350400
Another interesting variation of the problem. By shifting the source and leaving out a character each time, we can create a list of stem words, one of which must match at a precise index if we are to vary from the original by only a single character. During the original Twitter conversation several issues were worked through, including the fact that an input like "sand" would match "tsan", if we allowed the stem "san" to match at any index. Clearly we want "san" to only be evaluated against the first index.
The provided sample here is not fully optimized. It uses indexOf to find the offset and then compare it. This is somewhat inefficient, since you already know the offset from which you want to start comparing. If you could do prefixed string comparisons from a given offset in JavaScript then that would work better, but you can't, and thus we fall back, for now, to using indexOf.
Note: ES 6 does have startsWith which would do exactly what we want.
function remyShift(initialWord, dictionary) {
var stems = [];
var stemsource = initialWord + initialWord;
for (var i = 0; i < initialWord.length; i++) {
stems.push(stemsource.substr(i, initialWord.length - 1));
}
return dictionary.filter(function (word) {
searchword = word + word;
return stems.some(function (stem, index) {
return searchword.indexOf(stem) === index;
});
});
}