Last active
September 18, 2024 11:06
-
-
Save ncalm/715a0507805ff1df95cde2a04a9709be to your computer and use it in GitHub Desktop.
This Excel LAMBDA function calculates the Levenshtein distance between two strings
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/* | |
LEV | |
Calculates the Levenshtein distance between two strings | |
Inputs | |
- a: a string to compare with b | |
- b: a string to compare with a | |
- [ii]: the [ii]th position in string a | |
- [jj]: the [jj]th position in string b | |
- [arr]: an interim array produced by an iteration | |
Generally you do not need to use [ii], [jj] and [arr]. These are used by the | |
recursive feature of the algorithm. | |
Returns | |
The Levenshtein Distance between A and B, using weights of 1 for each replacement/insertion/deletion | |
The implementation here is guided by the Wagner-Fischer algorithm | |
https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm | |
*/ | |
LEV = LAMBDA(a,b,[ii],[jj],[arr], | |
LET( | |
i,IF(ISOMITTED(ii),1,ii), | |
j,IF(ISOMITTED(jj),1,jj), | |
a_i,MID(a,i,1), | |
b_j,MID(b,j,1), | |
init_array,MAKEARRAY( | |
LEN(a)+1, | |
LEN(b)+1, | |
LAMBDA(r,c,IFS(r=1,c-1,c=1,r-1,TRUE,0)) | |
), | |
cost,N(NOT(a_i=b_j)), | |
this_arr,IF(ISOMITTED(arr),init_array,arr), | |
option_a,INDEX(this_arr,i+1-1,j+1)+1, | |
option_b,INDEX(this_arr,i+1,j+1-1)+1, | |
option_c,INDEX(this_arr,i+1-1,j+1-1)+cost, | |
new_val,MIN(option_a,option_b,option_c), | |
overlay,MAKEARRAY( | |
LEN(a)+1, | |
LEN(b)+1, | |
LAMBDA(r,c,IF(AND(r=i+1,c=j+1),new_val,0)) | |
), | |
new_arr,this_arr+overlay, | |
new_i,IF(i=LEN(a),IF(j=LEN(b),i+1,1),i+1), | |
new_j,IF(i<>LEN(a),j,IF(j=LEN(b),j+1,j+1)), | |
is_end,AND(new_i>LEN(a),new_j>LEN(b)), | |
IF(is_end,new_val,LEV(a,b,new_i,new_j,new_arr)) | |
) | |
); |
Have you determined if there's a limit on how long the compared strings can be? Seems like it returns a #NUM error around 27-30ish characters, but I haven't had a chance to really try debugging it yet.
Hi @bsnyder9 - apologies but I only just saw this. I haven't checked for that particular issue, but I will do some digging. Thanks for flagging it.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Have you determined if there's a limit on how long the compared strings can be? Seems like it returns a #NUM error around 27-30ish characters, but I haven't had a chance to really try debugging it yet.