Skip to content

Instantly share code, notes, and snippets.

@ncalm
Last active September 18, 2024 11:06
Show Gist options
  • Save ncalm/715a0507805ff1df95cde2a04a9709be to your computer and use it in GitHub Desktop.
Save ncalm/715a0507805ff1df95cde2a04a9709be to your computer and use it in GitHub Desktop.
This Excel LAMBDA function calculates the Levenshtein distance between two strings
/*
LEV
Calculates the Levenshtein distance between two strings
Inputs
- a: a string to compare with b
- b: a string to compare with a
- [ii]: the [ii]th position in string a
- [jj]: the [jj]th position in string b
- [arr]: an interim array produced by an iteration
Generally you do not need to use [ii], [jj] and [arr]. These are used by the
recursive feature of the algorithm.
Returns
The Levenshtein Distance between A and B, using weights of 1 for each replacement/insertion/deletion
The implementation here is guided by the Wagner-Fischer algorithm
https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorithm
*/
LEV = LAMBDA(a,b,[ii],[jj],[arr],
LET(
i,IF(ISOMITTED(ii),1,ii),
j,IF(ISOMITTED(jj),1,jj),
a_i,MID(a,i,1),
b_j,MID(b,j,1),
init_array,MAKEARRAY(
LEN(a)+1,
LEN(b)+1,
LAMBDA(r,c,IFS(r=1,c-1,c=1,r-1,TRUE,0))
),
cost,N(NOT(a_i=b_j)),
this_arr,IF(ISOMITTED(arr),init_array,arr),
option_a,INDEX(this_arr,i+1-1,j+1)+1,
option_b,INDEX(this_arr,i+1,j+1-1)+1,
option_c,INDEX(this_arr,i+1-1,j+1-1)+cost,
new_val,MIN(option_a,option_b,option_c),
overlay,MAKEARRAY(
LEN(a)+1,
LEN(b)+1,
LAMBDA(r,c,IF(AND(r=i+1,c=j+1),new_val,0))
),
new_arr,this_arr+overlay,
new_i,IF(i=LEN(a),IF(j=LEN(b),i+1,1),i+1),
new_j,IF(i<>LEN(a),j,IF(j=LEN(b),j+1,j+1)),
is_end,AND(new_i>LEN(a),new_j>LEN(b)),
IF(is_end,new_val,LEV(a,b,new_i,new_j,new_arr))
)
);
@ncalm
Copy link
Author

ncalm commented Jun 3, 2022

Have you determined if there's a limit on how long the compared strings can be? Seems like it returns a #NUM error around 27-30ish characters, but I haven't had a chance to really try debugging it yet.

Hi @bsnyder9 - apologies but I only just saw this. I haven't checked for that particular issue, but I will do some digging. Thanks for flagging it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment