-
-
Save phpenterprise/be9560412824e5b05e1c1b3f38266b57 to your computer and use it in GitHub Desktop.
DELIMITER $$ | |
CREATE DEFINER=`root`@`localhost` FUNCTION `COMPARE_STRING`( s1 text, s2 text) RETURNS int(11) | |
DETERMINISTIC | |
BEGIN | |
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT; | |
DECLARE s1_char CHAR; | |
DECLARE cv0, cv1 text; | |
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0; | |
IF s1 = s2 THEN | |
RETURN 0; | |
ELSEIF s1_len = 0 THEN | |
RETURN s2_len; | |
ELSEIF s2_len = 0 THEN | |
RETURN s1_len; | |
ELSE | |
WHILE j <= s2_len DO | |
SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; | |
END WHILE; | |
WHILE i <= s1_len DO | |
SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1; | |
WHILE j <= s2_len DO | |
SET c = c + 1; | |
IF s1_char = SUBSTRING(s2, j, 1) THEN | |
SET cost = 0; ELSE SET cost = 1; | |
END IF; | |
SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; | |
IF c > c_temp THEN SET c = c_temp; END IF; | |
SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1; | |
IF c > c_temp THEN | |
SET c = c_temp; | |
END IF; | |
SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1; | |
END WHILE; | |
SET cv1 = cv0, i = i + 1; | |
END WHILE; | |
END IF; | |
RETURN c; | |
END$$ | |
DELIMITER ; | |
DELIMITER $$ | |
CREATE DEFINER=`root`@`localhost` FUNCTION `SIMILARITY_STRING`(a text, b text) RETURNS double | |
BEGIN | |
RETURN ABS(((COMPARE_STRING(a, b) / length(b)) * 100) - 100); | |
END$$ | |
DELIMITER ; |
It is a symmetric solution, if you want the metric to be symmetric might depend on the use case, but if you know nothing beforehand, you should start symmetric.
Major advantage, it does not crash when length(b) = 0. (Still crashes, when both lengths are zero though).
Hi,
Thank you so much for the funcion, I have a little problem i hope you can solve
If I execute this works.
SELECT p.id, SIMILARITY_STRING(p.name, "HELLO2") AS score
FROM products p
If I execute this DOES NOT work.
SELECT p.id, SIMILARITY_STRING(p.name, "HELLO2") AS score
FROM products p
HAVING score > 80
#1366 - Incorrect string value: '\x80' for column 'cv0' at row 1
It seems that the problem are special chracters:
SELECT SIMILARITY_STRING("SPECIALCHARACTER😊", "SPECIALCHARACTER😊");
#1366 - Incorrect string value: '\xF0\x9F\x98\x8A' for column 'a' at row 1
Any idea?
Best,
Hello friends,
while at it, you may also want to take a look at Levenshtein-MySQL-UDF if you require to use sim1
for ordering or filtering
How can 'JT Mini Pizza ´¼ü%[ag$%;*' and 'AKTIONSPREIS' return a similarity of 91.66 % ?
What kind of licensing applies to these gists? Can I use the MIT license or creative commons?
What exactly is UNHEX(HEX(j)) doing? I thought those were inverse operations, but they aren't. I can't just replace them with j and have it work. I was hoping to speed this up, but adding and index for my column, and fully specifying the table and fields made no difference. Is there any way to make this faster?
Thanks for sharing. If this is more efficient, surely we have a more reliable version for use in production environment.