-
-
Save chrisbloom7/1021218 to your computer and use it in GitHub Desktop.
<?php | |
function longest_common_substring($words) | |
{ | |
$words = array_map('strtolower', array_map('trim', $words)); | |
$sort_by_strlen = create_function('$a, $b', 'if (strlen($a) == strlen($b)) { return strcmp($a, $b); } return (strlen($a) < strlen($b)) ? -1 : 1;'); | |
usort($words, $sort_by_strlen); | |
// We have to assume that each string has something in common with the first | |
// string (post sort), we just need to figure out what the longest common | |
// string is. If any string DOES NOT have something in common with the first | |
// string, return false. | |
$longest_common_substring = array(); | |
$shortest_string = str_split(array_shift($words)); | |
while (sizeof($shortest_string)) { | |
array_unshift($longest_common_substring, ''); | |
foreach ($shortest_string as $ci => $char) { | |
foreach ($words as $wi => $word) { | |
if (!strstr($word, $longest_common_substring[0] . $char)) { | |
// No match | |
break 2; | |
} // if | |
} // foreach | |
// we found the current char in each word, so add it to the first longest_common_substring element, | |
// then start checking again using the next char as well | |
$longest_common_substring[0].= $char; | |
} // foreach | |
// We've finished looping through the entire shortest_string. | |
// Remove the first char and start all over. Do this until there are no more | |
// chars to search on. | |
array_shift($shortest_string); | |
} | |
// If we made it here then we've run through everything | |
usort($longest_common_substring, $sort_by_strlen); | |
return array_pop($longest_common_substring); | |
} | |
?> |
<?php | |
$array = array( | |
'PTT757LP4', | |
'PTT757A', | |
'PCT757B', | |
'PCT757LP4EV' | |
); | |
echo longest_common_substring($array); | |
// => T757 | |
?> |
Dear Chris, this is a great function. Thank you for sharing!
Is there any chance you can tell me how to modify it to the the longest common subsequence problem?
I mean, while substrings are always consecutive parts of a string, subsequences need not be.
Sorry for the late responses. Not sure GitHub ever notified me I had any comments.
@pr0fess0r - the function currently isn't aware of words, only characters. I don't think @giftlab's suggestion would work since the last character returned in your example would be the "n" in "can". My head hasn't been in PHP-land in a very long time so no solution is jumping immediately to mind. I'd love to hear how you solve it if you manage(d) to. I'm happy you found use for the code in any case.
@bobbbb - glad you found it useful. I don't have an immediate solution for what you're trying to solve and it's not likely to be anything I would be able to spend time looking at right now. Please do let me know if you found/find a solution.
Thank you very much dude :) It helped me save my time writing a complex work around to my situation here.
Nice solution 👍
@sasivarnakumar glad you found it useful after all this time!
There is a bug
$a='Islas Vírgenes Británicas';
$b='Islas Vírgenes de EE. UU.';
$test = array($a, $b);
$foo = longest_common_substring($test);
print($foo);
// prints islas vírgenes opposed to Islas Vírgenes
Any suggestions?
@AliceWonderMiscreations Looks like an issue with UTF-8 characters. PHP has a set of multibyte functions that you could use in place of their non-multibyte versions in the function. See http://php.net/manual/en/ref.mbstring.php
Great function! Thank you!
to make it work with PHP8.x .. replace the create function line with:
16 $sort_by_strlen = function($a, $b) {
17 if (strlen($a) == strlen($b)) { return strcmp($a, $b); } return (strlen($a) < strlen($b)) ? -1 : 1;
18 };
Thanks for that addition, @deadlydud! I think this was originally created for version 5.x, and I haven't used PHP much in the last 10 years 😆
@pr0fess0r that's easy enough to do post-processing - just check that the last character in the returned string is not a letter, then trim it. If it finishes with a complete word, then the string returned either matches the end of an input string (test with substr), or is terminated by a non-character (space, period, comma, etc).