Created
June 12, 2011 03:27
-
-
Save chrisbloom7/1021218 to your computer and use it in GitHub Desktop.
Find the longest common substring in an array of strings (PHP)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<?php | |
function longest_common_substring($words) | |
{ | |
$words = array_map('strtolower', array_map('trim', $words)); | |
$sort_by_strlen = create_function('$a, $b', 'if (strlen($a) == strlen($b)) { return strcmp($a, $b); } return (strlen($a) < strlen($b)) ? -1 : 1;'); | |
usort($words, $sort_by_strlen); | |
// We have to assume that each string has something in common with the first | |
// string (post sort), we just need to figure out what the longest common | |
// string is. If any string DOES NOT have something in common with the first | |
// string, return false. | |
$longest_common_substring = array(); | |
$shortest_string = str_split(array_shift($words)); | |
while (sizeof($shortest_string)) { | |
array_unshift($longest_common_substring, ''); | |
foreach ($shortest_string as $ci => $char) { | |
foreach ($words as $wi => $word) { | |
if (!strstr($word, $longest_common_substring[0] . $char)) { | |
// No match | |
break 2; | |
} // if | |
} // foreach | |
// we found the current char in each word, so add it to the first longest_common_substring element, | |
// then start checking again using the next char as well | |
$longest_common_substring[0].= $char; | |
} // foreach | |
// We've finished looping through the entire shortest_string. | |
// Remove the first char and start all over. Do this until there are no more | |
// chars to search on. | |
array_shift($shortest_string); | |
} | |
// If we made it here then we've run through everything | |
usort($longest_common_substring, $sort_by_strlen); | |
return array_pop($longest_common_substring); | |
} | |
?> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<?php | |
$array = array( | |
'PTT757LP4', | |
'PTT757A', | |
'PCT757B', | |
'PCT757LP4EV' | |
); | |
echo longest_common_substring($array); | |
// => T757 | |
?> |
There is a bug
$a='Islas Vírgenes Británicas';
$b='Islas Vírgenes de EE. UU.';
$test = array($a, $b);
$foo = longest_common_substring($test);
print($foo);
// prints islas vírgenes opposed to Islas Vírgenes
Any suggestions?
@AliceWonderMiscreations Looks like an issue with UTF-8 characters. PHP has a set of multibyte functions that you could use in place of their non-multibyte versions in the function. See http://php.net/manual/en/ref.mbstring.php
Great function! Thank you!
to make it work with PHP8.x .. replace the create function line with:
16 $sort_by_strlen = function($a, $b) {
17 if (strlen($a) == strlen($b)) { return strcmp($a, $b); } return (strlen($a) < strlen($b)) ? -1 : 1;
18 };
Thanks for that addition, @deadlydud! I think this was originally created for version 5.x, and I haven't used PHP much in the last 10 years 😆
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
@sasivarnakumar glad you found it useful after all this time!