@aaronheath
Last active June 14, 2024 04:41
WP Function Takes HTML and Word List and Injects Links
function originalReplaceWordsAndLinks($wordsAndLinks, $string) {
    foreach ($wordsAndLinks as $word => $data) {
        // Regular expression pattern to find the word in the string (case-insensitive)
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';

        // Callback that wraps the match in a link; $matches[0] preserves the matched word's original case
        $replacement = function($matches) use ($data) {
            // The leading space lives in $target so no stray space is emitted when it is empty
            $target = $data['new_window'] ? ' target="_blank"' : '';

            return '<a href="' . htmlspecialchars($data['url']) . '"' . $target . '>' . $matches[0] . '</a>';
        };

        // Replace the word in the string using the callback function
        $string = preg_replace_callback($pattern, $replacement, $string);
    }

    // Return the modified string
    return $string;
}
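The regex version above handles each word independently, so a shorter word can be linked again inside a link created for a longer phrase. The following is a minimal sketch that reproduces the problem. `naiveLink` and the `.example` URLs are hypothetical names used only for illustration; they are not part of the gist.

```php
<?php
// Hypothetical helper, reduced from the regex approach: wrap every whole-word
// match of $word in a link, with no awareness of links already in $html.
function naiveLink(string $word, string $url, string $html): string {
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';

    return preg_replace_callback($pattern, function ($matches) use ($url) {
        return '<a href="' . htmlspecialchars($url) . '">' . $matches[0] . '</a>';
    }, $html);
}

$html = '<p>The European Championship starts soon.</p>';

// First pass links the longer phrase.
$html = naiveLink('European Championship', 'https://bbb.example', $html);

// Second pass matches "Championship" inside the link text created above,
// which yields an <a> nested inside another <a>.
$html = naiveLink('Championship', 'https://eee.example', $html);

echo $html, PHP_EOL;
```

Nested anchors are invalid HTML, which is presumably why the later version checks whether a match already sits inside a link before wrapping it.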
$wordsAndLinks = [
    // 'Euro' => [
    //     'url' => 'https://ccc.com',
    //     'new_window' => false,
    // ],
    '2024 UEFA European Football Championship' => [
        'url' => 'https://aaa.com',
        'new_window' => true,
    ],
    'Munich' => [
        'url' => 'https://ccc.com',
        'new_window' => false,
    ],
    'Euro 2024' => [
        'url' => 'https://ddd.com',
        'new_window' => false,
    ],
    'European Championship' => [
        'url' => 'https://bbb.com',
        'new_window' => true,
    ],
    'Football' => [
        'url' => 'https://eee.com',
        'new_window' => true,
    ],
    'many' => [
        'url' => 'https://hhh.com',
        'new_window' => true,
    ],
    // 'Germany' => [
    //     'url' => 'https://hhh.com',
    //     'new_window' => true,
    // ],
    'multi-national' => [
        'url' => 'https://fff.com',
        'new_window' => true,
    ],
    'national' => [
        'url' => 'https://ggg.com',
        'new_window' => true,
    ],
    'Will Not Be Found' => [
        'url' => 'https://zzz.com',
        'new_window' => true,
    ],
];
$string = <<<HTML
<p>The 2024 UEFA European Football Championship, commonly referred to as UEFA Euro 2024 (stylised as UEFA EURO 2024) or simply Euro 2024, will be the 17th edition of the UEFA European Championship, the quadrennial international football championship organised by UEFA for the European men's national teams of its member associations. Germany will host the tournament, which is scheduled to take place from 14 June to 14 July 2024 and the winner will later compete in the 2025 CONMEBOL–UEFA Cup of Champions against the 2024 Copa América winner. The tournament will comprise 24 teams, with Georgia the only team making its European Championship finals debut.</p>
<p>It will be the third time that European Championship matches are played on German territory and the second time in reunified Germany, as West Germany hosted the tournament's 1988 edition, and four matches of the multi-national Euro 2020 were played in Munich. It will be the first time the competition is held in what was formerly East Germany with Leipzig as a host city "many", as well as the first time that a reunified Germany serves as a solo host nation.[1][2] The tournament will return to its usual four-year cycle, after the 2020 edition was postponed to 2021 due to the COVID-19 pandemic.</p>
<p>Italy are the defending champions, having won the 2020 tournament against England on penalties in the final.[3]</p>
HTML;
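// Print the result; the function returns the linked HTML rather than modifying $string in place
echo replaceWordsAndLinks($wordsAndLinks, $string);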
/**
 * Take the supplied HTML and an array of words to link. Find instances of the words that fulfil
 * the requirements documented below and wrap each in an <a> (link).
 *
 * Requirements:
 * - Find any matched words and wrap them in a link.
 * - Allow "&" in the words, such as "Orphan Drug & Rare Disease".
 * - Allow "-" in the words, such as "pre-clinical".
 * - Exclude the match if it is already in a link (part of the href or the text between the <a> tags).
 * - Exclude the match if it is part of a longer word, like "many" in "Germany".
 * - Exclude the match if it is the second or a later occurrence in the sentence. So if the sentence
 *   is "Our clinic is the best clinic in the world", only the first "clinic" will be linked.
 *
 * Example:
 *
 * $string = '<p>This is the year that we will succeed</p>';
 *
 * $wordsAndLinks = [
 *     'the' => [
 *         'url' => 'https://aaa.com',
 *         'new_window' => true,
 *     ],
 *     'we will' => [
 *         'url' => 'https://bbb.com',
 *         'new_window' => false,
 *     ],
 *     ...
 * ];
 *
 * $output = replaceWordsAndLinks($wordsAndLinks, $string);
 *
 * // $output = '<p>This is <a href="https://aaa.com" target="_blank">the</a> year that <a href="https://bbb.com" target="_self">we will</a> succeed</p>'
 */
function replaceWordsAndLinks($wordsAndLinks, $string) {
    // Order $wordsAndLinks by key length, longest first. Longer phrases are linked first so that
    // shorter words found inside an already linked phrase can be detected and skipped.
    uksort($wordsAndLinks, function($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    // Sentences are approximated by splitting the HTML on periods
    $explodedByPeriods = explode('.', $string);

    foreach($explodedByPeriods as $i => $sentence) {
        foreach(array_keys($wordsAndLinks) as $words) {
            $workingSentence = $explodedByPeriods[$i];
            $startAt = 0;

            do {
                $foundAt = wordsWithinString($workingSentence, $words, $startAt);

                if(is_int($foundAt)) {
                    $url = htmlspecialchars($wordsAndLinks[$words]['url']);
                    $target = $wordsAndLinks[$words]['new_window'] ? '_blank' : '_self';
                    // Use the text as it appears in the sentence so its original case is preserved
                    $currentSubstr = substr($workingSentence, $foundAt, strlen($words));
                    $explodedByPeriods[$i] = substr_replace($workingSentence, "<a href=\"{$url}\" target=\"{$target}\">{$currentSubstr}</a>", $foundAt, strlen($words));
                }

                if(is_array($foundAt) && in_array($foundAt[0], ['WITHIN_LINK', 'SUB_MATCH'])) {
                    // Rejected match: resume the search just past its start position
                    $startAt = $foundAt[1] + 1;
                }
            } while(in_array($foundAt[0] ?? null, ['WITHIN_LINK', 'SUB_MATCH']));
        }
    }

    return implode('.', $explodedByPeriods);
}
function wordsWithinString($string, $words, $startAt) {
    $startPos = stripos($string, $words, $startAt);

    if($startPos === false) {
        return ['NOT_FOUND'];
    }

    $sentenceLength = strlen($string);
    $endPos = $startPos + strlen($words);
    $characterBefore = $startPos === 0 ? ' ' : substr($string, $startPos - 1, 1);
    $characterAfter = $endPos >= $sentenceLength ? ' ' : substr($string, $endPos, 1);

    if(! in_array($characterBefore, [' ', '"', '(', '[']) || ! in_array($characterAfter, [' ', ',', '.', '!', '?', '"', ')', ']'])) {
        // Not found as the string match is part of a word. E.g. "many" within "Germany".
        return ['SUB_MATCH', $startPos];
    }

    // Find the last link open and link close that occur before the match
    $posOfLastLinkOpen = strripos($string, '<a ', $startPos - $sentenceLength);
    $posOfLastLinkClosed = strripos($string, '</a>', $startPos - $sentenceLength);

    // Compare against false explicitly: casting a missing </a> to 0 would hide a link opened at position 0
    if($posOfLastLinkOpen !== false && ($posOfLastLinkClosed === false || $posOfLastLinkOpen > $posOfLastLinkClosed)) {
        // There is a link open (<a ) after the last (</a>) or there was no closed link yet.
        // This means the matching string is in an open link. We cannot have a link within a link.
        return ['WITHIN_LINK', $startPos];
    }

    return $startPos;
}
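The subword rule ("many" must not match inside "Germany") comes down to inspecting the characters on either side of a match. Below is a minimal standalone sketch of that check. `isStandaloneMatch` is a hypothetical helper name introduced here for illustration; the delimiter lists are copied from `wordsWithinString()`.

```php
<?php
// Hypothetical helper: true when the match at $pos is delimited on both sides,
// i.e. it is not glued to a longer word. Start and end of text count as a space.
function isStandaloneMatch(string $text, string $words, int $pos): bool {
    $end = $pos + strlen($words);
    $before = $pos === 0 ? ' ' : $text[$pos - 1];
    $after = $end >= strlen($text) ? ' ' : $text[$end];

    return in_array($before, [' ', '"', '(', '['], true)
        && in_array($after, [' ', ',', '.', '!', '?', '"', ')', ']'], true);
}

$text = 'Germany has many host cities';

// First hit is inside "Germany"; searching again from one character later
// finds the separate word "many".
$inGermany = stripos($text, 'many');
$standalone = stripos($text, 'many', $inGermany + 1);

var_dump(isStandaloneMatch($text, 'many', $inGermany));  // bool(false)
var_dump(isStandaloneMatch($text, 'many', $standalone)); // bool(true)
```

Because "-" is not in the list of accepted preceding characters, "national" inside "multi-national" is rejected the same way, while the full key "multi-national" can still match as a whole.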