@twysto
Last active February 17, 2025 09:29
Extract text content from HTML
<?php
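if (! function_exists('extractTextContentFromHTML')) {
    /**
     * Extract text content from HTML.
     *
     * Returns the text of the first element in $html whose content is plain
     * text only (no nested tags), or an empty string when nothing matches.
     * Tag names are matched in lowercase only, and the text is returned as-is
     * (not trimmed, entities not decoded). This can certainly be improved,
     * but it works well in most situations.
     *
     * @see https://regex101.com/r/3nbq1R/1
     *
     * @param string $html HTML markup to search.
     *
     * @return string Text of the first matching element, or '' if none.
     */
    function extractTextContentFromHTML(string $html): string
    {
        // Group 1: tag name (back-referenced by the closing tag).
        // Group 2: the element's text, which must not contain "<".
        $regex = '/<([a-z][a-z0-9]*)\b[^>]*>([^<]+)<\/\1>/';

        // Named arguments require PHP 8.0 or later.
        preg_match(pattern: $regex, subject: $html, matches: $captured);

        return $captured[2] ?? '';
    }
}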
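
// Usage sketch (not part of the original gist): a few calls illustrating what
// the pattern above does and does not match. The sample HTML strings are made
// up for illustration. Remove this part before including the file elsewhere,
// since it prints output when the file is loaded.

// A single element containing only text: its text is returned.
echo extractTextContentFromHTML('<p class="intro">Hello</p>'), PHP_EOL; // Hello

// Nested markup: the outer <p> contains a tag, so it cannot match; the first
// text-only element (<b>) is the one that is captured.
echo extractTextContentFromHTML('<p>Hello <b>world</b></p>'), PHP_EOL; // world

// Several siblings: only the first match is returned.
echo extractTextContentFromHTML('<li>one</li><li>two</li>'), PHP_EOL; // one

// No element at all: the function falls back to an empty string.
var_dump(extractTextContentFromHTML('just text')); // string(0) ""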