@jonlabelle
Last active September 7, 2024 05:03
This Regular Expression removes all attributes and values from an HTML tag, preserving the tag itself and textual content (if found).

Strip HTML Attributes

<([a-z][a-z0-9]*)[^>]*?(/?)>
token          explanation
<              match < at the beginning of a tag
(              start capture group $1 (the tag name)
[a-z]          match a through z
[a-z0-9]*      match a through z or 0 through 9, zero or more times
)              end capture group
[^>]*?         match anything other than >, zero or more times, non-greedy (won't eat the /)
(/?)           capture group $2: the / if it is there
>              match >
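
To see what the two capture groups hold in practice, here is a minimal sketch. The helper name `tag_parts()` is made up for this illustration and is not part of the gist, and the sample tags are arbitrary.

```php
<?php
// tag_parts() is a hypothetical helper name used only for this illustration.
function tag_parts(string $tag): ?array
{
    // The inner slash is escaped because / is also the pattern delimiter.
    if (preg_match('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', $tag, $m) !== 1) {
        return null; // e.g. closing tags: < must be followed by a letter
    }
    return [$m[1], $m[2] ?? ''];
}

foreach (['<p style="padding:0px;">', '<br class="x" />', '<H1 id="top">', '</p>'] as $tag) {
    $parts = tag_parts($tag);
    echo $tag, ' => ', $parts === null ? 'no match' : "\$1={$parts[0]} \$2={$parts[1]}", PHP_EOL;
}
```

The self-closing tag is the only sample where $2 is non-empty, and the closing tag does not match at all.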

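Add whatever quoting and delimiters your language needs, then use the replacement text <$1$2>. That strips everything after the tag name up to the end of the tag, whether it ends in /> or just >. Closing tags such as </p> are left untouched because the pattern requires a letter right after <.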
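
A short sketch of the replacement on a few edge cases, assuming a wrapper function named `strip_attrs()` (a hypothetical name, not from the gist). The last call shows a limitation worth knowing about: because `[^>]` stops at the first `>`, an attribute value that contains `>` cuts the match short.

```php
<?php
// strip_attrs() is a hypothetical wrapper name, not part of the gist.
function strip_attrs(string $html): string
{
    // The inner slash is escaped because / is also the pattern delimiter.
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $html);
}

echo strip_attrs('<img src="a.png" alt="x">'), PHP_EOL;          // <img>
echo strip_attrs('line<br class="x" />break'), PHP_EOL;          // line<br/>break
echo strip_attrs('<a title="a > b" href="#">link</a>'), PHP_EOL; // <a> b" href="#">link</a>
```

For input that may contain `>` inside attribute values, a real HTML parser is probably the safer choice.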

Example

Before

HTML containing style attributes.

<p style="padding:0px;">
	<strong style="padding:0;margin:0;">hello</strong>
</p>

After

HTML attributes removed.

<p>
	<strong>hello</strong>
</p>

PHP Example

$with_attr    = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';
// The slash inside the pattern is escaped as \/ because / is also the delimiter.
$without_attr = preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $with_attr);

echo $without_attr;
// <p><strong>hello</strong></p>

Stack Overflow post.

@ofhope
ofhope commented Jun 1, 2021

Nice regex. For JS I had to escape the forward slash to make it a literal character: <([a-z][a-z0-9]*)[^>]*?(\/?)>
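
The same delimiter issue shows up in PHP whenever / is used as the pattern delimiter. A minimal sketch comparing three spellings follows; the variable names are only for illustration, and the `@` is there just to silence the warning the broken pattern raises.

```php
<?php
$html = '<p style="padding:0px;">hi<br class="x" /></p>';

// Unescaped slash with / as delimiter: the pattern ends early at the inner /,
// so PCRE compilation fails and preg_replace() returns null.
$broken = @preg_replace('/<([a-z][a-z0-9]*)[^>]*?(/?)>/i', '<$1$2>', $html);

// Option 1: escape the inner slash.
$escaped = preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $html);

// Option 2: pick a delimiter that does not occur in the pattern, such as #.
$alt = preg_replace('#<([a-z][a-z0-9]*)[^>]*?(/?)>#i', '<$1$2>', $html);

var_dump($broken);        // NULL
echo $escaped, PHP_EOL;   // <p>hi<br/></p>
echo $alt, PHP_EOL;       // <p>hi<br/></p>
```

Either option works; a different delimiter keeps the pattern identical to the one shown at the top of the gist.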

@arnaumanyosa

It does not work for me in JS; it also removes the opening tags.

@miguelgisbert

Thanks! Any way to put them back? I need to remove the attributes to pass the string to a translator, and I'd like to put them back afterwards.

Thanks a lot!
Miguel

@miguelgisbert
miguelgisbert commented Jul 22, 2022

I'll answer myself. Here's how to remove HTML tags and put them back again: https://gist.github.com/miguelgisbert/7ef9ee15aa0cc1ba32ea5ed192e486c3

    $str1 = "<p style='color:red;'>red</p><strong style='color:green;'>green</strong>";
    $pattern = '/<[^>]+>/'; // matches any tag, opening or closing

    // Save every tag in document order, then replace each one with a <> placeholder.
    preg_match_all($pattern, $str1, $matches);
    $replacements = $matches[0];
    $str2 = preg_replace($pattern, '<>', $str1);

    // Translate $str2 with DeepL, or do whatever else needs the text without HTML tags.

    // Put the saved tags back, one per placeholder, in the original order.
    $str3 = preg_replace_callback('/<>/', function($matches) use (&$replacements) {
        return array_shift($replacements);
    }, $str2);

    echo "str1 ".$str1."<br>";
    echo "str2 ".$str2."<br>";
    echo "str3 ".$str3."<br>";
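
A variation on the same idea that keeps the bare tags in place and only sets the attributes aside, using the gist's pattern. This is a sketch, not a tested library: the function names `strip_attributes()` and `restore_attributes()` are hypothetical, the `str_replace()` call merely stands in for a real translator, and the restore step assumes the opening tags come back in the same order and number.

```php
<?php
// Hypothetical helpers (names are not from the gist or the comment above).

// Returns [stripped HTML, list of the original opening tags in order].
function strip_attributes(string $html): array
{
    $saved = [];
    $stripped = preg_replace_callback(
        '/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i',
        function (array $m) use (&$saved): string {
            $saved[] = $m[0];                 // full original tag, attributes included
            return '<' . $m[1] . $m[2] . '>'; // bare tag, e.g. <p> or <br/>
        },
        $html
    );
    return [$stripped, $saved];
}

// Swaps each bare opening tag for the next saved original, in order.
function restore_attributes(string $html, array $saved): string
{
    return preg_replace_callback(
        '/<([a-z][a-z0-9]*)(\/?)>/i',
        function (array $m) use (&$saved): string {
            // Assumption: same tag order and count as before translation.
            return count($saved) > 0 ? array_shift($saved) : $m[0];
        },
        $html
    );
}

$original = "<p style='color:red;'>red</p><strong style='color:green;'>green</strong>";
[$stripped, $saved] = strip_attributes($original);
echo $stripped, PHP_EOL; // <p>red</p><strong>green</strong>

// Stand-in for a translator call.
$translated = str_replace(['red', 'green'], ['rojo', 'verde'], $stripped);

echo restore_attributes($translated, $saved), PHP_EOL;
// <p style='color:red;'>rojo</p><strong style='color:green;'>verde</strong>
```

Keeping the bare tags gives the translator some structure to work with, but if it reorders or drops tags the restore step will attach attributes to the wrong elements.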
