@psenger
Last active August 19, 2023 04:34
[Regular expressions that delete an HTML tag, its attributes, and its closing tag, leaving everything in between intact] #RegEx

Removing "paired" or "container" tags

You can use the following regular expression to remove <font> tags and their attributes while leaving the content between them, including multi-line content, intact:

<font\b[^>]*>([\s\S]*?)<\/font\s*>

Explanation:

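  • <font\b[^>]*>: This part matches the opening <font> tag along with any attributes. The \b word boundary keeps it from matching longer tag names that merely start with "font", and [^>]* consumes everything up to the closing >.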
  • ([\s\S]*?): This is a capturing group that matches any characters (including whitespace and newlines) non-greedily. It captures content that spans multiple lines.
  • <\/font\s*>: This part matches the closing </font> tag, allowing for optional spaces.
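
As a rough illustration of how the pattern might be applied, the sketch below uses Python's re module and replaces each match with the first capturing group, so only the inner content survives. The function name strip_font_tags and the sample strings are made up for this example, and the IGNORECASE flag is an addition that is not part of the original pattern.

```python
import re

# Pattern from the text. re.IGNORECASE is an assumption added here so that
# <FONT> is matched too; drop it to use the pattern exactly as written.
FONT_TAG = re.compile(r"<font\b[^>]*>([\s\S]*?)<\/font\s*>", re.IGNORECASE)


def strip_font_tags(html: str) -> str:
    """Replace each <font ...>...</font> match with its captured content."""
    return FONT_TAG.sub(r"\1", html)


sample = '<p><font color="red" size="2">line one\nline two</font></p>'
print(strip_font_tags(sample))

# Nested tags are not handled: the non-greedy group stops at the first
# closing tag, so the inner opening tag and the outer closing tag remain.
nested = "<font>a<font>b</font>c</font>"
print(strip_font_tags(nested))  # a<font>bc</font>
```

The second call shows why the pattern is only suitable for simple, non-nested input.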

Replacing each match with the first capturing group removes the tags and keeps the content, including content that spans multiple lines. This works in simple cases, but the pattern cannot handle nested <font> tags correctly, because the non-greedy match stops at the first closing tag. For more complex HTML, using a dedicated HTML parsing library is recommended.
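
For comparison, here is a minimal sketch of the parser-based approach, assuming Python and its standard-library html.parser module. The class TagStripper and the helper strip_tag are hypothetical names invented for this example. The idea is to re-emit everything the parser reports except the start and end tags of one element; this is one possible approach, not a complete HTML rewriter (doctype declarations and processing instructions, for instance, are not re-emitted).

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit the input, dropping the start and end tags of one element."""

    def __init__(self, tag_to_drop: str):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.tag_to_drop = tag_to_drop.lower()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag_to_drop:
            # get_starttag_text() returns the tag exactly as it appeared.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != self.tag_to_drop:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != self.tag_to_drop:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def strip_tag(html: str, tag: str) -> str:
    parser = TagStripper(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_tag('<font color="red">a <font size="2">b</font> c</font>', "font"))
# a b c
```

Unlike the regular expression, this handles nested <font> tags, because each tag is dropped individually rather than matched as an open/close pair.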

Removing "self-closing," "void," or "empty" tags

To remove both <br> and <br/> tags from your text using a regular expression, you can use the following pattern:

<\s*br\s*/?\s*>

Explanation:

  • <: Matches the opening angle bracket.
  • \s*: Matches any number of whitespace characters (spaces, tabs, line breaks, etc.).
  • br: Matches the literal "br" characters.
  • \s*: Matches any number of whitespace characters again.
  • /?: Matches zero or one forward slash, so the slash is optional.
  • \s*: Matches optional whitespace characters after the slash.
  • >: Matches the closing angle bracket.

This regular expression matches both <br> and <br/>, with optional whitespace around the tag name and the slash. It does not match a <br> tag that carries attributes, and it is case-sensitive unless you enable a case-insensitive flag. Keep in mind that using regular expressions for HTML manipulation is generally discouraged because of the complexity of HTML syntax and the potential for unexpected variations. Using an HTML parsing library is a more robust approach.
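
A short sketch of the <br> pattern in use, again assuming Python's re module. The function name strip_br and its replacement parameter are illustrative choices, not part of the original text.

```python
import re

# Pattern from the text, used as written (case-sensitive, no attributes).
BR_TAG = re.compile(r"<\s*br\s*/?\s*>")


def strip_br(text: str, replacement: str = "") -> str:
    """Remove every <br> / <br/> variant, or swap it for `replacement`."""
    return BR_TAG.sub(replacement, text)


print(strip_br("one<br>two<br/>three< br / >four"))  # onetwothreefour
print(strip_br("first<br/>second", "\n"))            # two lines of output

# Tags with attributes are left untouched by this pattern.
print(strip_br('keep<br class="x">this'))  # keep<br class="x">this
```

Passing "\n" as the replacement is a common variation when the line breaks should survive as plain-text newlines.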
