@vitali2y
Created January 22, 2025 09:25
Remove HTML Tags with sed

Below is a one-liner showing how to remove both <ins ...></ins> and <script> ... </script> elements:

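✗ sed -i.bak -E '/<ins\b[^>]*>.*<\/ins>/d; /<script\b[^>]*>.*<\/script>/d' *.html
✗ rg -in adsbygoogle *.html | more
✗ find . -type f -iname \*.html.bak -delete

Note that sed works line by line: each /.../d command deletes the entire line containing a match, and only elements that open and close on the same line are caught. POSIX ERE has no lazy quantifier, so a plain greedy .* is used; since d removes the whole line anyway, greediness does not change the result. The \b word boundary is a GNU sed extension. Passing the *.html glob straight to sed keeps file names containing spaces intact. Searching only *.html with rg skips the .bak copies (which still contain the ads) and confirms that no adsbygoogle (AdSense) markup is left before the backups are deleted.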
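To see the line-based behaviour in isolation, here is a minimal, self-contained sketch. It assumes GNU sed and a POSIX shell; the temporary directory and the sample page.html content are made up for illustration. It runs essentially the same sed expressions on a small file. The whole second line disappears (including the unrelated <p>before</p>), while the <script> block that spans three lines survives untouched.

```shell
#!/bin/sh
# Demo of sed's line-based deletion (assumes GNU sed: \b in ERE is a GNU extension).
set -eu

# Hypothetical sample file in a throwaway directory.
tmp=$(mktemp -d)
cat > "$tmp/page.html" <<'EOF'
<p>keep</p>
<p>before</p><script src="ads.js"></script>
<script>
  (adsbygoogle = window.adsbygoogle || []).push({});
</script>
EOF

# /re/d deletes every line that contains a complete single-line element.
sed -i.bak -E '/<ins\b[^>]*>.*<\/ins>/d; /<script\b[^>]*>.*<\/script>/d' "$tmp/page.html"

result=$(cat "$tmp/page.html")
printf '%s\n' "$result"

# Clean up the sample file and its .bak backup.
rm -r "$tmp"
```

The printed result still contains the multi-line <script> block, which is exactly the case the Perl variant below handles.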
@vitali2y (author) commented:

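The Perl approach is better suited to multi-line markup: -0777 makes perl read each file as a single string, so the non-greedy .*? combined with the /s flag can match elements spanning several lines, and only the elements themselves are removed rather than whole lines: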

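✗ find . -type f -iname \*.html -exec perl -0777 -pi.bak \
    -e 's/<script\b[^>]*>.*?<\/script>//gs; s/<ins\b[^>]*>.*?<\/ins>//gs' {} +

With find's -exec ... {} +, the file names are handed to perl directly, so paths containing spaces are safe.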
✗ find . -type f -iname \*.html.bak -delete
