@joshz
Created November 30, 2011 16:06
clean some content inside tags
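import re

# Sample input: only the text inside <aaa>...</aaa> should be cleaned.
s = 'abcd<aaa>some thing <#^&*some more!#$@ </aaa> abcdefgasf <aaa>asfaf %^&*$saf asf %$^ </aaa> <another tag> some text </another tag> <aaa>sfafaff#%%%^^</aaa>'
# The non-greedy group captures the contents of each <aaa> block.
inside_tags = re.findall(r'<aaa>(.+?)</aaa>', s)
# Replace every non-word character (\W) with an underscore.
cleaned_contents = [re.sub(r'\W', '_', content) for content in inside_tags]
# Swap each original block body for its cleaned version.
for old, new in zip(inside_tags, cleaned_contents):
    s = s.replace(old, new)
print(s)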
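
One caveat with the snippet above: `str.replace` substitutes every occurrence of the captured text, so a fragment that also appears outside an `<aaa>` block would be rewritten there too. A minimal sketch of an alternative, assuming the goal is to touch only the tag contents, passes a callback to `re.sub`. The function name `clean_tag_contents` and its `tag` parameter are hypothetical, introduced here for illustration only.

```python
import re


def clean_tag_contents(text, tag="aaa"):
    """Replace non-word characters with '_' only inside <tag>...</tag>.

    Hypothetical helper (not part of the original gist).
    """
    # Capture the opening tag, the body, and the closing tag separately.
    pattern = re.compile(r"(<{0}>)(.+?)(</{0}>)".format(re.escape(tag)))

    def _clean(match):
        opening, body, closing = match.groups()
        # Only the body is cleaned; the tags are put back unchanged.
        return opening + re.sub(r"\W", "_", body) + closing

    return pattern.sub(_clean, text)


sample = "x! <aaa>x!</aaa>"
print(clean_tag_contents(sample))  # x! <aaa>x_</aaa>
```

Here the `x!` outside the tags is left alone, whereas the `findall` plus `str.replace` approach would turn it into `x_` as well. Like the original pattern, `.+?` does not match across newlines and skips empty tags.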