@alexandre
Created April 28, 2014 02:06
Cleaning HTML for an app...
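# Assumes BeautifulSoup 4 (bs4); the original snippet omitted its import.
from bs4 import BeautifulSoup


def _limpa_html(self, html_bruto):
    '''
    Parse the retrieved HTML and return a soup in which only the tags
    listed below are kept. Any other tag is hidden: its own markup is
    dropped when rendering, but its contents are still output.
    Tip found at:
    http://stackoverflow.com/questions/699468/python-html-sanitizer-scrubber-filter
    I took one of the ideas from the thread, the simplest one...
    '''
    TAG_VALIDAS = ['table', 'tr', 'td', 'span', 'div']
    # Name the parser explicitly so the result does not depend on
    # which optional parsers happen to be installed.
    soup = BeautifulSoup(html_bruto, 'html.parser')
    for tag in soup.find_all(True):
        if tag.name not in TAG_VALIDAS:
            # Hide the tag itself; its children are still rendered.
            tag.hidden = True
    #return str(soup.renderContents())
    return soup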
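
To see the whitelist idea without installing anything, here is a rough sketch that swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`. It is not the gist's method, just the same filtering rule: tags outside `TAG_VALIDAS` lose their markup, while their text is kept. The names `WhitelistFilter` and `limpa_html` are made up for this illustration.

```python
from html import escape
from html.parser import HTMLParser

# Same whitelist as in the snippet above.
TAG_VALIDAS = {'table', 'tr', 'td', 'span', 'div'}


class WhitelistFilter(HTMLParser):
    """Hypothetical helper: re-emits whitelisted tags and all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in TAG_VALIDAS:
            # Keep the attributes of allowed tags, re-escaping their values.
            rendered = ''.join(
                ' %s="%s"' % (name, escape(value or '', quote=True))
                for name, value in attrs
            )
            self.parts.append('<%s%s>' % (tag, rendered))

    def handle_endtag(self, tag):
        if tag in TAG_VALIDAS:
            self.parts.append('</%s>' % tag)

    def handle_data(self, data):
        # Text is always kept, including text inside dropped tags.
        self.parts.append(escape(data, quote=False))


def limpa_html(html_bruto):
    """Return html_bruto with every non-whitelisted tag stripped."""
    parser = WhitelistFilter()
    parser.feed(html_bruto)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


print(limpa_html('<p>Hello <b>world</b></p><table><tr><td>1</td></tr></table>'))
```

As with `tag.hidden = True`, only the disallowed tag's markup is removed, so the body of a `<script>` or `<style>` element would still come through as plain text.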