Created June 3, 2014 13:26
How to extract or remove elements from BeautifulSoup soup
# `soup` is a BeautifulSoup object; x is a placeholder for a tag name
# extract (remove) every matching element from the soup
[s.extract() for s in soup(x)]

# examples
# extract style elements
[s.extract() for s in soup('style')]
# extract script elements
[s.extract() for s in soup('script')]

# retain the extracted elements (extract() returns each removed element)
extracted_elements = [s.extract() for s in soup(x)]

# remove attributes from every tag - unlike extract(), this keeps the
# elements in the tree and strips only the named attributes
# (x1 ... xn are placeholders for attribute names)
for tag in soup():
    for attribute in [x1, x2, x3, xn]:
        del tag[attribute]

# example
for tag in soup():
    for attribute in ['style', 'id', 'class']:
        del tag[attribute]
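
The snippets above assume an existing `soup`. A minimal end-to-end sketch follows, assuming the `bs4` package is installed and using a made-up HTML string (`html` and the variable names are illustrative, not part of the original gist). It uses `tag.attrs.pop(attribute, None)` in place of `del tag[attribute]`; `attrs` is the tag's attribute dictionary, so this does not depend on how `del` treats a tag that lacks the attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical input, used only for illustration.
html = """
<html><head><style>p { color: red; }</style>
<script>console.log("hi");</script></head>
<body><p id="intro" class="lead" style="margin: 0">Hello</p></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# extract() detaches each element from the tree and returns it,
# so the removed elements can be kept for later inspection.
removed = [tag.extract() for tag in soup(["style", "script"])]
removed_names = [tag.name for tag in removed]

# Strip selected attributes but keep the elements themselves.
for tag in soup.find_all(True):
    for attribute in ("style", "id", "class"):
        tag.attrs.pop(attribute, None)  # no error if the attribute is absent

cleaned = str(soup.p)
print(removed_names)
print(cleaned)
```

If the removed elements are not needed afterwards, `tag.decompose()` is an alternative to `extract()` that removes the element and destroys it instead of returning it.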