@lawrencechen0921
Created August 30, 2019 11:46
Analyzing HTML tags, attributes, and values with BeautifulSoup
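from bs4 import BeautifulSoup as soup

# The HTML to analyze is saved as web.htm; open it as fin and read it
with open('web.htm', encoding='utf-8') as fin:
    s = fin.read()
htm = soup(s, 'html.parser')

print(htm.title.prettify())   # the <title> tag, pretty-printed
print(htm.title.contents)     # list of the tag's child nodes
print(htm.contents[0])        # first top-level node of the document
print(htm.title.name)         # the tag name: 'title'
print(htm.title.string)       # the text inside <title>
print(htm.meta)               # the first <meta> tag
print(htm.meta['content'])    # its content attribute (KeyError if it has none)

# Every <td> cell
for item in htm.find_all('td'):
    print(item)

# Only the <td> cells with class="table_head"
for item in htm.find_all('td', class_='table_head'):
    print(item)

# The link target of the <a> inside each <td class="table_siteurl">
for item in htm.find_all('td', class_='table_siteurl'):
    print(item.a['href'])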