@ywchiu
Created January 7, 2015 05:35
ptt_gossiping
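# PTT Gossiping board: print post titles that contain a keyword (Python 3)
import requests
from bs4 import BeautifulSoup

KEYWORD = "江"  # substring to look for in each post title

rs = requests.session()

# The Gossiping board sits behind an age check; posting 'yes' to the
# over-18 form lets this session through.
payload = {
    'from': '/bbs/Gossiping/index.html',
    'yes': 'yes',
}
# Note: verify=False disables TLS certificate verification.
rs.post("https://www.ptt.cc/ask/over18", data=payload, verify=False)

# Walk the index pages from 7649 down to 7601.
for i in range(7649, 7600, -1):
    res = rs.get("https://www.ptt.cc/bbs/Gossiping/index%d.html" % i, verify=False)
    soup = BeautifulSoup(res.text, "html.parser")
    # Each post entry on an index page is a .r-ent element.
    for rec in soup.select(".r-ent"):
        title = rec.select('.title')[0].text.strip()
        if KEYWORD in title:
            print(title)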
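
The script above fetches, parses, and filters in one loop. As a rough sketch, the two pure steps (building the index URLs and filtering titles by keyword) can be pulled out so they can be checked without touching the network. The function names `index_urls` and `titles_containing` are hypothetical and not part of the gist, and the sample titles are invented for illustration.

```python
def index_urls(board, newest, oldest):
    """Yield index page URLs from `newest` down to `oldest`, inclusive."""
    for i in range(newest, oldest - 1, -1):
        yield "https://www.ptt.cc/bbs/%s/index%d.html" % (board, i)


def titles_containing(titles, keyword):
    """Return the whitespace-stripped titles that contain `keyword`."""
    stripped = (t.strip() for t in titles)
    return [t for t in stripped if keyword in t]


# Same page range as the script: index7649 down to index7601.
urls = list(index_urls("Gossiping", 7649, 7601))
print(len(urls), urls[0], urls[-1])

# Invented sample titles, only to exercise the filter.
sample = ["  [問卦] 江的八卦 ", "[新聞] 天氣", "Re: [問卦] 江"]
print(titles_containing(sample, "江"))
```

With this split, the network loop only needs to pass each page's title strings to `titles_containing`, which keeps the keyword logic easy to test on its own.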