Last active
December 22, 2015 14:18
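Qiubai command-line edition. I have been learning regular expressions recently, so I dropped bs4 and wrote this small crawler, which fetches the hottest Qiubai (qiushibaike.com) posts from the last 7 days.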
#!/usr/bin/python
# coding: utf-8
# Author: Byron
# Blog: http://jiabin.tk
# Note: this is a Python 2 script (urllib2, print statement, raw_input).
import urllib2
import re
# Main function: fetch one page and print every post found on it
def qiubai(page):
    url = "http://www.qiushibaike.com/week/page/%d" % page
    re_qb = re.compile(r'detail.*?<a.*?>(.*?)<.*?title="(.*?)">\s*(.*?)\s*?</', re.DOTALL)
    html = urllib2.urlopen(url).read()
    my_qiubai = re_qb.findall(html)
    # Count the posts actually found, so the loop never indexes out of range
    n = len(my_qiubai)
    for i in range(n):
        # Each match is a tuple of the three captured groups
        for k in range(3):
            print my_qiubai[i][k]
        s = raw_input("Press Enter to continue")
        if s == "q":
            exit()
        print "-"*40
# Main loop: walk from the chosen page through the last page
def for_qb():
    # Upper bound is 281 so that page 280 itself is included
    for page in range(int(p), 281):
        print "-"*18 + " Page " + str(page) + " " + "-"*18
        qiubai(page)
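# Input validation: keeps the design robust so the program avoids crashing wherever possible
def if_qb():
    global p
    p = raw_input("Enter the page to view (1-280): ")
    if p == "q":
        exit()
    # Reject non-numeric input and anything outside 1..280 (including "0" and "00")
    elif not p.isdigit() or int(p) < 1 or int(p) > 280:
        if_qb()
    else:
        for_qb()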
print "-"*40
print "Qiubai command-line edition - Byron"
print "Once you fall into Qiubai, deep as the sea, decency becomes a stranger"
print 'Enter "q" to quit'
print "-"*40
if_qb()
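To see what the regular expression in `qiubai()` captures without hitting the network, here is a small Python 3 sketch that runs the same pattern against a made-up HTML fragment. The markup in `SAMPLE_HTML`, the helper name `extract_posts`, and the field names in the loop are assumptions for illustration only; the real structure of the qiushibaike.com pages is not shown in the script, so treat the three groups as "link text, `title` attribute, body text" rather than anything more specific.

```python
import re

# Same pattern as in the script; re.DOTALL lets "." match newlines too.
RE_QB = re.compile(
    r'detail.*?<a.*?>(.*?)<.*?title="(.*?)">\s*(.*?)\s*?</',
    re.DOTALL,
)

# Hypothetical markup, written only so that the pattern has something to match.
SAMPLE_HTML = """
<div class="detail">
<a href="/users/1">alice</a>
</div>
<div class="content" title="2013-09-08 12:00:00">
  A short funny story.
</div>
<div class="detail">
<a href="/users/2">bob</a>
</div>
<div class="content" title="2013-09-08 13:30:00">
  Another one.
</div>
"""


def extract_posts(html):
    """Return one (group1, group2, group3) tuple per match of RE_QB."""
    return RE_QB.findall(html)


posts = extract_posts(SAMPLE_HTML)
for link_text, title_attr, body in posts:
    print(link_text, "|", title_attr, "|", body)
```

Because every quantifier is non-greedy, each match stops at the first closing tag after the body text, and the surrounding `\s*` parts strip the leading and trailing whitespace from the third group.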
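Fixed a bug:
The number of posts on each Qiubai page sometimes changes, so the program would occasionally index past the end of the result list.
I have now added a variable that counts how many posts the page contains before calling range, so the error no longer occurs.
Thanks to @HankZhou for the feedback.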
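A minimal Python 3 sketch of the idea behind this fix: the loop length follows whatever the page actually returned instead of a fixed count. It goes one step further than the script by iterating over the match list directly, which avoids index arithmetic altogether. The helper name `iter_post_lines` and the sample tuples are hypothetical and are not part of the original script.

```python
def iter_post_lines(posts):
    """Yield every field of every post, followed by a separator line."""
    for post in posts:        # runs exactly len(posts) times
        for field in post:    # works for any tuple length
            yield field
        yield "-" * 40


# Hypothetical pages with different numbers of posts.
empty_page = []
short_page = [("alice", "t1", "story one")]
long_page = short_page + [("bob", "t2", "story two")]

for line in iter_post_lines(long_page):
    print(line)
```

An empty page simply produces no output, so a page with fewer posts than expected cannot raise an `IndexError`.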