@jwz-ecust
Created March 14, 2017 14:20
A simple article-scraping library
# -*- coding: utf-8 -*-
# @Date : 2017-03-14 22:03:55
# @Author : "zhangjiawei"
# @Email : "[email protected]"
# @Link : ${https://github.com/jwz-ecust}
# @Version : $Id$
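from goose import Goose
from goose.text import StopWordsChinese

# Zhihu column article to extract
url = "https://zhuanlan.zhihu.com/p/25765321"

# Use the Chinese stop-word class so Goose can analyze Chinese text
g = Goose({"stopwords_class": StopWordsChinese})
article = g.extract(url=url)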
# Parenthesized form is valid in both Python 2 and Python 3
print(article.cleaned_text)