pattern

  • pattern ([email protected]:clips/pattern.git): Pattern is a web mining module for Python. It has tools for:

    Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
    Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
    Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
    Network Analysis: graph centrality and visualization

    It is well documented, thoroughly tested with 350+ unit tests, and comes bundled with 50+ examples. The source code is

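Pipenv: a very handy Python package management tool
What is pipenv
pipenv is an officially recommended package management tool for Python. It combines the functionality of virtualenv, pyenv, and pip in one tool, much like composer in PHP.
As we know, virtualenv, pyenv, and pip are the tools most commonly used to manage Python virtual environments and libraries, but they are not convenient enough, or rather they do not let you be lazy enough. So Kenneth Reitz, the author of requests, developed pipenv, a tool for creating and managing Python virtual environments.
It automatically creates and manages a virtual environment for each project, adds packages to or removes them from the Pipfile, and generates a Pipfile.lock file that pins the versions and dependency information of the installed packages, which helps avoid build errors.
pipenv mainly solves the following problems: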
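from textblob import TextBlob

"""
https://elitedatascience.com/python-nlp-libraries
"""

def sentiment(tweet):
    # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive)
    blob = TextBlob(tweet)
    if blob.sentiment.polarity < 0:
        return "negative"
    elif blob.sentiment.polarity > 0:
        return "positive"  # assumed: the source snippet is cut off at this branch
    return "neutral"  # assumed fallback for a polarity of exactly 0
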
import sqlite3

"""
Read a file into SQLite
"""

def insertMultipleRecords(db, sqlite_insert_query, recordList):
    try:
        sqliteConnection = sqlite3.connect(db)
        cursor = sqliteConnection.cursor()
        print("Connected to SQLite")

soup

  • Beautiful Soup

  • scraping-urls-with-beautifulsoup

  • beautiful-soup-4.readthedocs (Chinese)

  • get_text(): When to get_text() and When to Preserve Tags. .get_text() strips all tags from the document you are working with and returns a string containing only the text. For example, if you are working with a large block of text that contains many hyperlinks, paragraphs, and other tags, all of those are stripped away and you are left with a tagless block of text. Keep in mind that it is much easier to find what you are looking for in a BeautifulSoup object than in a block of text. Calling .get_text() should always be the last thing you do, immediately before you print, store, or manipulate your final data. In general, try to preserve the tag structure of a document for as long as possible.
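
A small sketch of the advice above, assuming the bs4 package is installed and using Python's built-in html.parser backend. The HTML string and the example.com URLs are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical input: one paragraph with two hyperlinks
html = (
    '<p>Read the <a href="https://example.com/docs">docs</a> '
    'and the <a href="https://example.com/faq">FAQ</a>.</p>'
)
soup = BeautifulSoup(html, "html.parser")

# While the tag structure is intact, the links are easy to find
links = [a["href"] for a in soup.find_all("a")]

# Flatten to plain text only as the last step; the hrefs are gone afterwards
text = soup.get_text()

print(links)
print(text)
```

Once get_text() has run, the result is an ordinary string, so the link targets can no longer be recovered from it. That is why the extraction of links comes first.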

ES best practices

tuple to dict:
data = [(u'030944', u'20091123', 10, 30, 0), (u'030944', u'20100226', 10, 15, 0)]
fields = ['id', 'date', 'hour', 'minute', 'interval']
dicts = [dict(zip(fields, d)) for d in data]
Nested dictionary:
class Vividict(dict):
    def __missing__(self, key):
        # Create an empty nested Vividict for the missing key and store it
        value = self[key] = type(self)()
        return value
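
A short usage sketch for the nested dictionary, assuming __missing__ returns the newly created value (the preview above is cut off before that line). The keys and numbers are illustrative only.

```python
class Vividict(dict):
    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent:
        # create an empty Vividict, store it under the key, and return it
        value = self[key] = type(self)()
        return value


counts = Vividict()

# Intermediate levels are created on demand, no setdefault() chain needed
counts["2009"]["11"]["23"] = 10
counts["2010"]["02"]["26"] = 15

# Caveat: merely reading a missing key also creates an empty entry for it
had_key_before = "2011" in counts
_ = counts["2011"]
has_key_after = "2011" in counts
```

Because reads create keys, membership tests with `in` or lookups with .get() are the safer way to check for a key without changing the structure.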
# 10_basic.py
# 15_make_soup.py
# 20_search.py
# 25_navigation.py
# 30_edit.py
# 40_encoding.py
# 50_parse_only_part.py