sxf sxfmol

pattern

pattern [email protected]:clips/pattern.git - Pattern is a web mining module for Python. It has tools for:

- Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
- Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
- Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
- Network Analysis: graph centrality and visualization

It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is

Beautiful Soup
scraping-urls-with-beautifulsoup
beautiful-soup-4.readthedocs (Chinese)
get_text()

When to get_text() and When to Preserve Tags

.get_text() strips all tags from the document you are working with and returns a string containing the text only. For example, if you are working with a large block of text that contains many hyperlinks, paragraphs, and other tags, all of those will be stripped away and you'll be left with a tagless block of text.

Keep in mind that it's much easier to find what you're looking for in a BeautifulSoup object than in a block of text. Calling .get_text() should always be the last thing you do, immediately before you print, store, or manipulate your final data. In general, you should try to preserve the tag structure of a document as long as possible.
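
The advice above can be illustrated with a minimal sketch. The HTML snippet and the example.com URLs are made up for illustration, and the parser is assumed to be the standard-library `html.parser` backend so that nothing beyond `bs4` itself is required.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = """
<div id="post">
  <p>Read the <a href="https://example.com/docs">docs</a> first.</p>
  <p>Then see the <a href="https://example.com/faq">FAQ</a>.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# While the tags are still in place, the links are easy to locate.
links = [a["href"] for a in soup.find_all("a")]

# Flatten to plain text only as the final step, right before output.
text = soup.get_text(" ", strip=True)

print(links)
print(text)
```

Once `text` has been produced, the `href` values are no longer recoverable from it, which is why the link extraction runs first.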

## Development Tools

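Pipenv: a very handy Python package management tool

What is pipenv

pipenv is a package management tool recommended by the Python Packaging User Guide for managing application dependencies. It combines the functionality of virtualenv and pip, can work with pyenv to install a required Python version, and is similar to Composer in PHP.

Managing Python virtual environments and libraries usually means juggling virtualenv, pyenv, and pip, which are not convenient enough to use together. So Kenneth Reitz, the author of requests, developed pipenv, a tool for creating and managing Python virtual environments.

It automatically creates and manages a virtual environment for a project, adds or removes packages in the Pipfile as they are installed or uninstalled, and generates a Pipfile.lock that pins package versions and dependency information to avoid build errors.

pipenv mainly solves the following problems: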

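	from textblob import TextBlob

	# Reference: https://elitedatascience.com/python-nlp-libraries

	def sentiment(tweet):
	    """Label a tweet by the sign of its TextBlob polarity score."""
	    polarity = TextBlob(tweet).sentiment.polarity
	    if polarity < 0:
	        return "negative"
	    elif polarity > 0:
	        return "positive"  # assumed: the original snippet is cut off here
	    return "neutral"  # assumed label for a polarity of exactly 0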

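	import sqlite3

	# Load records read from a file into SQLite.
	def insertMultipleRecords(db, sqlite_insert_query, recordList):
	    sqliteConnection = None
	    try:
	        sqliteConnection = sqlite3.connect(db)
	        cursor = sqliteConnection.cursor()
	        print("Connected to SQLite")
	        # Assumed continuation: the original snippet is cut off after the print above.
	        cursor.executemany(sqlite_insert_query, recordList)
	        sqliteConnection.commit()
	        cursor.close()
	    except sqlite3.Error as error:
	        print("Failed to insert records:", error)
	    finally:
	        if sqliteConnection:
	            sqliteConnection.close()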
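
The snippet above does not show what the query or the record list look like. A minimal sketch of the bulk-insert pattern such a helper presumably wraps, assuming `executemany` with `?` placeholders; the `readings` table, its columns, and the rows are hypothetical, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

# Hypothetical schema and rows, chosen only to make the sketch runnable.
rows = [("030944", "20091123", 10), ("030944", "20100226", 10)]
insert_sql = "INSERT INTO readings (id, date, hour) VALUES (?, ?, ?)"

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
try:
    conn.execute("CREATE TABLE readings (id TEXT, date TEXT, hour INTEGER)")
    # executemany runs the parameterized statement once per tuple in rows.
    conn.executemany(insert_sql, rows)
    conn.commit()
    inserted = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
finally:
    conn.close()

print(inserted)
```

Each tuple must have exactly as many values as the statement has `?` placeholders; otherwise sqlite3 raises an error.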

Tuple to dict:

	data = [('030944', '20091123', 10, 30, 0), ('030944', '20100226', 10, 15, 0)]
	fields = ['id', 'date', 'hour', 'minute', 'interval']
	# Pair each field name with the value at the same position in every tuple.
	dicts = [dict(zip(fields, d)) for d in data]

Nested dictionary:

	class Vividict(dict):
	    def __missing__(self, key):
	        # Create, store, and return a nested Vividict for the missing key.
	        value = self[key] = type(self)()
	        return value
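
A short usage sketch of the nested-dictionary class, assuming `__missing__` ends by returning the newly created child (the `return value` line). The keys and numbers are hypothetical.

```python
class Vividict(dict):
    """dict subclass that creates nested Vividicts for missing keys."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent:
        # store a new child under key and return it.
        value = self[key] = type(self)()
        return value


# Hypothetical keys, for illustration only.
stats = Vividict()
stats["2009"]["11"]["23"] = 30  # intermediate levels appear on first access
stats["2010"]["02"]["26"] = 15

# Caveat: merely reading a missing key also inserts it.
_ = stats["2011"]

print(sorted(stats))
```

Because a plain lookup creates the key, membership checks should use `in` or `.get()`, which do not go through `__missing__`.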