- pattern (clips/pattern.git): Pattern is a web mining module for Python. It has tools for:
  - Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
  - Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
  - Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
  - Network Analysis: graph centrality and visualization

  It is well documented, thoroughly tested with 350+ unit tests, and comes bundled with 50+ examples. The source code is
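Pipenv: A Super Handy Python Package Management Tool
What is pipenv
pipenv is the package management tool officially recommended for Python. It combines the functionality of virtualenv, pyenv, and pip, similar to Composer in PHP.
As we know, virtualenv, pyenv, and pip are the tools most commonly used to manage Python virtual environments and libraries, but they are not convenient enough, or rather do not save enough effort. So Kenneth Reitz, the author of requests, developed pipenv, a tool for creating and managing Python virtual environments.
It automatically creates and manages a virtual environment for a project, adds or removes packages in the Pipfile, and generates a Pipfile.lock file that pins the versions and dependency information of installed packages, avoiding build errors.
pipenv mainly solves the following problems: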
from textblob import TextBlob
"""
https://elitedatascience.com/python-nlp-libraries
"""
def sentiment(tweet):
    blob = TextBlob(tweet)
    if blob.sentiment.polarity < 0:
        return "负向"  # "negative"
    elif blob.sentiment.polarity > 0:
import sqlite3
"""
Read a file into SQLite
"""
def insertMultipleRecords(db, sqlite_insert_query, recordList):
    try:
        sqliteConnection = sqlite3.connect(db)
        cursor = sqliteConnection.cursor()
        print("Connected to SQLite")
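
The preview above breaks off inside the `try` block. As a rough sketch of how a bulk insert of this shape is often written with the standard `sqlite3` module, the following uses `executemany` inside a transaction. The snake_case function name, the `tweets` table, and the sample rows are assumptions made for illustration and are not taken from the original file.

```python
import os
import sqlite3
import tempfile


def insert_multiple_records(db, insert_query, records):
    """Insert all records in one transaction and return the number of rows written."""
    connection = sqlite3.connect(db)
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised.
        with connection:
            cursor = connection.executemany(insert_query, records)
        return cursor.rowcount
    finally:
        connection.close()


def demo():
    """Create a throwaway database, insert two rows, and report the counts."""
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "demo.db")  # hypothetical database file

        connection = sqlite3.connect(db)
        with connection:
            connection.execute(
                "CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)"
            )
        connection.close()

        written = insert_multiple_records(
            db,
            "INSERT INTO tweets (id, text) VALUES (?, ?)",
            [(1, "first"), (2, "second")],
        )

        connection = sqlite3.connect(db)
        stored = connection.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
        connection.close()
        return written, stored
```

Passing the rows as parameters (the `?` placeholders) rather than formatting them into the SQL string lets SQLite handle quoting, and the single transaction means either every row is stored or none is.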
- get_text(): When to get_text() and When to Preserve Tags

  .get_text() strips all tags from the document you are working with and returns a string containing the text only. For example, if you are working with a large block of text that contains many hyperlinks, paragraphs, and other tags, all those will be stripped away and you'll be left with a tagless block of text. Keep in mind that it's much easier to find what you're looking for in a BeautifulSoup object than in a block of text. Calling .get_text() should always be the last thing you do, immediately before you print, store, or manipulate your final data. In general, you should try to preserve the tag structure of a document as long as possible.
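
A minimal sketch of the advice above, assuming the `bs4` package is installed. The HTML snippet and the example.com URL are made up for illustration. The link target is read while the tags still exist, and `.get_text()` is called only at the end.

```python
from bs4 import BeautifulSoup

# Hypothetical input document.
html = '<p>Read the <a href="https://example.com/docs">docs</a> first.</p>'
soup = BeautifulSoup(html, "html.parser")

# While the tag structure is intact, attributes such as href are still reachable.
link = soup.find("a")
link_target = link["href"]

# Flatten to plain text only as the last step, once nothing else is needed from the tags.
plain_text = soup.get_text()
```

Once `plain_text` has been produced, the `href` is no longer recoverable from it, which is why the flattening step comes last.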
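## Development Tools
- How to build a highly available, low-latency Elasticsearch cluster
- Latest developments in the Elastic Stack
- Create indices by time, and automatically delete expired indices every day
- Use index templates and design mappings carefully
- Keep the shard count neither too high nor too low; keep each shard under 20 GB
- Use aliases
- Run Force Merge regularly
- Separate hot and cold data, and run scheduled migration tasks
- Performance monitoring: cluster, node, index, application, and Kibana monitoring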
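
Two of the points above, time-based indices with daily cleanup and the 20 GB shard ceiling, come down to simple arithmetic. The sketch below uses only the Python standard library and makes no Elasticsearch calls. The `logs` prefix, the `prefix-YYYY.MM.DD` naming scheme, the 7-day retention, and all function names are assumptions chosen for illustration.

```python
import math
from datetime import date, timedelta


def daily_index_name(prefix, day):
    """Build a per-day index name such as logs-2024.01.05 (assumed naming scheme)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(prefix, today, existing_days, retention_days):
    """Return the names of daily indices older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [daily_index_name(prefix, d) for d in sorted(existing_days) if d < cutoff]


def primary_shard_count(index_size_gb, max_shard_gb=20):
    """Smallest shard count that keeps every shard at or below max_shard_gb."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))


today = date(2024, 1, 10)
days_with_data = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 9)]

to_delete = expired_indices("logs", today, days_with_data, retention_days=7)
shards_for_90_gb = primary_shard_count(90)
```

With a 7-day retention the cutoff is 2024-01-03, so only the two indices before that date are selected. A 90 GB index needs 5 primary shards to stay under 20 GB each. The deletion itself would still have to be issued against the cluster by whatever scheduling mechanism is in use.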
tuple to dict:
data = [(u'030944', u'20091123', 10, 30, 0), (u'030944', u'20100226', 10, 15, 0)]
fields = ['id', 'date', 'hour', 'minute', 'interval']
dicts = [dict(zip(fields, d)) for d in data]
Nested dict:
class Vividict(dict):
    def __missing__(self, key):
        # Create, store, and return a new nested dict for the missing key.
        value = self[key] = type(self)()
        return value
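
A short usage sketch tying the two snippets together, under the assumption that `__missing__` returns the newly created value (without that return, nested assignment would fail on `None`). The choice to group the `minute` field by `id` and `date` is an illustrative assumption.

```python
class Vividict(dict):
    """A dict that creates nested Vividicts on first access to a missing key."""

    def __missing__(self, key):
        value = self[key] = type(self)()
        return value


fields = ["id", "date", "hour", "minute", "interval"]
data = [("030944", "20091123", 10, 30, 0), ("030944", "20100226", 10, 15, 0)]

# Pair each tuple with the field names to get one dict per row.
dicts = [dict(zip(fields, row)) for row in data]

# Group the minute value by id and then by date; intermediate levels appear on demand.
nested = Vividict()
for record in dicts:
    nested[record["id"]][record["date"]] = record["minute"]
```

No `setdefault` calls or existence checks are needed for the intermediate level, because looking up a missing `id` creates an empty `Vividict` in place.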
# 10_basic.py
# 15_make_soup.py
# 20_search.py
# 25_navigation.py
# 30_edit.py
# 40_encoding.py
# 50_parse_only_part.py