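- How to build a highly available, low-latency Elasticsearch cluster
- What's new in the Elastic Stack
- Create indices by date and delete expired indices automatically every day
- Use index templates and design the mapping carefully
- Use neither too many nor too few shards; keep each shard under 20 GB
- Use aliases
- Run force merge regularly
- Separate hot and cold data and run scheduled migration jobs
- Performance monitoring: cluster, node, index, application and Kibana monitoring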
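A minimal stdlib sketch of two of the tips above: deciding which daily indices have expired, and sizing primary shards so each stays at or under 20 GB. It assumes daily indices are named `<prefix>-YYYY.MM.DD` (e.g. `logs-2024.01.15`); that naming scheme and the helper names `daily_index_name`, `expired_indices` and `primary_shards_for` are hypothetical, not part of any Elasticsearch client.

```python
import math
from datetime import date, datetime, timedelta

# Assumption: daily indices are named "<prefix>-YYYY.MM.DD", e.g. "logs-2024.01.15".
DATE_FORMAT = "%Y.%m.%d"

def daily_index_name(prefix, day):
    """Name of the (hypothetical) time-based index that holds documents for `day`."""
    return f"{prefix}-{day.strftime(DATE_FORMAT)}"

def expired_indices(index_names, prefix, today, retention_days):
    """Return the daily indices older than `retention_days`, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix) + 1:], DATE_FORMAT).date()
        except ValueError:
            continue  # suffix is not a date; leave that index alone
        if day < cutoff:
            expired.append((day, name))
    return [name for _, name in sorted(expired)]

def primary_shards_for(expected_size_gb, max_shard_gb=20):
    """Smallest primary shard count that keeps each shard at or under max_shard_gb."""
    return max(1, math.ceil(expected_size_gb / max_shard_gb))

today = date(2024, 1, 10)
names = [daily_index_name("logs", today - timedelta(days=n)) for n in range(10)]
names.append("logs-summary")  # non-date suffix, must be skipped

print(expired_indices(names, "logs", today, retention_days=7))
# ['logs-2024.01.01', 'logs-2024.01.02']
print(primary_shards_for(45))  # 3
```

This only decides *which* indices qualify; in a real cluster the deletion itself would go through the Elasticsearch delete-index API or an index lifecycle management (ILM) policy.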
- 10_basic.py
- 15_make_soup.py
- 20_search.py
- 25_navigation.py
- 30_edit.py
- 40_encoding.py
- 50_parse_only_part.py
Tuple to dict:
data = [('030944', '20091123', 10, 30, 0), ('030944', '20100226', 10, 15, 0)]
fields = ['id', 'date', 'hour', 'minute', 'interval']
dicts = [dict(zip(fields, d)) for d in data]
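Nested dict:
class Vividict(dict):
    """dict that creates a nested Vividict for any missing key."""
    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent: store a new empty level and return it
        value = self[key] = type(self)()
        return value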
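The Vividict class relies on `dict.__missing__` to create levels on demand. For comparison, the standard library gives the same auto-creating nesting with a recursive `collections.defaultdict`; this is a swapped-in alternative rather than the original snippet, and the `tree` helper, config keys and hostnames below are made up for illustration.

```python
import json
from collections import defaultdict

def tree():
    """Return a dict whose missing keys are filled with another tree()."""
    return defaultdict(tree)

config = tree()
# Intermediate levels are created on first access, just as with Vividict.
config["db"]["primary"]["host"] = "localhost"
config["db"]["primary"]["port"] = 5432
config["db"]["replica"]["host"] = "replica.local"

# defaultdict is a dict subclass, so json can serialize it directly.
print(json.dumps(config))
# {"db": {"primary": {"host": "localhost", "port": 5432}, "replica": {"host": "replica.local"}}}
```

One caveat applies to both versions: merely reading a missing key (even a typo) creates an empty branch, so use `in` or `.get()` when you only want to look.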
## Development tools
get_text(): When to get_text() and When to Preserve Tags
.get_text() strips all tags from the document you are working with and returns a string containing the text only. For example, if you are working with a large block of text that contains many hyperlinks, paragraphs, and other tags, all those will be stripped away and you'll be left with a tagless block of text. Keep in mind that it's much easier to find what you're looking for in a BeautifulSoup object than in a block of text. Calling .get_text() should always be the last thing you do, immediately before you print, store, or manipulate your final data. In general, you should try to preserve the tag structure of a document as long as possible.
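A minimal sketch of that advice, assuming `beautifulsoup4` is installed and using made-up HTML: pull the links out while the tag tree still exists, and call `.get_text()` only at the end. The `separator` and `strip` arguments keep words from running together once the tags are removed.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML for illustration.
html = """
<div id="post">
  <p>Read the <a href="https://example.com/docs">docs</a> first.</p>
  <p>Then see the <a href="https://example.com/faq">FAQ</a> page.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Work on the tag tree while structure still matters: links are easy to extract.
links = [a["href"] for a in soup.find_all("a")]

# Only as the last step, flatten each paragraph to plain text.
paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]

print(links)       # ['https://example.com/docs', 'https://example.com/faq']
print(paragraphs)  # ['Read the docs first.', 'Then see the FAQ page.']
```

Had `soup.get_text()` been called first, the `href` values would already be gone, which is exactly the ordering problem the paragraph warns about.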
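import sqlite3

def insertMultipleRecords(db, sqlite_insert_query, recordList):
    """Insert many records (e.g. rows read from a file) into SQLite."""
    sqliteConnection = None
    try:
        sqliteConnection = sqlite3.connect(db)
        cursor = sqliteConnection.cursor()
        print("Connected to SQLite")
        # executemany runs the parameterized (?) query once per record
        cursor.executemany(sqlite_insert_query, recordList)
        sqliteConnection.commit()
        print("Inserted", cursor.rowcount, "records")
    except sqlite3.Error as error:
        print("Failed to insert records into SQLite:", error)
    finally:
        if sqliteConnection is not None:
            sqliteConnection.close()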
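The note is about reading a file into SQLite. A minimal stdlib sketch of that flow, assuming a CSV file with a header row: the sample CSV text (which reuses the tuples from the tuple-to-dict note), the `readings` table and the `load_readings` helper are all hypothetical.

```python
import csv
import io
import sqlite3

# Sample CSV text standing in for a real file (hypothetical data).
CSV_TEXT = """id,date,hour,minute,interval
030944,20091123,10,30,0
030944,20100226,10,15,0
"""

def load_readings(conn, csv_file):
    """Insert every CSV row into a hypothetical `readings` table; return rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(id TEXT, date TEXT, hour INTEGER, minute INTEGER, interval INTEGER)"
    )
    rows = [
        (r["id"], r["date"], int(r["hour"]), int(r["minute"]), int(r["interval"]))
        for r in csv.DictReader(csv_file)
    ]
    # Using the connection as a context manager commits on success, rolls back on error.
    with conn:
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
count = load_readings(conn, io.StringIO(CSV_TEXT))
stored = conn.execute("SELECT id, date, minute FROM readings ORDER BY date").fetchall()
conn.close()

print(count)   # 2
print(stored)  # [('030944', '20091123', 30), ('030944', '20100226', 15)]
```

For a real file, pass `open("data.csv", newline="")` (path hypothetical) instead of the `io.StringIO` wrapper; the `TEXT` column type keeps leading zeros in IDs such as `030944`.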
from textblob import TextBlob

# Source: https://elitedatascience.com/python-nlp-libraries

def sentiment(tweet):
    """Label text as negative, positive or neutral from TextBlob's polarity score (-1.0 to 1.0)."""
    polarity = TextBlob(tweet).sentiment.polarity
    if polarity < 0:
        return "negative"
    elif polarity > 0:
        return "positive"
    return "neutral"
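Pipenv: a very handy Python package management tool
What is pipenv
pipenv is the officially recommended Python package management tool. It combines the features of virtualenv, pyenv and pip in a single tool, similar to Composer in PHP.
To manage Python virtual environments and libraries, people usually rely on virtualenv, pyenv and pip, but these tools are not convenient enough, or simply not "lazy" enough. So Kenneth Reitz, the author of requests, developed pipenv, a tool for creating and managing Python virtual environments.
It automatically creates and manages a virtual environment for each project, adds packages to and removes them from the Pipfile, and generates a Pipfile.lock file that pins the versions and dependency information of the installed packages to avoid build errors.
pipenv mainly solves the following problems: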
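Pipfile.lock is a JSON file whose `default` and `develop` sections map each package name to its pinned `version` (alongside `hashes` and other fields). A minimal sketch, assuming that layout, that turns lock data into pinned requirement strings; the trimmed lock excerpt and the `pinned_requirements` helper are hypothetical, and real lock files carry more fields per entry.

```python
import json

# Hypothetical, trimmed Pipfile.lock excerpt (real entries also have markers, index, ...).
LOCK_TEXT = """
{
  "_meta": {"pipfile-spec": 6},
  "default": {
    "requests": {"version": "==2.31.0", "hashes": ["sha256:..."]},
    "idna": {"version": "==3.6", "hashes": ["sha256:..."]}
  },
  "develop": {
    "pytest": {"version": "==8.0.0", "hashes": ["sha256:..."]}
  }
}
"""

def pinned_requirements(lock, include_dev=False):
    """Return sorted 'name==version' strings for every locked package."""
    sections = ["default"] + (["develop"] if include_dev else [])
    pins = []
    for section in sections:
        for name, info in lock.get(section, {}).items():
            # Entries installed from VCS or a local path may have no "version" key.
            if "version" in info:
                pins.append(name + info["version"])
    return sorted(pins)

lock = json.loads(LOCK_TEXT)
print(pinned_requirements(lock))                    # ['idna==3.6', 'requests==2.31.0']
print(pinned_requirements(lock, include_dev=True))  # ['idna==3.6', 'pytest==8.0.0', 'requests==2.31.0']
```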
pattern (git@github.com:clips/pattern.git): Pattern is a web mining module for Python. It has tools for:
- Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
- Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
- Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
- Network Analysis: graph centrality and visualization

It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is