Semyon semyont

@semyont
semyont / luigi_server.cfg
Created June 14, 2017 14:53
luigi server config explained
[core]
# These parameters control core luigi behavior, such as error e-mails and
# interactions between the worker and scheduler.
default-scheduler-host: localhost
# Hostname of the machine running the scheduler. Defaults to localhost.
default-scheduler-port: 8082
# Port of the remote scheduler api process. Defaults to 8082.
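The `[core]` section above uses the INI-style `key: value` layout. As a minimal sketch of how such a file is read, the following parses the same two settings with Python's standard `configparser` module. This only illustrates the file format; luigi loads its own configuration, and the helper name `read_scheduler_address` is hypothetical.

```python
import configparser

# The same two settings as in the [core] section above.
CONFIG_TEXT = """\
[core]
# Hostname of the machine running the scheduler.
default-scheduler-host: localhost
# Port of the remote scheduler api process.
default-scheduler-port: 8082
"""


def read_scheduler_address(text):
    """Return (host, port) from INI-style config text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # The fallbacks mirror the defaults stated in the comments above.
    host = parser.get("core", "default-scheduler-host", fallback="localhost")
    port = parser.getint("core", "default-scheduler-port", fallback=8082)
    return host, port


host, port = read_scheduler_address(CONFIG_TEXT)
print(host, port)
```

If the section or either key is missing, the fallbacks return the documented defaults instead of raising an error.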
semyont / mytasks.py
Created June 11, 2017 08:33 — forked from demoray/mytasks.py
A custom complete method for luigi tasks that will re-run the task if any prerequisite task is newer
import os

import luigi


class MyTask(luigi.Task):
    def complete(self):
        def _get_last(outputs):
            # Track the newest modification time among the outputs that exist.
            last = 0.0
            for item in outputs:
                if not item.exists():
                    continue
                current = os.path.getmtime(item.path)
                if current > last:
                    last = current
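The preview above stops inside `_get_last`, so the rest of the gist's logic is not shown. As a rough sketch of the idea in the description (treat a task as incomplete when a prerequisite is newer than its output), the following uses plain file paths and only the standard library. The helper names `latest_mtime` and `is_up_to_date`, and the exact comparison rule, are assumptions rather than the gist's actual code.

```python
import os
import tempfile


def latest_mtime(paths):
    """Return the newest modification time among existing paths, or 0.0."""
    last = 0.0
    for path in paths:
        if not os.path.exists(path):
            continue
        current = os.path.getmtime(path)
        if current > last:
            last = current
    return last


def is_up_to_date(output_paths, input_paths):
    """Assumed rule: every output exists and none of the inputs is newer."""
    if not all(os.path.exists(p) for p in output_paths):
        return False
    return latest_mtime(output_paths) >= latest_mtime(input_paths)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    out = os.path.join(tmp, "output.txt")
    for path in (src, out):
        with open(path, "w") as fh:
            fh.write("x")
    # Set explicit timestamps so the comparison is deterministic.
    os.utime(src, (1000, 1000))
    os.utime(out, (2000, 2000))
    fresh = is_up_to_date([out], [src])  # output newer than input
    os.utime(src, (3000, 3000))
    stale = is_up_to_date([out], [src])  # input touched after output

print(fresh, stale)
```

In a luigi task, the same comparison would presumably run over the targets returned by `output()` and by the required tasks, but that wiring is not visible in the preview.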

Performance of Flask, Tornado, GEvent, and their combinations

Wensheng Wang, 10/1/11

Source: http://blog.wensheng.org/2011/10/performance-of-flask-tornado-gevent-and.html

When choosing a web framework, I pretty much had my eyes set on Tornado. But I heard good things about Flask and Gevent, so I tested the performance of each and of combinations of the three. I chose to write something just a little more advanced than a "Hello World" program: one that uses templates. Here is the code:

1. Pure Flask (pure_flask.py)

semyont / gevent_wsgi_tornado.py
Created June 4, 2017 08:35
gevent wsgi tornado example
import os.path

import tornado.web
import tornado.wsgi
import gevent.wsgi  # removed in gevent 1.3; newer releases provide gevent.pywsgi instead


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.render('main.html', page_title="", body_id="", messages="whatever", title="home")


settings = {
semyont / tornado_gevent_async.py
Created June 4, 2017 08:21
tornado blocking task gevent workers async example
# Do this as early as possible in your application:
from gevent import monkey; monkey.patch_all()

# tornado.web.asynchronous was removed in Tornado 6.0; this example targets earlier releases.
from tornado.web import RequestHandler, asynchronous
import gevent


class MyHandler(RequestHandler):
    @asynchronous
    def get(self, *args, **kwargs):
        def async_task():
semyont / elasticsearch_term_nested_aggregation.json
Last active May 16, 2017 06:43
Elasticsearch collect mode for nested aggregations: when the top-hit size is bigger than the number of fields, inner aggregations return unneeded fields to the upper aggregation layer. Combining a filter or match query with this reduces the variance in fields.
# use un-analyzed fields
{
  "aggs" : {
    "domain" : {
      "terms" : {
        "field" : "doc.domain.keyword",
        "size" : 4,
        "collect_mode" : "breadth_first"
      },
# GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "doc.title": "Search" }},
        { "match": { "doc.content": "Elasticsearch" }}
      ],
      "filter": [
        { "term": { "doc.status": "published" }},
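The query preview above is cut off inside the `filter` list. As a small sketch of the structure it shows, the same body can be built as a Python dictionary and serialized to JSON. In a `bool` query, `must` clauses contribute to the relevance score, while `filter` clauses only include or exclude documents. The function name `build_search_body` is hypothetical, and only the clauses visible in the preview are reproduced.

```python
import json


def build_search_body(title, content, status):
    """Build a bool query body from the clauses visible in the preview."""
    return {
        "query": {
            "bool": {
                # Scored clauses: both must match.
                "must": [
                    {"match": {"doc.title": title}},
                    {"match": {"doc.content": content}},
                ],
                # Non-scored clause: exact term match on the status field.
                "filter": [
                    {"term": {"doc.status": status}},
                ],
            }
        }
    }


body = build_search_body("Search", "Elasticsearch", "published")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the request body to `GET /_search`.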
# Convert wide format csv to long format csv
# Time Temp1 Temp2 Temp3 Temp4 Temp5
# 00 21 32 33 21 23
# 10 34 23 12 08 23
# 20 12 54 33 54 55
with open("in.csv") as f, open("out.csv", "w") as out:
    headers = next(f).split()[1:]  # keep the value-column headers: Temp1 Temp2 Temp3 Temp4 Temp5
    for row in f:
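The loop body is cut off in the preview above. A minimal sketch of one way to finish the wide-to-long conversion follows, using the sample table from the comments. The long-format layout (one `(time, column, value)` tuple per measurement) and the function name `wide_to_long` are assumptions, since the original output format is not shown.

```python
import io

SAMPLE = (
    "Time Temp1 Temp2 Temp3 Temp4 Temp5\n"
    "00 21 32 33 21 23\n"
    "10 34 23 12 08 23\n"
    "20 12 54 33 54 55\n"
)


def wide_to_long(lines):
    """Turn whitespace-separated wide rows into (time, column, value) tuples."""
    it = iter(lines)
    headers = next(it).split()[1:]  # drop the leading "Time" header
    rows = []
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        time, values = fields[0], fields[1:]
        for name, value in zip(headers, values):
            rows.append((time, name, value))
    return rows


rows = wide_to_long(io.StringIO(SAMPLE))
for time, name, value in rows:
    print(time, name, value)
```

Values stay as strings so that entries such as `08` are written out unchanged. To write a file instead of printing, the same tuples can be passed to `csv.writer(out).writerows(rows)`.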
semyont / wordpress-mysql-docker-compose.yml
Last active April 12, 2017 21:44
Wordpress MySQL Docker Compose
version: '2'
services:
  db:
    image: mysql:5.7
    volumes:
      - db_data:/var/lib/mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: wordpress
semyont / useful_pandas_snippets.py
Created April 12, 2017 20:55 — forked from bsweger/useful_pandas_snippets.md
Useful Pandas Snippets
# List unique values in a DataFrame column
pd.unique(df.column_name.ravel())
# Convert Series datatype to numeric, turning any non-numeric values into NaN
# (convert_objects was removed in pandas 1.0; pd.to_numeric replaces it)
df['col'] = pd.to_numeric(df['col'], errors='coerce')
# Grab DataFrame rows where column has certain values
valuelist = ['value1', 'value2', 'value3']
df = df[df.column.isin(valuelist)]