Last active August 29, 2015 14:04
EuroPython2014 notes

Tuesday, July 22nd


  • TDD is hard, especially for legacy code
  • start with integration tests
  • input -> output pairs
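
One way to read the "input -> output pairs" note: record what the legacy code returns today for a handful of inputs, then pin that behaviour in a test before refactoring. This is a minimal sketch; `legacy_price` and the recorded pairs are hypothetical stand-ins, not taken from the talk.

```python
import unittest


def legacy_price(quantity, unit_price):
    """Hypothetical legacy function whose current behaviour we want to pin down."""
    total = quantity * unit_price
    if quantity >= 10:
        total = total * 0.9  # undocumented bulk discount found in the old code
    return round(total, 2)


# Input -> output pairs, recorded by running the existing code once.
RECORDED_PAIRS = [
    ((1, 5.0), 5.0),
    ((3, 2.5), 7.5),
    ((10, 1.0), 9.0),
]


class LegacyPriceCharacterization(unittest.TestCase):
    def test_recorded_pairs(self):
        for args, expected in RECORDED_PAIRS:
            with self.subTest(args=args):
                self.assertEqual(legacy_price(*args), expected)


def run_characterization_tests():
    """Run the characterization suite and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(LegacyPriceCharacterization)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Once the pairs pass, the function can be refactored with the test acting as a safety net; unit tests can be added later around the pieces that get extracted.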

Amanda - services platform

  • nginx as a load balancer
  • uWSGI + Flask + RabbitMQ
  • HTTP as a transport layer

Extending Python

  • C API (set of macros, definitions of exposed functions); extension modules are shared libraries loaded via dlopen()
#include <Python.h>

// getting function arguments
PyObject *_a = 0;
_a = PyTuple_GET_ITEM(args, 0);

# setup.py
ext_modules=[ ... ]
  • ctypes, an advanced FFI (Foreign Function Interface) for Python
import ctypes
lib = ctypes.CDLL("")

""" Use the lib """
n = 2
  • CFFI, allows calling functions from shared libraries
from cffi import FFI
ffi = FFI()
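
The ctypes bullet above can be made concrete with a call into the C standard library. This is a sketch that assumes a POSIX system (Linux or macOS) where libc can be located; the wrapper name `c_strlen` is made up for illustration.

```python
import ctypes
import ctypes.util

# Locate the C standard library. find_library may return None, in which case
# CDLL(None) exposes the symbols already loaded into the process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string using C's strlen."""
    return libc.strlen(data)
```

Declaring `argtypes` and `restype` is optional for simple cases, but it lets ctypes check arguments and convert the return value instead of assuming a C `int`.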

Message passing

  • python-csp (CSP-style message passing for Python)


Marconi - Queuing and Notification Service

  • NOT a task manager (it can work on top of Celery)
  • NOT a Queue provisioning system
  • NOT a replacement for existing technology
  • it is a RESTful data API
  • open-source alternative to SQS (producer-consumer) and SNS (publisher-subscriber) for apps running on OpenStack clouds
  • layers: transport / API / storage + auth middleware
  • FIFO guaranteed
  • storage pools
  • easy to scale
  • TODO: Redis and AMQP support

logstash & elasticsearch

  • slides:
  • analysis: Kibana / Graylog2 / Python (pyes)
  • GELF transport layer: JSON over UDP
  • RequestID (with UUID) / CorrelationID (X-Correlation-ID HTTP header)
  • fingers-crossed handler approach: log everything to memory; when the action level is triggered (e.g. error), write out all stored messages
import logbook
  • WSGI framework/app (Flask, Django, ...) - PEP 333 & 3333
  • WSGI server
  • Web servers (nginx, Apache) - pass requests to the WSGI server
  • servers & OS
  • dependencies: requirements.txt + virtualenv / PyPI
  • task queues: Celery, Taskmanager, ...
  • web analytics: GA
  • logging and monitoring
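
The fingers-crossed approach noted above (buffer everything, write it all out once an error occurs) can be approximated with the standard library alone. The sketch below uses `logging.handlers.MemoryHandler` rather than logbook, so it is an analogue and not the logbook API; `ListHandler` and the logger name are hypothetical. Unlike a true fingers-crossed handler, `MemoryHandler` also flushes when its capacity is reached, so the capacity is set high here.

```python
import logging
import logging.handlers


class ListHandler(logging.Handler):
    """Illustrative target handler that collects formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


def make_fingers_crossed_logger(name, target, capacity=1000):
    """Buffer all records in memory; flush them to `target` on ERROR or above."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo output out of the root logger
    buffer = logging.handlers.MemoryHandler(
        capacity, flushLevel=logging.ERROR, target=target
    )
    logger.addHandler(buffer)
    return logger


sink = ListHandler()
log = make_fingers_crossed_logger("demo.fingers_crossed", sink)

log.debug("connecting")
log.info("request received")
before_error = list(sink.records)  # nothing has been written yet

log.error("boom")
after_error = list(sink.records)   # the error releases the whole buffer
```

The benefit is that quiet requests produce no log volume, while a failing request arrives with its full debug context.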

Wednesday, July 23rd

How we switched our 800+ projects from Apache to nginx

  • different versions of PHP and Python
  • uWSGI: --py-auto-reload

Design considerations while Evaluating, Developing, Deploying a distributed task processing system

  • Celery: async task queue/job queue, uses distributed message passing
  • pip install celery
  • chord, chunks
  • unit tests for the config files (linting, etc.)
  • system tests for pieces of infrastructure + mocking
  • RPM packages built for all config changes
  • "Why don't you use Puppet?" / "Puppet solves problems we don't have"
  • "We use the Keep It Simple framework"
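
The Celery `chord` and `chunks` primitives mentioned earlier in this list can be hard to picture from their names. The sketch below emulates their shape with `concurrent.futures` from the standard library; it is not Celery code and needs no broker. A chord runs a group of tasks in parallel and passes the list of results to a callback; chunks splits a long argument list into fewer, larger tasks. All function names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Split items into lists of at most `size` elements (one list per task)."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def chord(header_tasks, callback):
    """Run all header tasks in parallel, then call `callback` with their results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in header_tasks]
        results = [f.result() for f in futures]  # results keep header order
    return callback(results)


def sum_chunk(chunk):
    """The per-task unit of work: sum one chunk of numbers."""
    return sum(chunk)


def total_in_chunks(numbers, size):
    """Sum `numbers` by summing chunks in parallel, then combining the partial sums."""
    header = [lambda c=c: sum_chunk(c) for c in chunks(numbers, size)]
    return chord(header, sum)
```

In Celery itself the header tasks would run on workers and the callback would be another task, but the data flow is the same.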

An HTTP request's journey

  • nginx - OpenResty
  • Lua script generating hashes for various domains -> backend machine
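
The hash-based routing in the last bullet can be sketched in a few lines. The talk used a Lua script inside OpenResty; the version below is Python, and the hash function, the modulo scheme, and the backend names are all assumptions made for illustration, not the speakers' actual setup.

```python
import hashlib

# Hypothetical pool of backend machines.
BACKENDS = ["backend-1", "backend-2", "backend-3"]


def backend_for(domain, backends=BACKENDS):
    """Map a domain to a backend deterministically by hashing its name."""
    digest = hashlib.sha256(domain.lower().encode("utf-8")).hexdigest()
    return backends[int(digest, 16) % len(backends)]
```

The same domain always lands on the same machine, which keeps per-domain caches warm. A plain modulo reshuffles most domains when the pool size changes; consistent hashing is the usual remedy if that matters.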

Jedi - autocompletion tool, static analysis

Thursday, July 24th

SOA at Disqus

  • well defined API
  • REST + Django
  • RPC + Django (for heavy logic API)
  • async:
      • Django management commands: while True: do_work()
      • Celery: post_save hook + Celery task
      • Celery Beat for periodic tasks
  • code sharing: internal pip libraries
  • integration tests!
  • easier to understand new systems
  • easier to not break existing systems
  • "Do one thing and do it well"
  • multiple entry points in Django
  • logging & monitoring: StatsD

Elasticsearch from the bottom up

  • / Play
  • inverted index
  • terms generation
  • stored fields (full sources stored)
  • segments are immutable
  • bottom-up: segments -> index -> shards -> nodes -> clusters
  • partitioning by timestamp
  • filters are cached
  • queries are not cached
  • text analysis generates terms
  • ES shard == Lucene index
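
To make "text analysis generates terms" and "inverted index" concrete, here is a toy version in plain Python. It only illustrates the data structure; Lucene's real analyzers and segment format are far more involved, and every name below is made up for the example.

```python
import re
from collections import defaultdict


def analyze(text):
    """Tiny stand-in for text analysis: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """AND query: ids of the documents that contain every query term."""
    postings = [set(index.get(term, ())) for term in analyze(query)]
    return sorted(set.intersection(*postings)) if postings else []


DOCS = [
    "Python is fun",
    "Elasticsearch is built on Lucene",
    "Lucene segments and Python",
]
INDEX = build_inverted_index(DOCS)
```

The query text goes through the same `analyze` step as the documents, which is why a search for "Lucene Python" matches the lowercase terms stored in the index.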

Lessons learned from building Elasticsearch client

Inner guts of Bitbucket


