Building a Python Service Stack

So, Yelp has grown a lot
- 100 developers
- 180,000 lines of code in their main repo
- Taking longer and longer to release new features, even though the features themselves aren't getting more complex
- How to deal with this? Break up the monolithic project into lots of little codebases, speaking to each other RESTfully over HTTP
Issues of the big codebase
- Global dependencies
  - Tried to upgrade Tornado; eventually just gave up
- Now using virtualenvs
  - Investigating Docker, as it can encapsulate anything (including C extensions)
- Installing dependencies with pip, using wheel packages
- Modules used to be shared using Git submodules
  - Gross
  - Now distributed via an internal PyPI; packages updated automatically when an appropriate Git tag is pushed
    - (but they'd rather hold off on creating that tag until Jenkins finishes testing)
Framework
- Used to use Tornado; underwhelming
- Now using Pyramid and loving it
Application server
- Using uWSGI: "working well for us!"
- Wanted a stable, fast, well-documented server with good logging support (Python logging -> Scribe), a good community, and the ability to do zero-downtime deploys
  - w00t! uWSGI fits the bill
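For reference, uWSGI serves any WSGI callable, and a Pyramid app is exposed to it as one (via `config.make_wsgi_app()`). A minimal sketch using only the standard library, so it runs without Pyramid installed; the `/status` endpoint and the `call_app` helper are illustrative assumptions, not anything shown in the talk:

```python
import json


def application(environ, start_response):
    """Minimal WSGI callable; uWSGI can load any object with this signature."""
    path = environ.get("PATH_INFO", "/")
    if path == "/status":  # hypothetical health endpoint, not from the talk
        status, payload = "200 OK", {"status": "ok"}
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Invoke the app in-process, without a server, for quick checks."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application({"PATH_INFO": path}, start_response))
    return captured["status"], json.loads(body)
```

Saved as `app.py` (an assumed filename), something like `uwsgi --http :8080 --wsgi-file app.py --processes 4` should serve it; the flags Yelp actually uses were not covered.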
Metrics
- Questions you want to be able to answer:
  - "What is the 99th percentile time for this endpoint?"
  - "Are all service instances (boxes) slow, or is it just one?"
  - "How many QPS is this endpoint handling?"
  - "Which downstream service is killing our performance?"
  - "Are any clients still using the old API?"
  - "Did this new service version introduce a performance regression?"
- There's a nice package in Java for this called (imaginatively) Metrics
- Yelp wrote a clone for Python/uWSGI, called uwsgi_metrics
  - Exposes a JSON endpoint on the app itself to pull the metrics
  - Not open-sourced yet, but will be soon
- Metrics aggregation
  - uWSGI uses a preforking model, so there are multiple processes
  - uWSGI offers a "mule" facility: processes to do miscellaneous work on behalf of Web processes
    - uwsgi_metrics uses this for aggregation
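Since uwsgi_metrics is not public, here is only a rough sketch of the aggregation idea: each worker process reports its own latency samples, and one aggregator (the mule's role) merges them before answering questions like "what is the 99th percentile?" or "how many QPS?". All function names and the report format below are assumptions made for illustration.

```python
import math
from collections import defaultdict


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def aggregate(worker_reports):
    """Merge per-worker {endpoint: [latency_ms, ...]} dicts into one dict."""
    merged = defaultdict(list)
    for report in worker_reports:
        for endpoint, latencies in report.items():
            merged[endpoint].extend(latencies)
    return dict(merged)


def summarize(merged, window_seconds):
    """Per-endpoint p99 latency and QPS over the reporting window."""
    return {
        endpoint: {
            "p99_ms": percentile(latencies, 99),
            "qps": len(latencies) / window_seconds,
        }
        for endpoint, latencies in merged.items()
    }
```

Merging raw samples keeps the percentile exact but costs memory; a real implementation would more likely use a bounded reservoir or histogram, which is the approach the Java Metrics library takes.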
Service discovery
- Yelp used to hardcode a bunch of addresses
- Now using SmartStack (from Airbnb)
  - nerve
    - Daemon running on each service host
    - Checks health of services
    - Updates service registration accordingly in ZooKeeper
  - synapse
    - Each client host is running HAProxy
    - HTTP requests for services just target the correct endpoint on the local HAProxy
    - synapse also runs on client hosts; gets current information from ZooKeeper and reconfigures HAProxy to point at the hosts that are currently up
  - This is brilliant
    - But, not yet being used in production
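To make the nerve/synapse split concrete, here is a toy model of the flow, assuming a plain dict in place of ZooKeeper and made-up function names. It is not SmartStack's actual code or configuration format (the real tools are separate daemons), just the shape of the idea: health results update a registry, and the registry is rendered into a local HAProxy section.

```python
def nerve_report(registry, service, host, is_healthy):
    """nerve-like step: register a healthy host, deregister an unhealthy one."""
    hosts = registry.setdefault(service, set())
    if is_healthy:
        hosts.add(host)
    else:
        hosts.discard(host)


def synapse_render(registry, service, local_port):
    """synapse-like step: build an HAProxy listen section from the registry."""
    lines = [
        "listen %s" % service,
        "    bind 127.0.0.1:%d" % local_port,
        "    mode http",
    ]
    # Sort so the generated config is stable between runs.
    for index, host in enumerate(sorted(registry.get(service, ()))):
        lines.append("    server %s_%d %s check" % (service, index, host))
    return "\n".join(lines)
```

Clients then only ever talk to `127.0.0.1:<local_port>`, so application code needs no knowledge of which hosts are up.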
Operations dashboard
- Exposes service uptime, active hosts, active versions, basic performance metrics
- Looked cool in a screenshot; not a lot of detail
Service documentation
- Some formal written documentation
- Service developers are also encouraged to write a client library for the service: living self-documentation
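A sketch of what such a client library might look like, to show why it doubles as documentation: method names, arguments, and docstrings spell out the service's contract. The service, URL layout, and class name here are all hypothetical, and Python 3 is assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


class ReviewServiceClient:
    """Hypothetical client for a hypothetical review service."""

    def __init__(self, base_url="http://localhost:3000", opener=urlopen):
        self.base_url = base_url.rstrip("/")
        self._open = opener  # injectable so callers can test without a network

    def _url(self, path, **params):
        query = "?" + urlencode(sorted(params.items())) if params else ""
        return self.base_url + path + query

    def get_reviews(self, business_id, limit=10):
        """Return up to `limit` reviews for a business as a list of dicts."""
        url = self._url("/businesses/%s/reviews" % business_id, limit=limit)
        with self._open(url) as response:
            return json.loads(response.read().decode("utf-8"))
```

The default `base_url` points at localhost on the assumption that requests go through a local proxy, as in the SmartStack setup described above.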
Caching
- No HTTP caching between services
  - (but nginx caching is being considered for some)
- Individual services use memcached within themselves
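The usual pattern for a service caching "within itself" is cache-aside: check the cache, fall back to the real computation, store the result. The sketch below uses an in-process dict with expiry as a stand-in for a memcached client, so it runs anywhere; the class and function names are assumptions, and the talk did not say how Yelp's services structure their caching.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached client (get/set with expiry)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def cached_lookup(cache, key, compute, ttl_seconds=60):
    """Cache-aside: return the cached value, or compute and store it."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl_seconds)
    return value
```

One caveat of this simple version: a computed value of `None` is never cached, since `None` is also the miss signal.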
Security between services
- Doesn't sound like there is any currently
- Looking into SSL mutual authentication
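For a sense of what mutual authentication involves, here is a minimal sketch with Python's standard `ssl` module (Python 3.6+ assumed). The helper names and the idea of passing certificate paths as arguments are illustrative assumptions; nothing here reflects a decision Yelp had made.

```python
import ssl


def make_server_context(certfile=None, keyfile=None, client_ca=None):
    """Server-side context that rejects clients without a trusted certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.verify_mode = ssl.CERT_REQUIRED  # this is what makes it "mutual"
    if certfile:
        context.load_cert_chain(certfile, keyfile)
    if client_ca:
        context.load_verify_locations(cafile=client_ca)
    return context


def make_client_context(certfile=None, keyfile=None, server_ca=None):
    """Client-side context that verifies the server and presents its own cert."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca)
    if certfile:
        context.load_cert_chain(certfile, keyfile)
    return context
```

The paths default to `None` only so the functions can be exercised without real certificates; an actual deployment would always supply them.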
Misc
- Apparently Twitter has some build zip replacement called "pex"?
  - Yep, looks pretty cool
  - Also a build tool called Pants
- Q&A at meetups is still always terrible

enaeseth/Python stack talk.markdown
