@kevsmith
Created April 4, 2014 17:42
* Near Term
1. Worker logs
2. Robustify db0.planet-labs.com
- See below
3. Migrate pipeline staging to EC2
- Reduce our dependency on athq and oort
4. Hard-coded basic auth creds for services
5. Evolve v2 storage API
6. Evolve jobs
- Replace REST-based dispatch with message queue
- Push-style updates
- Standardize instance types
- Automate burst capacity
- Sample and log worker performance metrics
Every 15 seconds:
- Job type
- CPU usage
- Memory usage
- I/O
For each job:
- Submitter
- Job library version(s)
- Status: crashed or finished
- Wall clock run time
7. 1.0 release of storage/jobs Python clients (replaces copy-pasted scripts)
8. Package up v0 scene API mock server in a VM image
- Who's our partner in BD?
- What's the path for handling feedback?
9. Service monitoring / alerting
- Run book
- PagerDuty?
- Do we page Frank, Matt, Seth, etc. for long, slow, or stuck jobs?
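
Sketch for item 6 (sample and log worker performance metrics): one possible shape for the two record types listed there, written as JSON log lines. The field names, the `run_job` wrapper, and the JSON format are assumptions made for illustration; the outline only names the fields to capture, not how they are collected or stored.

```python
import json
import time
from dataclasses import dataclass, asdict

# From the outline: worker metrics are sampled "Every 15 seconds".
SAMPLE_INTERVAL_SECONDS = 15


@dataclass
class WorkerSample:
    """One periodic sample from a worker (hypothetical field names)."""
    timestamp: float
    job_type: str
    cpu_percent: float
    memory_bytes: int
    io_bytes: int


@dataclass
class JobRecord:
    """One record per job (hypothetical field names)."""
    submitter: str
    library_versions: dict
    status: str  # "crashed" or "finished"
    wall_clock_seconds: float


def to_log_line(record) -> str:
    """Serialize either record type as a single JSON log line."""
    return json.dumps(asdict(record), sort_keys=True)


def run_job(submitter, library_versions, job):
    """Run a callable and return a JobRecord with its status and wall clock time."""
    start = time.monotonic()
    try:
        job()
        status = "finished"
    except Exception:
        # Any uncaught exception counts as a crash in this sketch.
        status = "crashed"
    elapsed = time.monotonic() - start
    return JobRecord(submitter, library_versions, status, elapsed)
```

How CPU, memory, and I/O figures are actually read is left open here; a sampler would fill in a WorkerSample every SAMPLE_INTERVAL_SECONDS and pass it to to_log_line.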
* Medium Term (90+ days)
1. Production tile server
2. Customer login & accounts system
3. Product metrics, analytics, and reporting
* My projects
1. Robustify db0.planet-labs.com
- Stand up db2.planet-labs.com (4 hours)
- Rename db0 -> db1 and configure DNS alias (2 hours)
- Configure, automate, and test db1 -> db2 streaming replication (2 days)
- Route 53 health checks & automated failover (2 days)
- Nightly compressed Postgres dumps from slave to S3 w/ 30 day retention (4 hours)
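
Sketch for the 30 day retention step: a small helper that picks which nightly dumps are past retention. The `pg-dumps/YYYY-MM-DD.sql.gz` key layout and both function names are assumptions made for illustration; the outline does not say how dumps are named or how expiry is enforced (an S3 lifecycle rule would be another option).

```python
from datetime import date, timedelta

# From the outline: "30 day retention".
RETENTION_DAYS = 30


def dump_key(day: date) -> str:
    """Build the (hypothetical) object key for the dump taken on a given day."""
    return f"pg-dumps/{day.isoformat()}.sql.gz"


def expired_keys(keys, today, retention_days=RETENTION_DAYS):
    """Return the keys whose embedded date is older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for key in keys:
        # Take the file name and keep the part before the first dot.
        stamp = key.rsplit("/", 1)[-1].split(".", 1)[0]
        try:
            day = date.fromisoformat(stamp)
        except ValueError:
            # Not a dated dump; leave it alone.
            continue
        if day < cutoff:
            expired.append(key)
    return expired
```

A dump taken exactly 30 days before today is kept; only strictly older dumps are returned for deletion.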