Created
April 4, 2014 17:42
* Near Term
  1. Worker logs
  2. Robustify db0.planet-labs.com
     - See below
  3. Migrate pipeline staging to EC2
     - Reduce our dependency on athq and oort
  4. Hard-coded basic auth creds for services
  5. Evolve v2 storage API
  6. Evolve jobs
     - Replace REST-based dispatch with message queue
     - Push-style updates
     - Standardize instance types
     - Automate burst capacity
     - Sample and log worker performance metrics
       Every 15 seconds:
       - Job type
       - CPU usage
       - Memory usage
       - I/O
       For each job:
       - Submitter
       - Job library version(s)
       - Status: crashed or finished
       - Wall clock run time
  7. 1.0 of storage/jobs Python clients (replaces copy pasta scripts)
  8. Package up v0 scene API mock server in a VM image
     - Who's our partner in BD?
     - What's the path for handling feedback?
  9. Service monitoring / alerting
     - Run book
     - PagerDuty?
     - Do we page Frank, Matt, Seth, etc. for long/slow/stuck jobs?
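One possible shape for the worker metrics listed under item 6, as a minimal Python sketch. The class names, field names, and the JSON log-line format are assumptions for illustration, not an existing schema. CPU usage is approximated here by cumulative CPU seconds from the standard-library `resource` module (Unix only); a real sampler would call `take_sample` every 15 seconds and could diff consecutive samples to get a rate.

```python
import json
import resource
import time
from dataclasses import asdict, dataclass, field


@dataclass
class WorkerSample:
    """One periodic sample (the outline suggests every 15 seconds)."""
    job_type: str
    cpu_seconds: float   # cumulative user + system CPU time
    max_rss: int         # peak resident set size (KB on Linux, bytes on macOS)
    io_blocks_in: int    # block input operations
    io_blocks_out: int   # block output operations
    timestamp: float = field(default_factory=time.time)


@dataclass
class JobRecord:
    """One record per job, written when the job ends."""
    submitter: str
    library_versions: dict
    status: str          # "crashed" or "finished"
    wall_clock_seconds: float


def take_sample(job_type):
    """Read resource usage for the current process into a WorkerSample."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return WorkerSample(
        job_type=job_type,
        cpu_seconds=usage.ru_utime + usage.ru_stime,
        max_rss=usage.ru_maxrss,
        io_blocks_in=usage.ru_inblock,
        io_blocks_out=usage.ru_oublock,
    )


def to_log_line(record):
    """Serialize a sample or job record as one JSON log line."""
    return json.dumps(asdict(record), sort_keys=True)
```

Emitting one JSON object per line keeps the samples easy to ship through whatever ends up handling worker logs (item 1), and the two record types map directly onto the "every 15 seconds" and "for each job" lists.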
* Medium Term (90+ days)
  1. Production tile server
  2. Customer login & accounts system
  3. Product metrics, analytics, and reporting
* My projects
  1. Robustify db0.planet-labs.com
     - Stand up db2.planet-labs.com (4 hours)
     - Rename db0 -> db1 and configure DNS alias (2 hours)
     - Configure, automate, and test db1 -> db2 streaming replication (2 days)
     - Route 53 health checks & automated failover (2 days)
     - Nightly compressed Postgres dumps from slave to S3 w/ 30-day retention (4 hours)
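The 30-day retention on the nightly dumps could be enforced with a small pruning step, sketched here in Python. The `pgdumps/<date>.sql.gz` key layout and the function names are hypothetical; only the 30-day window comes from the outline. The sketch just decides which keys have aged out and leaves the actual S3 deletion to the caller.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention window from the outline


def dump_key(day):
    """Hypothetical S3 key layout: one gzip-compressed dump per night."""
    return "pgdumps/{}.sql.gz".format(day.isoformat())


def expired_keys(dump_days, today, retention_days=RETENTION_DAYS):
    """Return keys for dumps strictly older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [dump_key(d) for d in sorted(dump_days) if d < cutoff]
```

For example, with `today = date(2014, 4, 4)`, dumps taken 31 and 45 days earlier are returned for deletion, while the dump taken exactly 30 days earlier is kept. An S3 lifecycle expiration rule on the bucket prefix may make this step unnecessary.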