Explain what the service does and why. Mention any high-level business logic so the reader understands why the service exists in the first place.
List important endpoints/URLs and explain what they are responsible for.
If possible, name the Slack channel(s) where questions about the service should be directed. If a specific person handles queries about the service, we also encourage you to list that person here.
If there are channels where stage and production deploys are coordinated, name them as well.
If your service has an on-call schedule, link to it here.
Explain how the service is used. If it's an API, how do you call it? Give examples to try when interacting with the service.
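As an illustration of the kind of example worth including, here is a minimal sketch of calling a JSON API from Python using only the standard library. The helper name `get_json`, the base URL, the `/v1/items` path, and the `limit` parameter are all hypothetical placeholders; replace them with your service's real endpoints and parameters.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def get_json(base_url: str, path: str, params: Optional[dict] = None, timeout: float = 5.0):
    """Send a GET request and return the decoded JSON body.

    Hypothetical helper for documentation purposes; raises
    urllib.error.HTTPError on non-2xx responses.
    """
    # Build the query string only when parameters are given.
    query = f"?{urllib.parse.urlencode(params)}" if params else ""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}{query}"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (placeholder host and path, not a real service):
# items = get_json("https://my-service.example.com", "/v1/items", {"limit": 2})
```

A copy-pasteable call like this lets a new reader confirm quickly that they can reach the service and understand the shape of its responses.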
If there is any architecture documentation available, link to it here.
Call out any "gotchas" with using the service, such as non-standard endpoints or non-obvious requirements for usage. In addition, mention any unusually large resource requirements.
Explain how to start the service. Provide a URL for checking that the service is running (e.g. a health endpoint).
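A health check can also be documented as a small script. The following is a minimal sketch, assuming the service exposes an HTTP endpoint that answers with a 2xx status when healthy; the function name `is_healthy` and the example URL are hypothetical, and your service may define health differently (for instance, by inspecting the response body).

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, DNS failures, non-2xx
        # responses (raised as HTTPError), and malformed URLs.
        return False


# Example call (placeholder URL, not a real service):
# is_healthy("http://localhost:8080/health")
```

Treating every failure as "unhealthy" keeps the check simple to use from scripts; if you need to distinguish a timeout from a 5xx response, let the exceptions propagate instead.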
List all the dependencies and technologies the service uses. Does it sit behind a CDN? Is it Python/Go etc.? Highlight particularly critical or non-obvious dependencies.
Any non-standard architecture decisions which aren't described under usage should be called out here.
Ideally, each service should have a runbook available in case of an outage; link to it here.
A runbook is (according to Wikipedia) a compilation of routine procedures and operations that the system administrator or operator carries out. In other words, if there is a problem in production, read the runbook and follow the steps.
Explain how the service is monitored. Things to cover here include:
- Logs: Link to the logs for stage and prod environment.
- Dashboards: Provide links to any existing dashboards.
- Alarms: Provide links to any specific charts that could show possible outages and problems.
Link here to any additional documentation that exists in GitHub or Google Drive. Suggestions for things to document:
- Contributing: We encourage completing a CONTRIBUTING.md for your service and linking to it here.
- Deploying: Document how to deploy the application.
- Post mortems: A link to the post-mortem documents for your service.