Created May 24, 2026 20:57
Technical Specification: Critical Investment Orders Microservice
The investment orders service should be deployed as a highly available microservice across multiple Availability Zones. I would run the application on ECS Fargate or EKS with at least two replicas, and preferably three, spread across AZs behind an Application Load Balancer. Auto-scaling would be based on CPU, memory, request latency, and queue depth. Deployments should use blue/green or canary releases with automatic rollback if the error rate or latency breaches defined thresholds.
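
As a minimal sketch of the automatic rollback rule described above, the following function decides whether a canary should be rolled back from one observation window. The names `CanaryThresholds`, `CanaryWindow`, and `should_rollback`, the threshold values, and the `min_requests` guard are all illustrative assumptions, not part of any AWS API; in practice this decision would usually be delegated to the deployment tooling and its alarms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryThresholds:
    # Hypothetical limits; tune them to the service's own targets.
    max_error_rate: float = 0.001       # 0.1% of requests
    max_p99_latency_ms: float = 500.0   # milliseconds


@dataclass(frozen=True)
class CanaryWindow:
    # Metrics observed for the canary over one evaluation window.
    requests: int
    errors: int
    p99_latency_ms: float


def should_rollback(
    window: CanaryWindow,
    thresholds: CanaryThresholds = CanaryThresholds(),
    min_requests: int = 100,
) -> bool:
    """Return True if the canary breaches the error-rate or latency threshold."""
    # Too little traffic gives a noisy error rate, so make no decision yet.
    if window.requests < min_requests:
        return False
    error_rate = window.errors / window.requests
    return (
        error_rate > thresholds.max_error_rate
        or window.p99_latency_ms > thresholds.max_p99_latency_ms
    )
```

A deployment controller could call `should_rollback` once per evaluation window and shift traffic back to the previous version on the first `True`.
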
For data storage, I would use Amazon RDS PostgreSQL or Aurora PostgreSQL in Multi-AZ mode, with automated failover enabled. Since this service handles financial orders, the database should be strongly consistent, encrypted at rest and in transit, and protected with least-privilege IAM policies and restrictive security groups. Redis can be used for caching, idempotency keys, rate limiting, and short-lived locks, but not as the source of truth for orders.
The service should use an event-driven pattern for reliability. Incoming orders should be persisted first, then published to a durable queue, stream, or event bus such as SQS, Kafka, or EventBridge for downstream processing. Every order request should carry an idempotency key so that client retries cannot create duplicate orders. Critical state transitions should be recorded in an audit table or immutable event log.
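
The persist-first and idempotency-key requirements above can be sketched as follows. This is an illustration under stated assumptions, not the service's actual design: it uses an in-process SQLite database in place of PostgreSQL, a hypothetical `orders` table with a unique `idempotency_key` column, and a hypothetical `outbox` table from which a separate publisher would forward events to the queue. Writing the order and its outbox row in one transaction is one common way to make "persist, then publish" safe; the original text does not prescribe it.

```python
import json
import sqlite3
import uuid


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the UNIQUE constraint is what enforces idempotency.
    conn.executescript(
        """
        CREATE TABLE orders (
            order_id        TEXT PRIMARY KEY,
            idempotency_key TEXT NOT NULL UNIQUE,
            payload         TEXT NOT NULL,
            status          TEXT NOT NULL
        );
        CREATE TABLE outbox (
            event_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            order_id   TEXT NOT NULL,
            event_type TEXT NOT NULL,
            published  INTEGER NOT NULL DEFAULT 0
        );
        """
    )


def submit_order(conn: sqlite3.Connection, idempotency_key: str, payload: dict):
    """Persist an order once per idempotency key.

    Returns (order_id, created); created is False for a retried request.
    """
    order_id = str(uuid.uuid4())
    try:
        # The order row and its outbox event commit or roll back together.
        with conn:
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?, 'PENDING')",
                (order_id, idempotency_key, json.dumps(payload)),
            )
            conn.execute(
                "INSERT INTO outbox (order_id, event_type) VALUES (?, 'OrderCreated')",
                (order_id,),
            )
        return order_id, True
    except sqlite3.IntegrityError:
        # Duplicate key: return the order created by the first attempt.
        row = conn.execute(
            "SELECT order_id FROM orders WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return row[0], False
```

A retry with the same key returns the original `order_id` and writes nothing new, so downstream consumers see a single `OrderCreated` event per order.
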
For failover, the application should run across multiple AZs, the database should support automatic primary failover, and Redis should run in Multi-AZ mode if used. To cover the loss of the primary region, I would consider a warm standby in another region, depending on business requirements and regulatory constraints.
Backups should include automated daily database backups, point-in-time recovery, transaction logs, and tested restore drills. For a critical financial service, taking backups is not enough; restores should be tested on a regular schedule.
For observability, I would implement structured logs, metrics, distributed tracing, audit logs, and business metrics such as order creation rate, failed order rate, duplicate order attempts, and pending order age. Alerts should cover availability, latency, error rate, database health, queue backlog, failed orders, and unusual order volume.
The SLO I would propose is 99.95% monthly availability for order submission, with P99 latency under 500ms for accepted order requests and an error rate below 0.1%, excluding valid client-side errors. I’d choose 99.95% because investment order placement is business-critical and directly affects customer trust. A 99.99% target would likely carry significantly higher operational cost, which is only justified if the business has strict real-time trading requirements.
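
To make the trade-off between the two availability targets concrete, the downtime each one allows can be computed directly. The helper below is a small sketch; the function name `downtime_budget_minutes` and the 30-day month are assumptions made for illustration.

```python
def downtime_budget_minutes(slo: float, days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over `days`.

    `slo` is a fraction, for example 0.9995 for 99.95%.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - slo)


if __name__ == "__main__":
    for target in (0.9995, 0.9999):
        budget = downtime_budget_minutes(target)
        print(f"{target:.2%} availability allows {budget:.2f} minutes per 30 days")
```

Over a 30-day month, 99.95% leaves roughly 21.6 minutes of downtime, while 99.99% leaves about 4.3 minutes, which is little more than the time a single failover or rollback can take.
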