@vladiant
Created April 20, 2025 06:46
Architecture Decision Records
Why Documenting Architecture Decisions Matters
1. A Single Source of Truth
2. Better Onboarding and Cross-Team Collaboration
3. Encourage Thoughtful, Data-Driven Decisions
4. Simplify Architecture Evolution
ADRs are meant to capture key decisions that have long-term implications for your system or organization. If a decision introduces a new dependency, alters fundamental data flows, or significantly affects architecture and team processes, it likely requires an ADR.
On the other hand, minor decisions, like tweaking a library version or refactoring a single function, usually don’t need an official record.
Use your judgment: if a decision is significant enough that others might question it later, or it will be difficult to undo, document it. Otherwise, don’t let the process become an administrative burden.
https://newsletter.modern-engineering-leader.com/p/elevate-your-engineering-culture
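
The rule of thumb above can be written down as a small checklist. This is only a sketch: the `Decision` fields and the `needs_adr` function are hypothetical names that mirror the criteria listed in the text, and a real team would still apply judgment rather than a boolean test.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical checklist for a proposed change; field names are illustrative."""
    adds_dependency: bool = False
    alters_core_data_flow: bool = False
    affects_architecture_or_process: bool = False
    hard_to_undo: bool = False
    likely_questioned_later: bool = False


def needs_adr(decision: Decision) -> bool:
    """Return True if any of the significance signals from the text applies."""
    return any((
        decision.adds_dependency,
        decision.alters_core_data_flow,
        decision.affects_architecture_or_process,
        decision.hard_to_undo,
        decision.likely_questioned_later,
    ))


# A broker migration trips several signals; a minor version bump trips none.
kafka_migration = Decision(adds_dependency=True, alters_core_data_flow=True, hard_to_undo=True)
library_bump = Decision()

print(needs_adr(kafka_migration))  # True
print(needs_adr(library_bump))     # False
```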

Title

Migrating from Synchronous HTTP API to Kafka

Status

Accepted

Date

2025-03-10

Context

Our microservices currently communicate via synchronous HTTP APIs, causing latency issues and occasional disruptions when one service is unavailable. The cost of handling all of this traffic is also very high. Most of the communication, especially reads, doesn't require a synchronous flow. We also anticipate a need to handle significantly higher request volumes in the near future. To increase resiliency, scalability and cost-efficiency, asynchronous communication would be preferred.

Decision

We will transition from synchronous HTTP API calls to a Kafka-based event-driven architecture for communication between our microservices.

Rationale

  • Scalability: Kafka’s event-driven model allows simple horizontal scaling of consumers, which is critical for our anticipated traffic growth and is also very cost-efficient.

  • Resilience: Asynchronous messaging decouples microservices, so one service’s downtime doesn’t cascade throughout the system, which is especially important for writes/commands. That will also allow us to take advantage of the Saga pattern.

  • Cost-efficiency: A simple proof of concept indicates that just 3 Kafka consumers can read the same amount of data as 25 Sidekiq workers reading from the HTTP API. It also implies that we will be able to scale down the web workers of the upstream service by 40%, as we won't be reading this data from the HTTP API.
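
The resilience argument can be illustrated with a toy model. The sketch below does not use Kafka or any Kafka client; `EventLog` and `Consumer` are hypothetical in-memory stand-ins, assumed here only to show that a producer appending to a persistent log is unaffected by consumer downtime, and that a consumer tracking its own offset can catch up afterwards.

```python
class EventLog:
    """In-memory stand-in for a topic: an append-only list of events."""

    def __init__(self):
        self._events = []

    def publish(self, event):
        # The producer only appends; it never waits for a consumer.
        self._events.append(event)

    def read_from(self, offset):
        # Return every event at or after the given offset.
        return self._events[offset:]


class Consumer:
    """Tracks its own offset so it can resume after downtime."""

    def __init__(self, log):
        self._log = log
        self.offset = 0
        self.processed = []

    def poll(self):
        batch = self._log.read_from(self.offset)
        self.processed.extend(batch)
        self.offset += len(batch)
        return batch


log = EventLog()
consumer = Consumer(log)

log.publish({"type": "order_created", "id": 1})
consumer.poll()  # consumer is up and handles event 1

# The consumer is now "down": nothing polls, yet publishing still succeeds.
log.publish({"type": "order_created", "id": 2})
log.publish({"type": "order_created", "id": 3})

caught_up = consumer.poll()  # consumer comes back and catches up
print([e["id"] for e in caught_up])  # [2, 3]
```

With a synchronous HTTP call, the two events published during the downtime would have failed or needed retries on the caller's side; here the caller never notices.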

Implications

  • Operational Overhead: We need to maintain a Kafka cluster, which introduces new complexity for monitoring, alerting, and administration. Amazon MSK can be a great solution here.

  • Kafka Learning Curve: Engineers will need to gain familiarity with event-driven design patterns and Kafka itself.

  • Deployment and Migration Plan: We’ll roll out event streams incrementally to avoid a “big bang” migration. Secondary microservices will be adapted first, followed by the more critical ones.
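
One way to picture the incremental rollout is as an ordering problem plus a per-service switch. The sketch below is an assumption-laden illustration: the service names, the `criticality` scale, and the `migration_order` and `transport_for` helpers are all hypothetical and do not come from the ADR itself.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str         # hypothetical service name
    criticality: int  # assumed scale: 1 = secondary, higher = more critical


def migration_order(services):
    """Least critical services first, so early problems have a small blast radius."""
    return sorted(services, key=lambda s: (s.criticality, s.name))


def transport_for(service, migrated):
    """Services that are not migrated yet keep using synchronous HTTP."""
    return "kafka" if service.name in migrated else "http"


services = [
    Service("payments", 3),
    Service("recommendations", 1),
    Service("notifications", 1),
    Service("orders", 2),
]

order = migration_order(services)
print([s.name for s in order])
# ['notifications', 'recommendations', 'orders', 'payments']

migrated = {s.name for s in order[:2]}       # first wave only
print(transport_for(services[0], migrated))  # http (payments is not migrated yet)
```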

Alternatives Considered

  1. Continue with Synchronous HTTP: This would be simpler to maintain, but the scalability, resiliency and cost-efficiency trade-offs are not acceptable in the long run.

  2. Use a Different Message Broker (e.g. RabbitMQ): While viable, Kafka’s persistence and proven track record with large-scale event processing made it more appealing.

References

Title

Short title describing what this ADR is about

Status

Accepted | Superseded by ADR-xx

Date

Date the decision was made

Context

Describe the nature of the problem that requires a decision and all relevant context around it.

Decision

Describe briefly the decision that was made.

Rationale

Explain the reasoning behind the decision, its trade-offs, and why it is considered the preferred option.

Implications

Describe the side effects of this decision, both technical and non-technical. Include both positive and negative implications.

Alternatives Considered

Describe any alternative solutions that were considered and why they were not chosen.

References

Optional. Include any links to resources that influenced the decision or might be helpful in understanding the subject of the decision in depth.
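
For teams that prefer to generate ADR files rather than copy the template by hand, the sections above map naturally onto a small record type. This is a minimal sketch, not a standard tool: the `ADR` class, its field names, and the plain-text output layout are assumptions chosen to mirror the template's headings.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ADR:
    title: str
    status: str
    decided_on: date
    context: str
    decision: str
    rationale: str
    implications: str
    alternatives: str
    references: list = field(default_factory=list)  # optional section

    def render(self) -> str:
        """Render the record as plain text, one heading per template section."""
        sections = [
            ("Title", self.title),
            ("Status", self.status),
            ("Date", self.decided_on.isoformat()),
            ("Context", self.context),
            ("Decision", self.decision),
            ("Rationale", self.rationale),
            ("Implications", self.implications),
            ("Alternatives Considered", self.alternatives),
        ]
        if self.references:  # omit the optional section when it is empty
            sections.append(("References", "\n".join(self.references)))
        return "\n\n".join(f"{heading}\n\n{body}" for heading, body in sections)


adr = ADR(
    title="Migrating from Synchronous HTTP API to Kafka",
    status="Accepted",
    decided_on=date(2025, 3, 10),
    context="Synchronous HTTP calls cause latency and cascading outages.",
    decision="Move inter-service communication to Kafka events.",
    rationale="Scalability, resilience and cost-efficiency.",
    implications="Operational overhead and a learning curve.",
    alternatives="Stay on HTTP; use a different broker.",
)
rendered = adr.render()
print(rendered)
```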
