Created
June 26, 2025 08:17
# Logging Best Practices for Distributed Transactional Systems

This guide outlines 12 best practices for logging in distributed systems that handle transactional processing, with examples in Java using SLF4J.

## Table of Contents

- [1. Include a Unique Transaction / Correlation ID](#1-include-a-unique-transaction--correlation-id)
- [2. Log at Key Lifecycle Points](#2-log-at-key-lifecycle-points)
- [3. Use Structured Logging (e.g. JSON)](#3-use-structured-logging-eg-json)
- [4. Capture and Log Errors with Full Stack Traces](#4-capture-and-log-errors-with-full-stack-traces)
- [5. Consistent Format and Naming for Message Templates](#5-consistent-format-and-naming-for-message-templates)
- [6. Standard Verbiage and Voice Across Teams](#6-standard-verbiage-and-voice-across-teams)
- [7. Add Contextual Metadata to Every Log](#7-add-contextual-metadata-to-every-log)
- [8. Use Labels Derived from Business Domains or Service Names](#8-use-labels-derived-from-business-domains-or-service-names)
- [9. Namespace Events Using Functional or Business Group Names](#9-namespace-events-using-functional-or-business-group-names)
- [10. Avoid Logging Sensitive Data](#10-avoid-logging-sensitive-data)
- [11. Ensure Timestamps and Timezones Are Standardized](#11-ensure-timestamps-and-timezones-are-standardized)
- [12. Sample High-Frequency Log Messages](#12-sample-high-frequency-log-messages)
## 1. Include a Unique Transaction / Correlation ID

**Why: Correlating logs across microservices and threads requires a shared ID.**

```java
MDC.put("txnId", UUID.randomUUID().toString());
log.info("START processTransaction");
```
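For the ID to be shared across services, a downstream service would usually reuse the ID it received from its caller rather than mint a new one, and clear it once the unit of work ends so pooled threads do not carry a stale ID. The following is a minimal sketch of that idea, not a definitive implementation. `LogContext` is a hypothetical `ThreadLocal`-based stand-in for SLF4J's `MDC` so the example runs on the JDK alone, and `CorrelationScope` and the `inboundTxnId` parameter are likewise assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical stand-in for SLF4J's MDC so the sketch runs without dependencies.
final class LogContext {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void remove(String key) { CTX.get().remove(key); }
}

// Hypothetical helper: scopes a txnId to one unit of work.
final class CorrelationScope {
    // Reuse the caller's ID when one arrived with the request; otherwise mint a new one.
    static String resolve(String inboundTxnId) {
        return (inboundTxnId == null || inboundTxnId.isBlank())
                ? UUID.randomUUID().toString()
                : inboundTxnId;
    }

    // Run the work with txnId set, and always clear it afterwards.
    static <T> T withTxnId(String inboundTxnId, Supplier<T> work) {
        LogContext.put("txnId", resolve(inboundTxnId));
        try {
            return work.get();
        } finally {
            LogContext.remove("txnId");
        }
    }
}
```

With SLF4J itself, `LogContext.put` and `LogContext.remove` would correspond to `MDC.put` and `MDC.remove`. How the inbound ID travels between services (an HTTP header, a message attribute) is left open here.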
## 2. Log at Key Lifecycle Points

**Why: Helps trace transactional states and time-to-completion.**

```java
log.info(">> order.processing.started orderId={}", orderId);
log.info("[OK] order.processing.completed orderId={}", orderId);
```
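To make time-to-completion directly searchable, the completion event can carry the elapsed time. The sketch below shows one way to do that under stated assumptions: `LifecycleEvents` and the `durationMs` field name are hypothetical, and the methods return the message text instead of calling a logger so the example runs on the JDK alone.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper that builds lifecycle messages with a duration field.
final class LifecycleEvents {
    static String started(String orderId) {
        return "order.processing.started orderId=" + orderId;
    }

    // startNanos and endNanos are System.nanoTime() readings taken around the work.
    static String completed(String orderId, long startNanos, long endNanos) {
        long durationMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return "order.processing.completed orderId=" + orderId
                + " durationMs=" + durationMs;
    }
}
```

In a service, the caller would take `long start = System.nanoTime()` before the work and pass `System.nanoTime()` again when logging completion. `System.nanoTime()` is used rather than wall-clock time because it is not affected by clock adjustments.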
## 3. Use Structured Logging (e.g. JSON)

**Why: Logs in key-value format enable better filtering, indexing, and visualization.**

```java
log.info("event=ORDER_PLACED orderId={} userId={}", orderId, userId);
```
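The example above emits key-value pairs inside a text message. For JSON output, the same fields would be rendered as one JSON object per line. The sketch below illustrates the shape of such a line using only the JDK. `JsonLine` is a hypothetical helper written for this guide, and it emits every value as a string. In practice a JSON encoder configured in the logging backend would normally do this work rather than hand-written code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders log fields as a single-line JSON object.
final class JsonLine {
    // Escape the characters that JSON does not allow raw inside a string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Render fields in insertion order; null values become the string "null".
    static String render(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(String.valueOf(e.getValue()))).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("event", "ORDER_PLACED");
        fields.put("orderId", "o-42");
        System.out.println(render(fields));
    }
}
```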
## 4. Capture and Log Errors with Full Stack Traces

**Why: Enables full visibility into root causes during debugging.**

```java
// ex is the last argument and has no matching {} placeholder, so SLF4J logs its stack trace.
log.error("[ERROR] PAYMENT_FAILED orderId={} reason={}", orderId, ex.getMessage(), ex);
```
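When an exception has been wrapped several times, `ex.getMessage()` often describes only the outermost wrapper. One option is to add a short summary of the innermost cause to the message while still passing the exception for the full stack trace. The sketch below is an illustration only. `CauseChain` is a hypothetical helper, and the depth cap of 50 is an arbitrary guard against cyclic cause chains.

```java
// Hypothetical helper: finds and summarizes the innermost cause of an exception.
final class CauseChain {
    // Follow getCause() links, stopping at the end of the chain or the depth cap.
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        int depth = 0;
        while (cur.getCause() != null && cur.getCause() != cur && depth++ < 50) {
            cur = cur.getCause();
        }
        return cur;
    }

    // One-line summary suitable for a log field, e.g. "SocketTimeoutException: read timed out".
    static String summary(Throwable t) {
        Throwable root = rootCause(t);
        return root.getClass().getSimpleName() + ": " + root.getMessage();
    }
}
```

A call such as `log.error("PAYMENT_FAILED orderId={} rootCause={}", orderId, CauseChain.summary(ex), ex)` would then keep both the searchable summary and the stack trace.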
## 5. Consistent Format and Naming for Message Templates

**Why: Avoids cognitive overhead and increases searchability.**

```java
log.info("SHIPMENT_STARTED orderId={} warehouse={}", orderId, location);
```
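A naming convention tends to hold only if something checks it. As a sketch, assuming the team has settled on UPPER_SNAKE_CASE event names like the one above, a small validator could run in a unit test or code review check. `EventNames` and the exact pattern are assumptions for illustration, not a rule taken from SLF4J.

```java
import java.util.regex.Pattern;

// Hypothetical validator for an assumed UPPER_SNAKE_CASE event naming convention.
final class EventNames {
    private static final Pattern UPPER_SNAKE =
            Pattern.compile("[A-Z][A-Z0-9]*(_[A-Z0-9]+)*");

    static boolean isValid(String name) {
        return name != null && UPPER_SNAKE.matcher(name).matches();
    }

    // Returns the name unchanged, or throws if it breaks the convention.
    static String require(String name) {
        if (!isValid(name)) {
            throw new IllegalArgumentException("Non-conforming event name: " + name);
        }
        return name;
    }
}
```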
## 6. Standard Verbiage and Voice Across Teams

**Why: Uniform style across teams improves clarity and reduces parsing complexity.**

```java
log.info("user.account.verified userId={}", userId);
// OR (pick one style and use it everywhere)
log.info("ACCOUNT_VERIFIED userId={}", userId);
```
## 7. Add Contextual Metadata to Every Log

**Why: Provides immediate insight into the who, what, and where of an event.**

```java
MDC.put("userId", String.valueOf(user.getId()));   // MDC values must be Strings
MDC.put("orderId", String.valueOf(order.getId()));
log.info("ORDER_CANCELLED reason={}", cancelReason);
MDC.remove("userId");   // clear the keys once the unit of work ends
MDC.remove("orderId");
```
## 8. Use Labels Derived from Business Domains or Service Names

**Why: Improves filtering and ownership during triage or analysis.**

```java
log.info("event=PAYMENT_RECEIVED domain=finance service=payment-gateway orderId={}", orderId);
```
## 9. Namespace Events Using Functional or Business Group Names

**Why: Enables team-level filtering and dashboarding.**

```java
log.info("event=INVENTORY_RESERVED process=order-fulfillment team=supply userId={}", userId);
```
## 10. Avoid Logging Sensitive Data

**Why: Prevents regulatory violations and leakage of PII or secrets.**

```java
String maskedCard = "****" + cardNumber.substring(cardNumber.length() - 4);
log.info("CARD_CHARGED txnId={} card={} amount={}", txnId, maskedCard, amount);
```
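The inline `substring` call above throws if the input has fewer than four characters, and it would echo a short value back in full if the length check were simply relaxed. A more defensive variant can live in one shared helper. The sketch below is one possible shape. `CardMasker` and the minimum length of 8 digits are assumptions chosen for illustration, not a compliance rule.

```java
// Hypothetical helper: masks a card number down to its last four digits.
final class CardMasker {
    private static final String REDACTED = "****";

    static String mask(String cardNumber) {
        if (cardNumber == null) {
            return REDACTED;
        }
        // Drop spaces, dashes, and anything else that is not a digit.
        String digits = cardNumber.replaceAll("\\D", "");
        // Too short to mask meaningfully: reveal nothing at all.
        if (digits.length() < 8) {
            return REDACTED;
        }
        return REDACTED + digits.substring(digits.length() - 4);
    }
}
```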
## 11. Ensure Timestamps and Timezones Are Standardized

**Why: Consistent UTC timestamps ensure chronological accuracy across distributed systems.**

```java
log.info("heartbeat ts={}", Instant.now().toString());
```
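Timestamps that originate in a local zone (for example, from an upstream system or a user-facing service) can be normalized to UTC before they are logged. The sketch below shows this with `java.time`. `UtcTimestamps` is a hypothetical helper name. `DateTimeFormatter.ISO_INSTANT` formats an `Instant` in UTC with a trailing `Z`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: renders any point in time as an ISO-8601 UTC string.
final class UtcTimestamps {
    static String format(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    // Converts a zoned local time to the same moment expressed in UTC.
    static String fromLocal(ZonedDateTime local) {
        return format(local.toInstant());
    }

    public static void main(String[] args) {
        ZonedDateTime local =
                ZonedDateTime.of(2025, 6, 26, 10, 17, 0, 0, ZoneOffset.ofHours(2));
        System.out.println(fromLocal(local)); // prints 2025-06-26T08:17:00Z
    }
}
```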
## 12. Sample High-Frequency Log Messages

**Why: Reduces log noise and storage load without sacrificing observability.**

Example (every Nth event):

```java
// counter is a shared AtomicLong; log every 100th event
if (counter.incrementAndGet() % 100 == 0) {
    log.info("sampled order orderId={} state={}", orderId, state);
}
```

Example (every X seconds):

```java
long now = System.currentTimeMillis();
if (now - lastLogTime.get() > 5000) { // at most one message every 5 seconds
    lastLogTime.set(now);
    log.info("sampled metrics: activeOrders={}", metrics.getActiveOrders());
}
```
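In the time-based example, two threads can both pass the check before either one updates the timestamp, so more than one message may be emitted per interval. If that matters, the check and the update can be combined into a single compare-and-set. The sketch below is one way to do it. `TimeSampler` is a hypothetical class, and the caller passes in the current time so the behavior is easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sampler: lets at most one caller through per interval.
final class TimeSampler {
    private static final long NEVER = Long.MIN_VALUE;

    private final long intervalMillis;
    private final AtomicLong lastEmit = new AtomicLong(NEVER);

    TimeSampler(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Returns true if a message is due; only the thread that wins the CAS gets true.
    boolean shouldLog(long nowMillis) {
        long last = lastEmit.get();
        boolean due = last == NEVER || nowMillis - last >= intervalMillis;
        return due && lastEmit.compareAndSet(last, nowMillis);
    }
}
```

A call site would look like `if (sampler.shouldLog(System.currentTimeMillis())) { log.info(...); }`. A thread that loses the compare-and-set skips the message rather than retrying, which is the intended behavior for sampling.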