This guide outlines 12 best practices for logging in distributed systems that handle transactional processing, with examples in Java using SLF4J.
- 1. Include a Unique Transaction / Correlation ID
- 2. Log at Key Lifecycle Points
- 3. Use Structured Logging (e.g. JSON)
- 4. Capture and Log Errors with Full Stack Traces
- 5. Consistent Format and Naming for Message Templates
- 6. Standard Verbiage and Voice Across Teams
- 7. Add Contextual Metadata to Every Log
- 8. Use Labels Derived from Business Domains or Service Names
- 9. Namespace Events Using Functional or Business Group Names
- 10. Avoid Logging Sensitive Data
- 11. Ensure Timestamps and Timezones Are Standardized
- 12. Sample High-Frequency Log Messages
1. Include a Unique Transaction / Correlation ID
Why: Correlating logs across microservices and threads requires a shared ID.
MDC.put("txnId", UUID.randomUUID().toString()); // call MDC.remove("txnId") in a finally block so pooled threads do not reuse a stale ID
log.info("START processTransaction");
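The snippet above always mints a new ID, which fits the service where a transaction begins. A downstream service would more likely reuse the ID it received so that the whole call chain shares one value. A minimal sketch of that choice, assuming a hypothetical `CorrelationIds` helper (not part of SLF4J):

```java
import java.util.UUID;

/** Hypothetical helper: reuse an upstream correlation ID or mint a new one. */
final class CorrelationIds {
    private CorrelationIds() {}

    /**
     * @param incoming ID received from the caller (for example from a request header); may be null
     * @return the trimmed incoming ID when present, otherwise a freshly generated UUID string
     */
    static String resolve(String incoming) {
        if (incoming != null && !incoming.trim().isEmpty()) {
            return incoming.trim();
        }
        return UUID.randomUUID().toString();
    }
}
```

The result can be passed straight to `MDC.put("txnId", ...)`. Which header or message attribute carries the incoming ID is a per-system convention and is not prescribed here.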
2. Log at Key Lifecycle Points
Why: Logging the start and end of each step makes it possible to trace transaction state and measure time-to-completion.
log.info(">> order.processing.started orderId={}", orderId);
log.info("[OK] order.processing.completed orderId={}", orderId);
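To make time-to-completion directly searchable, the completion line can carry an explicit duration. A small sketch, assuming a hypothetical `LifecycleTimer` class; it uses `System.nanoTime()` because that clock is monotonic and unaffected by wall-clock adjustments:

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: measures elapsed time between two lifecycle log lines. */
final class LifecycleTimer {
    // Captured at construction, i.e. when the "started" event is logged.
    private final long startNanos = System.nanoTime();

    /** Milliseconds elapsed since this timer was created. */
    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }
}
```

Create the timer next to the `started` line and add a field such as `durationMs={}` with `timer.elapsedMillis()` to the `completed` line.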
3. Use Structured Logging (e.g. JSON)
Why: Logs in key-value format enable better filtering, indexing, and visualization.
log.info("event=ORDER_PLACED orderId={} userId={}", orderId, userId);
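The line above is key-value text. For JSON output, most teams would configure a JSON encoder in the logging backend instead of formatting by hand. The sketch below only illustrates what such a line looks like and why escaping matters; `JsonLine` is a hypothetical helper that handles string values only:

```java
import java.util.Map;

/** Hypothetical helper: renders string fields as one JSON object per line. */
final class JsonLine {
    private JsonLine() {}

    /** Fields are written in the map's iteration order; all values are emitted as JSON strings. */
    static String of(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Escapes quotes, backslashes, and control characters so the output stays on one valid line. */
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : String.valueOf(s).toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the field order stable, which makes raw lines easier to scan.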
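4. Capture and Log Errors with Full Stack Traces
Why: Enables full visibility into root causes during debugging.
// The trailing ex has no matching placeholder, so SLF4J logs it as the exception, including its stack trace.
log.error("[ERROR] PAYMENT_FAILED orderId={} reason={}", orderId, ex.getMessage(), ex);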
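When exceptions are wrapped, `ex.getMessage()` often reports only the outermost wrapper. A short sketch of walking the cause chain so the `reason` field can carry the innermost message; `Causes` is a hypothetical helper:

```java
/** Hypothetical helper: finds the innermost cause of an exception. */
final class Causes {
    private Causes() {}

    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; the second check guards against a self-referential cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }
}
```

The full exception should still be passed as the last logging argument so the complete stack trace is preserved.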
5. Consistent Format and Naming for Message Templates
Why: A predictable event name and key order reduce cognitive overhead and make logs easier to search.
log.info("SHIPMENT_STARTED orderId={} warehouse={}", orderId, location);
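One way to keep templates consistent is to define them in a single place instead of retyping strings at each call site. A minimal sketch, assuming a hypothetical `LogEvent` enum whose constants and keys mirror examples from this guide:

```java
/** Hypothetical catalog of event names and their ordered keys. */
enum LogEvent {
    SHIPMENT_STARTED("orderId", "warehouse"),
    PAYMENT_FAILED("orderId", "reason");

    private final String template;

    LogEvent(String... keys) {
        // Builds "EVENT_NAME key1={} key2={}" once, at class-load time.
        StringBuilder sb = new StringBuilder(name());
        for (String key : keys) {
            sb.append(' ').append(key).append("={}");
        }
        this.template = sb.toString();
    }

    /** SLF4J-style message template with one placeholder per key. */
    String template() {
        return template;
    }
}
```

A call site then reads `log.info(LogEvent.SHIPMENT_STARTED.template(), orderId, location);`, and renaming an event or key becomes a one-line change.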
6. Standard Verbiage and Voice Across Teams
Why: Uniform style across teams improves clarity and reduces parsing complexity.
log.info("user.account.verified userId={}", userId);
// OR (pick one convention and apply it everywhere)
log.info("ACCOUNT_VERIFIED userId={}", userId);
7. Add Contextual Metadata to Every Log
Why: Provides immediate insight into the who, what, and where of an event.
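// MDC values must be Strings; String.valueOf covers numeric or UUID IDs.
MDC.put("userId", String.valueOf(user.getId()));
MDC.put("orderId", String.valueOf(order.getId()));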
log.info("ORDER_CANCELLED reason={}", cancelReason);
8. Use Labels Derived from Business Domains or Service Names
Why: Improves filtering and ownership during triage or analysis.
log.info("event=PAYMENT_RECEIVED domain=finance service=payment-gateway orderId={}", orderId);
9. Namespace Events Using Functional or Business Group Names
Why: Enables team-level filtering and dashboarding.
log.info("event=INVENTORY_RESERVED process=order-fulfillment team=supply userId={}", userId);
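Labels such as `domain`, `service`, and `team` are fixed for a given service, so they can be defined once rather than retyped in every message. A sketch under that assumption; `ServiceLabels` is a hypothetical helper, and the label values in the usage note are illustrative:

```java
/** Hypothetical helper: prepends a service's fixed labels to every event template. */
final class ServiceLabels {
    private final String labels;

    ServiceLabels(String domain, String service, String team) {
        this.labels = "domain=" + domain + " service=" + service + " team=" + team;
    }

    /** Builds an SLF4J-style template: event name, fixed labels, then one placeholder per key. */
    String template(String event, String... keys) {
        StringBuilder sb = new StringBuilder("event=").append(event).append(' ').append(labels);
        for (String key : keys) {
            sb.append(' ').append(key).append("={}");
        }
        return sb.toString();
    }
}
```

For example, `new ServiceLabels("finance", "payment-gateway", "payments").template("PAYMENT_RECEIVED", "orderId")` yields `event=PAYMENT_RECEIVED domain=finance service=payment-gateway team=payments orderId={}`. Placing the same labels in MDC at startup is an alternative when the log pattern already prints MDC fields.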
10. Avoid Logging Sensitive Data
Why: Prevents regulatory violations and leakage of PII or secrets.
String maskedCard = "****" + cardNumber.substring(cardNumber.length() - 4);
log.info("CARD_CHARGED txnId={} card={} amount={}", txnId, maskedCard, amount);
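The `substring` call above throws if the card number is null or shorter than four characters, and it would expose separators if the input is formatted with spaces or dashes. A more defensive sketch, assuming a hypothetical `CardMasker` helper:

```java
/** Hypothetical helper: masks a card number, keeping at most the last four digits. */
final class CardMasker {
    private static final int VISIBLE_DIGITS = 4;
    private static final String MASK = "****";

    private CardMasker() {}

    static String mask(String cardNumber) {
        if (cardNumber == null) {
            return MASK;
        }
        // Drop spaces, dashes, and any other non-digit characters before slicing.
        String digits = cardNumber.replaceAll("\\D", "");
        if (digits.length() <= VISIBLE_DIGITS) {
            // Too short to reveal anything safely.
            return MASK;
        }
        return MASK + digits.substring(digits.length() - VISIBLE_DIGITS);
    }
}
```

With this helper, `CardMasker.mask("4111 1111 1111 1111")` returns `****1111`, while null or very short input returns `****` instead of throwing.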
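11. Ensure Timestamps and Timezones Are Standardized
Why: Consistent UTC timestamps ensure chronological accuracy across distributed systems.
// Instant renders as ISO-8601 in UTC (with a trailing Z); SLF4J calls toString() itself.
log.info("heartbeat ts={}", Instant.now());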
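Timestamps that originate in a local zone (for example, a value parsed from an upstream system) can be normalized the same way before logging. A small sketch, assuming a hypothetical `UtcTimestamps` helper built on `java.time`:

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: renders any zoned timestamp as an ISO-8601 UTC string. */
final class UtcTimestamps {
    private UtcTimestamps() {}

    static String toUtc(ZonedDateTime zoned) {
        // toInstant() discards the zone and keeps the absolute point in time;
        // ISO_INSTANT always formats that instant in UTC with a trailing 'Z'.
        return DateTimeFormatter.ISO_INSTANT.format(zoned.toInstant());
    }
}
```

An event recorded at 09:30 with a UTC-5 offset on 2024-01-15 is rendered as `2024-01-15T14:30:00Z`, so it sorts correctly next to events from services in other regions.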
12. Sample High-Frequency Log Messages
Why: Reduces log noise and storage load while keeping a representative view of system behavior.
Example (every Nth event):
if (counter.incrementAndGet() % 100 == 0) {
    log.info("sampled order orderId={} state={}", orderId, state);
}
Example (every X seconds):
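// Assumes lastLogTime is an AtomicLong holding the last log time in milliseconds.
long now = System.currentTimeMillis();
long last = lastLogTime.get();
// compareAndSet lets only one thread win the right to log in each window.
if (now - last > 5000 && lastLogTime.compareAndSet(last, now)) {
    log.info("sampled metrics: activeOrders={}", metrics.getActiveOrders());
}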
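The time-based check can also be packaged as a small reusable guard. The sketch below assumes a hypothetical `TimeThrottle` class; it takes the clock as a parameter so the behavior can be tested without sleeping, and it uses an atomic compare-and-set so that concurrent callers produce at most one log line per interval:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper: allows at most one log line per interval across threads. */
final class TimeThrottle {
    private final long intervalMillis;
    private final LongSupplier clock;
    private final AtomicLong lastLogTime;

    TimeThrottle(long intervalMillis, LongSupplier clock) {
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        // Start one interval in the past so the very first call is allowed.
        this.lastLogTime = new AtomicLong(clock.getAsLong() - intervalMillis);
    }

    /** Returns true for the single caller that may log in the current interval. */
    boolean shouldLog() {
        long now = clock.getAsLong();
        long last = lastLogTime.get();
        // The CAS fails for threads that lost the race, so they skip logging.
        return now - last >= intervalMillis && lastLogTime.compareAndSet(last, now);
    }
}
```

In production code the guard would be created once, for example `new TimeThrottle(5000, System::currentTimeMillis)`, and the log call wrapped in `if (throttle.shouldLog()) { ... }`.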