@allending
Last active June 26, 2025 08:40

Logging Best Practices for Distributed Transactional Systems

This guide outlines 12 best practices for logging in distributed systems that handle transactional processing, with examples in Java using SLF4J.

1. Include a Unique Transaction / Correlation ID

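Why: Correlating logs across microservices and threads requires a shared ID. Generate it once at the entry point and pass it downstream (for example in a request header) so that every service logs the same value.

// Entry point only; downstream services should reuse the incoming ID.
MDC.put("txnId", UUID.randomUUID().toString());
log.info("START processTransaction");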
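
MDC is typically backed by a thread-local, so an ID set on the request thread is not visible to work handed to an executor unless it is copied over. Below is a minimal JDK-only sketch of that copy step. `CorrelationContext`, `wrap`, and `idSeenByWorker` are hypothetical names, and a plain `ThreadLocal` stands in for MDC; with SLF4J the analogous calls would be `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for MDC: one correlation ID per thread.
final class CorrelationContext {
    private static final ThreadLocal<String> TXN_ID = new ThreadLocal<>();

    static String get() { return TXN_ID.get(); }
    static void set(String id) { TXN_ID.set(id); }
    static void clear() { TXN_ID.remove(); }

    // Captures the caller's ID now and installs it inside the worker thread.
    static Runnable wrap(Runnable task) {
        final String captured = get();
        return () -> {
            final String previous = get();
            set(captured);
            try {
                task.run();
            } finally {
                // Restore whatever the pooled thread held before this task.
                if (previous == null) clear(); else set(previous);
            }
        };
    }

    // Demo helper: returns the ID observed by a task running on another thread.
    static String idSeenByWorker(String id) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            set(id);
            String[] seen = new String[1];
            pool.submit(wrap(() -> seen[0] = get())).get();
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            clear();
            pool.shutdown();
        }
    }
}
```

Without `wrap`, the worker in `idSeenByWorker` would see `null`, and its log lines would carry no transaction ID.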

2. Log at Key Lifecycle Points

Why: Helps trace transactional states and time-to-completion.

log.info(">> order.processing.started orderId={}", orderId);
log.info("[OK] order.processing.completed orderId={}", orderId);
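
Time-to-completion is easier to query when the completion line carries the duration itself rather than relying on subtracting two timestamps later. Below is a rough sketch of that idea. `LifecycleTimer` and the `durationMs` field are illustrative names, not part of SLF4J, and the clock is injectable only so the behavior can be checked deterministically.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper: records a start time and reports elapsed milliseconds.
final class LifecycleTimer {
    private final LongSupplier nanoClock;
    private final long startNanos;

    LifecycleTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    // Production entry point: uses the monotonic JVM clock.
    static LifecycleTimer start() {
        return new LifecycleTimer(System::nanoTime);
    }

    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - startNanos);
    }

    // Builds the text that would be passed to log.info(...) at completion.
    String completedLine(String orderId) {
        return "order.processing.completed orderId=" + orderId
                + " durationMs=" + elapsedMillis();
    }
}
```

`System.nanoTime()` is used rather than wall-clock time because it is not affected by clock adjustments during the transaction.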

3. Use Structured Logging (e.g. JSON)

Why: Logs emitted as key-value pairs or JSON can be parsed by machines rather than by regex, which enables reliable filtering, indexing, and visualization.

log.info("event=ORDER_PLACED orderId={} userId={}", orderId, userId);
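
The line above is key-value text. For JSON output, the logging backend's JSON encoder would normally do the serialization. The sketch below only illustrates what such a line looks like and why values need escaping. `JsonLine` is a hypothetical helper that handles flat string fields only.

```java
import java.util.Map;

// Hypothetical, minimal JSON-line builder for flat string fields.
final class JsonLine {
    // Escapes the characters that would otherwise break a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters use the 4-digit hex form.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Serializes the fields in the map's iteration order as one JSON object.
    static String of(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(String.valueOf(e.getValue()))).append('"');
        }
        return sb.append('}').toString();
    }
}
```

Passing a `LinkedHashMap` keeps the field order stable, which makes the lines easier to compare by eye.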

4. Capture and Log Errors with Full Stack Traces

Why: Enables full visibility into root causes during debugging.

// The trailing exception has no matching {} placeholder, so SLF4J logs it with its stack trace.
log.error("[ERROR] PAYMENT_FAILED orderId={} reason={}", orderId, ex.getMessage(), ex);
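
`ex.getMessage()` only describes the outermost exception, which is often a generic wrapper. One option, sketched below, is to also log a short root-cause summary as a searchable field next to the full stack trace. `RootCause` is a hypothetical helper and the depth limit of 100 is an arbitrary safety guard.

```java
// Hypothetical helper: walks the cause chain to the innermost exception.
final class RootCause {
    static Throwable of(Throwable t) {
        Throwable current = t;
        int guard = 0;
        // Stop at the end of the chain, on a self-reference, or after 100 hops.
        while (current.getCause() != null
                && current.getCause() != current
                && guard++ < 100) {
            current = current.getCause();
        }
        return current;
    }

    // Short "Type: message" text suitable for a rootCause={} log field.
    static String summary(Throwable t) {
        Throwable root = of(t);
        return root.getClass().getSimpleName() + ": " + root.getMessage();
    }
}
```

The summary complements the stack trace and does not replace it: the exception should still be passed as the last argument.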

5. Consistent Format and Naming for Message Templates

Why: Avoids cognitive overhead and increases searchability.

log.info("SHIPMENT_STARTED orderId={} warehouse={}", orderId, location);
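
A naming convention tends to drift unless something checks it. Assuming a team settles on UPPER_SNAKE_CASE event names like the one above, a small validator can run in a unit test over the list of known event names. `EventNames` and its pattern are illustrative only.

```java
import java.util.regex.Pattern;

// Hypothetical check for UPPER_SNAKE_CASE event names such as SHIPMENT_STARTED.
final class EventNames {
    // Starts with a letter; underscore-separated groups of capitals and digits.
    private static final Pattern UPPER_SNAKE =
            Pattern.compile("[A-Z][A-Z0-9]*(_[A-Z0-9]+)*");

    static boolean isValid(String name) {
        return name != null && UPPER_SNAKE.matcher(name).matches();
    }
}
```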

6. Standard Verbiage and Voice Across Teams

Why: Uniform style across teams improves clarity and reduces parsing complexity.

// Pick one convention and use it everywhere.
log.info("user.account.verified userId={}", userId);
// OR
log.info("ACCOUNT_VERIFIED userId={}", userId);

7. Add Contextual Metadata to Every Log

Why: Provides immediate insight into the who, what, and where of an event.

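// MDC values must be Strings; String.valueOf also covers numeric IDs.
MDC.put("userId", String.valueOf(user.getId()));
MDC.put("orderId", String.valueOf(order.getId()));
log.info("ORDER_CANCELLED reason={}", cancelReason);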
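
Context values stay attached to the thread until they are removed, so on a pooled thread they can leak into the next request's logs. Below is a JDK-only sketch of scoping context with try-with-resources. `LogContext` is a hypothetical stand-in for MDC; SLF4J offers `MDC.putCloseable(key, value)` for the same purpose.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scoped context: the field is removed when the scope closes.
final class LogContext implements AutoCloseable {
    private static final ThreadLocal<Map<String, String>> FIELDS =
            ThreadLocal.withInitial(HashMap::new);

    private final String key;
    private final String previous;

    private LogContext(String key, String value) {
        this.key = key;
        // Remember any outer value so nested scopes restore it on close.
        this.previous = FIELDS.get().put(key, value);
    }

    static LogContext with(String key, Object value) {
        return new LogContext(key, String.valueOf(value));
    }

    static String get(String key) {
        return FIELDS.get().get(key);
    }

    @Override
    public void close() {
        if (previous == null) FIELDS.get().remove(key);
        else FIELDS.get().put(key, previous);
    }
}
```

Usage would look like `try (LogContext c = LogContext.with("orderId", order.getId())) { ... }`, with every log call inside the block seeing the field.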

8. Use Labels Derived from Business Domains or Service Names

Why: Improves filtering and ownership during triage or analysis.

log.info("event=PAYMENT_RECEIVED domain=finance service=payment-gateway orderId={}", orderId);

9. Namespace Events Using Functional or Business Group Names

Why: Enables team-level filtering and dashboarding.

log.info("event=INVENTORY_RESERVED process=order-fulfillment team=supply userId={}", userId);

10. Avoid Logging Sensitive Data

Why: Prevents regulatory violations and leakage of PII or secrets.

String maskedCard = "****" + cardNumber.substring(cardNumber.length() - 4);
log.info("CARD_CHARGED txnId={} card={} amount={}", txnId, maskedCard, amount);
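
The `substring` call above throws if the card number has fewer than four characters, and it would echo separators or stray input as-is. Below is a slightly more defensive sketch. `CardMasker` is a hypothetical helper, and the eight-digit minimum is an assumption chosen so that short inputs never reveal most of their digits.

```java
// Hypothetical masking helper: reveals at most the last four digits.
final class CardMasker {
    static String mask(String cardNumber) {
        if (cardNumber == null) return "****";
        // Drop spaces, dashes, and anything else that is not a digit.
        String digits = cardNumber.replaceAll("\\D", "");
        // Too short to be a card number: reveal nothing.
        if (digits.length() < 8) return "****";
        return "****" + digits.substring(digits.length() - 4);
    }
}
```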

11. Ensure Timestamps and Timezones Are Standardized

Why: Consistent UTC timestamps ensure chronological accuracy across distributed systems.

// Instant is always UTC and renders as ISO-8601, so no explicit toString() is needed.
log.info("heartbeat ts={}", Instant.now());
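
When a timestamp arrives with a local zone attached (for example from an upstream system), converting it to an `Instant` before logging keeps every line comparable. Below is a small sketch using `java.time`. `UtcTimestamps` is a hypothetical helper name.

```java
import java.time.ZonedDateTime;

// Hypothetical helper: normalizes a zoned time to an ISO-8601 UTC string.
final class UtcTimestamps {
    static String toUtcIso(ZonedDateTime local) {
        // Instant carries no zone; its toString() output always ends in 'Z'.
        return local.toInstant().toString();
    }
}
```

For example, 16:40 in Asia/Singapore (UTC+8) on 2025-06-26 becomes `2025-06-26T08:40:00Z`.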

12. Sample High-Frequency Log Messages

Why: Reduces log noise and storage load while keeping enough signal to spot trends. Never sample errors or transaction lifecycle events.

Example (every Nth event):

if (counter.incrementAndGet() % 100 == 0) {
    log.info("sampled order orderId={} state={}", orderId, state);
}

Example (every X seconds):

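long now = System.currentTimeMillis();
long last = lastLogTime.get(); // lastLogTime is assumed to be an AtomicLong
// compareAndSet lets only one thread win the right to log for this interval.
if (now - last > 5000 && lastLogTime.compareAndSet(last, now)) {
    log.info("sampled metrics: activeOrders={}", metrics.getActiveOrders());
}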
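
Both sampling styles can be wrapped in one small thread-safe class so call sites stay short. The following is a sketch under assumptions: `LogSampler` and its method names are hypothetical, and the clock is injected so that the time-based path can be tested without sleeping.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sampler combining count-based and time-based sampling.
final class LogSampler {
    private final AtomicLong counter = new AtomicLong();
    private final AtomicLong lastLogMillis;
    private final long everyN;
    private final long minIntervalMillis;
    private final LongSupplier clock;

    LogSampler(long everyN, long minIntervalMillis, LongSupplier clock) {
        this.everyN = everyN;
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
        // Start one interval in the past so the first time-based call passes.
        this.lastLogMillis = new AtomicLong(clock.getAsLong() - minIntervalMillis);
    }

    // True on every Nth call, counted across all threads.
    boolean everyNth() {
        return counter.incrementAndGet() % everyN == 0;
    }

    // True at most once per interval; the CAS ensures a single winner per window.
    boolean intervalElapsed() {
        long now = clock.getAsLong();
        long last = lastLogMillis.get();
        return now - last >= minIntervalMillis
                && lastLogMillis.compareAndSet(last, now);
    }
}
```

A call site would then read `if (sampler.intervalElapsed()) { log.info(...); }`, with `System::currentTimeMillis` passed as the clock in production.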