@allending
Created June 26, 2025 08:17
# Logging Best Practices for Distributed Transactional Systems
This guide outlines 12 best practices for logging in distributed systems that handle transactional processing, with examples in Java using SLF4J.
## Table of Contents
- [1. Include a Unique Transaction / Correlation ID](#1-include-a-unique-transaction--correlation-id)
- [2. Log at Key Lifecycle Points](#2-log-at-key-lifecycle-points)
- [3. Use Structured Logging (e.g. JSON)](#3-use-structured-logging-eg-json)
- [4. Capture and Log Errors with Full Stack Traces](#4-capture-and-log-errors-with-full-stack-traces)
- [5. Consistent Format and Naming for Message Templates](#5-consistent-format-and-naming-for-message-templates)
- [6. Standard Verbiage and Voice Across Teams](#6-standard-verbiage-and-voice-across-teams)
- [7. Add Contextual Metadata to Every Log](#7-add-contextual-metadata-to-every-log)
- [8. Use Labels Derived from Business Domains or Service Names](#8-use-labels-derived-from-business-domains-or-service-names)
- [9. Namespace Events Using Functional or Business Group Names](#9-namespace-events-using-functional-or-business-group-names)
- [10. Avoid Logging Sensitive Data](#10-avoid-logging-sensitive-data)
- [11. Ensure Timestamps and Timezones Are Standardized](#11-ensure-timestamps-and-timezones-are-standardized)
- [12. Sample High-Frequency Log Messages](#12-sample-high-frequency-log-messages)
## 1. Include a Unique Transaction / Correlation ID
**Why: Correlating logs across microservices and threads requires a shared ID.**
```java
MDC.put("txnId", UUID.randomUUID().toString());
log.info("START processTransaction");
```
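
The snippet above always mints a fresh ID, which fits only the service where a transaction first enters the system. Downstream services usually need to reuse the ID they received so that all logs share it. A minimal sketch of that decision, using only the JDK; `CorrelationIds` and `resolve` are hypothetical names, and how the incoming value is transported (for example a request header) is an assumption not covered by this guide:

```java
import java.util.UUID;

/** Hypothetical helper: reuse the caller's correlation ID or mint a new one. */
final class CorrelationIds {
    private CorrelationIds() {}

    /**
     * @param incoming correlation ID received from the caller, or null if none was sent
     * @return the trimmed incoming ID when present, otherwise a fresh random UUID
     */
    static String resolve(String incoming) {
        if (incoming != null && !incoming.trim().isEmpty()) {
            return incoming.trim();
        }
        return UUID.randomUUID().toString();
    }
}
```

The result would then be stored with `MDC.put("txnId", ...)` as shown above and forwarded on every outgoing call.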
## 2. Log at Key Lifecycle Points
**Why: Helps trace transactional states and time-to-completion.**
```java
log.info(">> order.processing.started orderId={}", orderId);
log.info("[OK] order.processing.completed orderId={}", orderId);
```
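
To make time-to-completion visible, the start and end events can carry a duration, and a failure path can emit its own event. The following is a rough sketch under stated assumptions: `LifecycleLogger` is a hypothetical wrapper, the `.started` / `.completed` / `.failed` suffixes are an assumed naming scheme, and a `Consumer<String>` stands in for the logger so the example needs only the JDK:

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical wrapper that emits started/completed/failed events around a unit of work. */
final class LifecycleLogger {
    private final Consumer<String> sink; // stand-in for a logger, e.g. log::info

    LifecycleLogger(Consumer<String> sink) {
        this.sink = sink;
    }

    <T> T around(String event, String orderId, Supplier<T> work) {
        long start = System.nanoTime();
        sink.accept(event + ".started orderId=" + orderId);
        try {
            T result = work.get();
            sink.accept(event + ".completed orderId=" + orderId + " durationMs=" + elapsedMs(start));
            return result;
        } catch (RuntimeException e) {
            // Record the failed terminal state, then let the caller handle the exception.
            sink.accept(event + ".failed orderId=" + orderId + " durationMs=" + elapsedMs(start));
            throw e;
        }
    }

    private static long elapsedMs(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }
}
```

Every started event is then paired with exactly one terminal event, which makes stuck transactions easy to spot.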
## 3. Use Structured Logging (e.g. JSON)
**Why: Machine-parseable fields, whether key-value pairs or JSON, enable better filtering, indexing, and visualization than free-form text.**
```java
log.info("event=ORDER_PLACED orderId={} userId={}", orderId, userId);
```
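
The example above uses key-value pairs inside a plain message. For actual JSON output, the usual route is a JSON encoder configured in the logging backend; as an illustration of the target shape only, here is a minimal JDK-only sketch. `JsonLine` is a hypothetical helper, all values are treated as strings, and it is not a substitute for a real JSON library:

```java
import java.util.Map;

/** Hypothetical helper that renders one log event as a single-line JSON object. */
final class JsonLine {
    private JsonLine() {}

    static String of(String event, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\"event\":\"").append(escape(event)).append('"');
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(",\"").append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Escapes quotes, backslashes, and control characters so the line stays valid JSON. */
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : String.valueOf(s).toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        return out.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the fields in insertion order, which makes the lines easier to scan by eye.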
## 4. Capture and Log Errors with Full Stack Traces
**Why: Enables full visibility into root causes during debugging.**
```java
log.error("[ERROR] PAYMENT_FAILED orderId={} reason={}", orderId, ex.getMessage(), ex);
```
## 5. Consistent Format and Naming for Message Templates
**Why: Avoids cognitive overhead and increases searchability.**
```java
log.info("SHIPMENT_STARTED orderId={} warehouse={}", orderId, location);
```
## 6. Standard Verbiage and Voice Across Teams
**Why: Uniform style across teams improves clarity and reduces parsing complexity.**
```java
log.info("user.account.verified userId={}", userId);
// OR (pick one convention and use it everywhere)
log.info("ACCOUNT_VERIFIED userId={}", userId);
```
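
A naming convention is easier to keep uniform when it can be checked mechanically, for instance in a unit test over the event names a service emits. A small sketch, assuming the team settled on the dotted lowercase style (`user.account.verified`); `EventNames` and its pattern are hypothetical, and the pattern would need to change for the `UPPER_SNAKE` style:

```java
import java.util.regex.Pattern;

/** Hypothetical check for an assumed convention: lowercase segments joined by dots. */
final class EventNames {
    // Two or more segments, each starting with a letter, e.g. user.account.verified
    private static final Pattern DOTTED =
            Pattern.compile("[a-z][a-z0-9]*(\\.[a-z][a-z0-9]*)+");

    private EventNames() {}

    static boolean isValid(String name) {
        return name != null && DOTTED.matcher(name).matches();
    }
}
```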
## 7. Add Contextual Metadata to Every Log
**Why: Provides immediate insight into the who, what, and where of an event.**
```java
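// MDC values must be Strings; String.valueOf covers numeric or other ID types
MDC.put("userId", String.valueOf(user.getId()));
MDC.put("orderId", String.valueOf(order.getId()));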
log.info("ORDER_CANCELLED reason={}", cancelReason);
```
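
Context stored per thread outlives the request unless it is removed, and on pooled threads a stale `userId` can then leak into unrelated log lines. With SLF4J, `MDC.putCloseable` or an `MDC.remove` call in a `finally` block handles this. The scoping idea itself can be sketched with the JDK alone; `LogContext` below is a hypothetical stand-in for the MDC, not part of any library:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the MDC that shows scoped, self-cleaning context fields. */
final class LogContext {
    private static final ThreadLocal<Map<String, String>> FIELDS =
            ThreadLocal.withInitial(HashMap::new);

    private LogContext() {}

    /** AutoCloseable narrowed so close() throws no checked exception. */
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    /** Sets a field for the current thread; closing the scope restores the previous value. */
    static Scope put(String key, String value) {
        Map<String, String> fields = FIELDS.get();
        String previous = fields.put(key, value);
        return () -> {
            if (previous == null) {
                fields.remove(key);
            } else {
                fields.put(key, previous);
            }
        };
    }

    static String get(String key) {
        return FIELDS.get().get(key);
    }
}
```

Used in a try-with-resources block, the field exists only for the duration of the block, even when the body throws.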
## 8. Use Labels Derived from Business Domains or Service Names
**Why: Improves filtering and ownership during triage or analysis.**
```java
log.info("event=PAYMENT_RECEIVED domain=finance service=payment-gateway orderId={}", orderId);
```
## 9. Namespace Events Using Functional or Business Group Names
**Why: Enables team-level filtering and dashboarding.**
```java
log.info("event=INVENTORY_RESERVED process=order-fulfillment team=supply userId={}", userId);
```
## 10. Avoid Logging Sensitive Data
**Why: Prevents regulatory violations and leakage of PII or secrets.**
```java
String maskedCard = "****" + cardNumber.substring(cardNumber.length() - 4);
log.info("CARD_CHARGED txnId={} card={} amount={}", txnId, maskedCard, amount);
```
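
The inline masking above throws `StringIndexOutOfBoundsException` when the input has fewer than four characters, and it fails on `null`. A slightly more defensive sketch is shown below; `CardMasker` is a hypothetical helper, and the choice to reveal nothing for very short inputs is an assumption rather than a rule from this guide:

```java
/** Hypothetical helper that masks a card number down to its last four digits. */
final class CardMasker {
    private static final String FULL_MASK = "****";

    private CardMasker() {}

    static String mask(String cardNumber) {
        if (cardNumber == null) {
            return FULL_MASK;
        }
        // Drop spaces, dashes, and any other non-digit formatting characters.
        String digits = cardNumber.replaceAll("\\D", "");
        if (digits.length() <= 4) {
            // Showing the "last four" here would reveal the entire value.
            return FULL_MASK;
        }
        return FULL_MASK + digits.substring(digits.length() - 4);
    }
}
```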
## 11. Ensure Timestamps and Timezones Are Standardized
**Why: Consistent UTC timestamps ensure chronological accuracy across distributed systems.**
```java
// Instant renders as ISO-8601 in UTC, regardless of the JVM's default time zone
log.info("heartbeat ts={}", Instant.now());
```
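
Timestamps that originate in a local zone (for example from an upstream system) should be normalized before they are logged. A minimal sketch using `java.time`; `UtcTimestamps` is a hypothetical helper name:

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that renders any offset timestamp as ISO-8601 in UTC. */
final class UtcTimestamps {
    private UtcTimestamps() {}

    static String format(OffsetDateTime timestamp) {
        // Converting to Instant discards the offset; ISO_INSTANT always prints with a trailing 'Z'.
        return DateTimeFormatter.ISO_INSTANT.format(timestamp.toInstant());
    }
}
```

For example, 10:17 at offset +02:00 on 2025-06-26 is rendered as `2025-06-26T08:17:00Z`.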
## 12. Sample High-Frequency Log Messages
**Why: Reduces log noise and storage load without sacrificing observability.**
Example (every Nth event):
```java
if (counter.incrementAndGet() % 100 == 0) {
    log.info("sampled order orderId={} state={}", orderId, state);
}
```
Example (every X seconds):
```java
long now = System.currentTimeMillis();
if (now - lastLogTime.get() > 5000) {
    lastLogTime.set(now);
    log.info("sampled metrics: activeOrders={}", metrics.getActiveOrders());
}
```
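
The time-based check above reads and then writes `lastLogTime` in two separate steps, so two threads arriving together can both pass the check and both log. One way to close that gap is a compare-and-set, sketched below. `TimeSampler` is a hypothetical class, and the current time is passed in as a parameter (an assumption made here so the behavior is easy to test):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sampler that lets at most one caller through per interval. */
final class TimeSampler {
    private final long intervalMillis;
    // Earliest time at which the next log line is allowed; MIN_VALUE admits the first call.
    private final AtomicLong nextAllowed = new AtomicLong(Long.MIN_VALUE);

    TimeSampler(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Returns true for exactly one caller each time the interval has elapsed. */
    boolean shouldLog(long nowMillis) {
        long next = nextAllowed.get();
        // Only the thread that wins the compareAndSet gets to log for this interval.
        return nowMillis >= next
                && nextAllowed.compareAndSet(next, nowMillis + intervalMillis);
    }
}
```

A caller would guard the log statement with `if (sampler.shouldLog(System.currentTimeMillis()))`.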