Skip to content

Instantly share code, notes, and snippets.

@amilos
Last active November 20, 2025 16:52
Show Gist options
  • Select an option

  • Save amilos/149441a7416688db4396e293556c3130 to your computer and use it in GitHub Desktop.

Select an option

Save amilos/149441a7416688db4396e293556c3130 to your computer and use it in GitHub Desktop.

Formal Saga Definition

1. Introduction to the Saga Execution Coordinator

In the context of distributed systems orchestration, a Saga Execution Coordinator (SEC) is defined as a centralized, persistent Finite State Automaton (FSA). Its purpose is to manage the integrity of a distributed transaction by enforcing a strict execution path across a pre-ordered sequence of local transactions and their corresponding inverse operations (compensations).

2. Mathematical Definition

Let a Saga $\mathcal{S}$ be defined as a tuple of ordered local transactions and compensating transactions:

$$ \mathcal{S} = \langle T, C \rangle $$

Where:

  • $T = \langle t_1, t_2, \dots, t_n \rangle$ is a sequence of local transactions where $n \in \mathbb{N}, n \geq 1$.
  • $C = \langle c_1, c_2, \dots, c_n \rangle$ is a sequence of compensating transactions.
  • $c_i$ is the semantic inverse of $t_i$ such that the execution of $(t_i \circ c_i)$ restores the system to a state semantically equivalent to the state before $t_i$ occurred.

The Saga Execution Coordinator is a state machine $\mathcal{M}$ defined as the tuple:

$$ \mathcal{M} = (Q, \Sigma, \delta, q_0, F) $$

The Components

  1. $Q$ (States) The set of finite states representing the progress of the saga. $Q = { q_0, q_{\text{success}}, q_{\text{aborted}} } \cup { \text{Exec}_i \mid 1 \le i \le n } \cup { \text{Comp}_i \mid 1 \le i < n }$

    • $\text{Exec}_i$: State where the SEC is attempting to execute transaction $t_i$.
    • $\text{Comp}_i$: State where the SEC is attempting to execute compensation $c_i$.
  2. $\Sigma$ (Input Alphabet) The set of possible responses from the participant services. $\Sigma = { \omega, \varepsilon }$

    • $\omega$: Represents a Successful execution (ACK/Commit).
    • $\varepsilon$: Represents a Failed execution (NACK/Error).
  3. $q_0$ (Initial State) The starting state is the execution of the first transaction. $q_0 = \text{Exec}_1$

  4. $F$ (Final States) The set of terminal states where the SEC halts. $F = { q_{\text{success}}, q_{\text{aborted}} }$


3. Transition Dynamics ($\delta$)

The core logic is defined by the transition function $\delta: Q \times \Sigma \rightarrow Q$.

Forward Recovery (Normal Execution)

If the current transaction succeeds, the SEC advances to the next transaction $t_{i+1}$ or terminates successfully if $t_n$ is reached.

$$ \delta(\text{Exec}_i, \omega) = \begin{cases} \text{Exec}_{i+1} & \text{if } 1 \le i < n \ q_{\text{success}} & \text{if } i = n \end{cases} $$

Backward Recovery (Compensation)

If a transaction fails, the SEC switches to "compensating mode." If $t_i$ fails, it is assumed $t_i$ had no effect; therefore, compensation begins at $c_{i-1}$.

$$ \delta(\text{Exec}_i, \varepsilon) = \begin{cases} \text{Comp}_{i-1} & \text{if } 1 < i \le n \ q_{\text{aborted}} & \text{if } i = 1 \end{cases} $$

Once in compensating mode, successful execution of a compensation $c_i$ triggers the previous compensation $c_{i-1}$.

$$ \delta(\text{Comp}_i, \omega) = \begin{cases} \text{Comp}_{i-1} & \text{if } 1 < i < n \ q_{\text{aborted}} & \text{if } i = 1 \end{cases} $$

Valid Execution Traces

The SEC enforces that valid traces belong to one of two patterns:

  1. Commit Scenario: $\tau_{\text{commit}} = \langle t_1, t_2, \dots, t_n \rangle$
  2. Rollback Scenario (failure at $k$): $\tau_{\text{rollback}} = \langle t_1, \dots, t_{k-1}, t_k(\text{fail}), c_{k-1}, \dots, c_1 \rangle$

4. The Perfect Compensation Hypothesis

The mathematical model above relies on the omission of the transition $\delta(\text{Comp}_i, \varepsilon)$. This is the Perfect Compensation Hypothesis.

Definition

The hypothesis asserts that a compensating transaction $c_i$ never logically fails. While it may suffer transient failures, it must eventually succeed given sufficient retries.

Mathematically, this transforms compensation into a deterministic termination property. Let $r$ be the number of retries:

$$ \forall i, \lim_{r \to \infty} P(\text{outcome}(c_i, r) = \omega) = 1 $$

Importance & Consequence

  • Importance: It guarantees mathematical closure. Without it, the automaton contains "trap states" where the system is neither successful nor aborted.
  • Consequence of Failure: If met with a permanent failure during compensation, the system enters a Heuristic Failure State.
    • Broken Atomicity: The system is partially committed.
    • Manual Intervention: The automaton cannot resolve the state; human intervention or external reconciliation scripts are required to restore consistency.

5. Participant Constraints & Hypotheses

For the SEC to function correctly, participating services must adhere to the following constraints.

A. Idempotence (Stability Hypothesis)

Due to the retry mechanism required by the Perfect Compensation Hypothesis, participants must guarantee idempotence. Let $f$ be the function applying a transaction:

$$ f(s) = f(f(s)) $$

Applying the same operation multiple times produces the same side effect as applying it once. Without this, retries would corrupt data (e.g., double refunds).

B. The Pivot Transaction Structure

Sagas are often structured around a "Pivot Point" rather than uniform compensability.

  1. Compensatable Transactions ($T_c$): Steps with a defined $c_i$.
  2. Pivot Transaction ($T_p$): The point of no return. If $T_p$ commits, the Saga guarantees success.
  3. Retriable Transactions ($T_r$): Steps after the pivot. They do not require compensation logic because the system is now committed to forward recovery only.

C. Local Atomicity

Each participant must guarantee that local transaction $t_i$ is ACID compliant. If $t_i$ fails partially (leaving dirty local data) without rolling back locally, the SEC cannot effectively coordinate the global state.

D. Lack of Read Isolation (ACD)

Sagas typically lack global Isolation.

  • Assumption: The business domain accepts "Dirty Reads."
  • Consequence: External systems may view the changes made by $t_1$ before $t_n$ is reached. If the Saga subsequently rolls back, the external system may have acted on data that no longer exists.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment