Understanding how state survives failures - and how to test it.
When an entity "dies" (runner crashes, shard reassigned), there are two separate things to worry about:
| Concern | Question | Effect Cluster Solution |
|---|---|---|
| Message durability | Did the request survive? | MessageStorage persists pending messages |
| State durability | Did the entity's in-memory state survive? | You must handle this yourself |
This is different from Temporal, where workflow local variables are automatically tracked via event history replay.
┌─────────────────────────────────────────────────────────────┐
│ Message Lifecycle │
├─────────────────────────────────────────────────────────────┤
│ 1. Message arrives → Stored in MessageStorage │
│ 2. Entity processes → Handler runs, side effects happen│
│ 3. Response stored → Message marked complete │
└─────────────────────────────────────────────────────────────┘
If entity dies during step 2:
- Message is still in storage (step 1 completed)
- Response not stored yet (step 3 didn't happen)
- Message will be redelivered to new entity instance
But the entity's in-memory state is lost. The new instance starts fresh.
Effect Cluster entities are like Erlang processes:
┌──────────────────────────────────────────────────────────────┐
│ Erlang/OTP Pattern │ Effect Cluster Equivalent │
├──────────────────────────────────────────────────────────────┤
│ Process state = RAM │ Entity state = RAM │
│ Process crash = state lost │ Entity crash = state lost │
│ Supervisor restarts process │ Sharding restarts entity │
│ Use mnesia/ETS for durable │ Use DB for durable state │
│ "Let it crash" philosophy │ Typed errors + recovery │
└──────────────────────────────────────────────────────────────┘
Key insight: Processes/entities are cheap and ephemeral. Important state belongs in durable storage.
When state loss is acceptable:
const RateLimiter = Entity.make("RateLimiter", [
Rpc.make("Check", { success: Schema.Struct({ allowed: Schema.Boolean }) })
])
const RateLimiterLive = RateLimiter.toLayer(Effect.gen(function*() {
// Volatile state - reset on restart is fine
let count = 0
let windowStart = yield* Clock.currentTimeMillis
return {
Check: () => Effect.gen(function*() {
const now = yield* Clock.currentTimeMillis
if (now - windowStart > 60_000) {
count = 0
windowStart = now
}
count++
return { allowed: count <= 100 }
})
}
}))Testing volatile state:
it.effect("resets on entity restart", () => Effect.gen(function*() {
const client = yield* RateLimiter.client
const limiter = client("api-key-1")
// Use up some quota
yield* limiter.Check()
yield* limiter.Check()
// Simulate entity restart by killing the shard
yield* Sharding.terminateEntity(EntityAddress.make("RateLimiter", "api-key-1"))
// New entity instance - count should be 0 again
const result = yield* limiter.Check()
expect(result.allowed).toBe(true) // Fresh start
}))For business-critical state that must survive failures:
const Order = Entity.make("Order", [
Rpc.make("Create", {
payload: Schema.Struct({ customerId: Schema.String, items: Schema.Array(Item) }),
success: OrderState
}),
Rpc.make("ChargePayment", { payload: Schema.Number, success: PaymentResult }),
Rpc.make("GetState", { success: OrderState })
])
const OrderLive = Order.toLayer(Effect.gen(function*() {
const db = yield* Database
const entityId = yield* Entity.currentId
// Load existing state on entity creation
let order = yield* db.orders.findById(entityId).pipe(
Effect.orElseSucceed(() => null as OrderState | null)
)
return {
Create: ({ payload }) => Effect.gen(function*() {
order = {
id: entityId,
customerId: payload.customerId,
items: payload.items,
status: 'created',
createdAt: yield* Clock.currentTimeMillis
}
yield* db.orders.upsert(entityId, order) // Persist immediately
return order
}),
ChargePayment: ({ payload: amount }) => Effect.gen(function*() {
if (!order) {
return yield* Effect.fail(new OrderNotFound({ entityId }))
}
// Idempotency: already charged?
if (order.paymentId) {
return { alreadyCharged: true, paymentId: order.paymentId }
}
// Charge with idempotency key (safe to retry)
const result = yield* Stripe.charge({
customerId: order.customerId,
amount,
idempotencyKey: `order-${entityId}-payment`
})
// Update state and persist
order = { ...order, paymentId: result.id, status: 'paid' }
yield* db.orders.upsert(entityId, order)
return result
}),
GetState: () => Effect.gen(function*() {
if (!order) {
return yield* Effect.fail(new OrderNotFound({ entityId }))
}
return order
})
}
}))Testing state persistence across restarts:
describe("Order entity durability", () => {
it.effect("survives entity restart", () => Effect.gen(function*() {
const orderClient = yield* Order.client
const order = orderClient("order-123")
// Create order
yield* order.Create({
customerId: "cust-1",
items: [{ sku: "WIDGET", quantity: 2 }]
})
// Charge payment
yield* order.ChargePayment(99.99)
// Verify state before restart
const stateBefore = yield* order.GetState()
expect(stateBefore.status).toBe("paid")
expect(stateBefore.paymentId).toBeDefined()
// Simulate catastrophic failure - entity dies
yield* Sharding.terminateEntity(EntityAddress.make("Order", "order-123"))
// Entity restarts, loads from DB
const stateAfter = yield* order.GetState()
// State survived!
expect(stateAfter.status).toBe("paid")
expect(stateAfter.paymentId).toBe(stateBefore.paymentId)
}))
it.effect("handles duplicate charge attempts (idempotency)", () => Effect.gen(function*() {
const orderClient = yield* Order.client
const order = orderClient("order-456")
yield* order.Create({ customerId: "cust-2", items: [] })
// First charge succeeds
const result1 = yield* order.ChargePayment(50.00)
expect(result1.alreadyCharged).toBeUndefined()
// Simulate crash mid-response (message will be redelivered)
yield* Sharding.terminateEntity(EntityAddress.make("Order", "order-456"))
// Second charge attempt (retry after crash)
const result2 = yield* order.ChargePayment(50.00)
// Idempotency: returns cached result, doesn't double-charge
expect(result2.alreadyCharged).toBe(true)
expect(result2.paymentId).toBe(result1.paymentId)
}))
})Store all messages as events, replay from beginning to rebuild state:
const Account = Entity.make("Account", [
Rpc.make("Deposit", { payload: Schema.Number, success: Schema.Number }),
Rpc.make("Withdraw", { payload: Schema.Number, success: Schema.Number }),
Rpc.make("GetBalance", { success: Schema.Number })
])
const AccountLive = Account.toLayer(Effect.gen(function*() {
const eventStore = yield* EventStore
const entityId = yield* Entity.currentId
// Replay all historical events to rebuild state
const events = yield* eventStore.getEvents(entityId)
let balance = events.reduce((acc, event) => {
switch (event.type) {
case 'Deposited': return acc + event.amount
case 'Withdrawn': return acc - event.amount
default: return acc
}
}, 0)
return {
Deposit: ({ payload: amount }) => Effect.gen(function*() {
yield* eventStore.append(entityId, { type: 'Deposited', amount })
balance += amount
return balance
}),
Withdraw: ({ payload: amount }) => Effect.gen(function*() {
if (balance < amount) {
return yield* Effect.fail(new InsufficientFunds({ balance, requested: amount }))
}
yield* eventStore.append(entityId, { type: 'Withdrawn', amount })
balance -= amount
return balance
}),
GetBalance: () => Effect.succeed(balance)
}
}))Testing event sourcing:
describe("Account event sourcing", () => {
it.effect("rebuilds state from event history", () => Effect.gen(function*() {
const accountClient = yield* Account.client
const account = accountClient("acct-789")
// Perform operations
yield* account.Deposit(100)
yield* account.Withdraw(30)
yield* account.Deposit(50)
const balanceBefore = yield* account.GetBalance()
expect(balanceBefore).toBe(120) // 100 - 30 + 50
// Kill entity
yield* Sharding.terminateEntity(EntityAddress.make("Account", "acct-789"))
// New entity replays events
const balanceAfter = yield* account.GetBalance()
expect(balanceAfter).toBe(120) // Same! Rebuilt from events
}))
it.effect("event history is append-only", () => Effect.gen(function*() {
const eventStore = yield* EventStore
const accountClient = yield* Account.client
const account = accountClient("acct-audit")
yield* account.Deposit(100)
yield* account.Withdraw(25)
// Verify audit trail
const events = yield* eventStore.getEvents("acct-audit")
expect(events).toEqual([
{ type: 'Deposited', amount: 100 },
{ type: 'Withdrawn', amount: 25 }
])
}))
})Important nuance: application errors don't kill entities.
ChargePayment: ({ payload: amount }) => Effect.gen(function*() {
const result = yield* Stripe.charge(order!.customerId, amount).pipe(
Effect.catchTag("PaymentDeclined", (err) =>
// Error caught - entity keeps running!
// State unchanged, error returned to caller
Effect.fail(new PaymentFailed({ reason: err.message }))
)
)
// Only reach here on success
order!.paymentId = result.id
return result
})Entities only truly die when:
- Runner process crashes (OOM, kill -9, hardware failure)
- Shard reassigned (runner leaves cluster)
- Explicit termination (testing, shutdown)
Testing error handling (entity survives):
it.effect("entity survives payment failure", () => Effect.gen(function*() {
const orderClient = yield* Order.client
const order = orderClient("order-error-test")
yield* order.Create({ customerId: "bad-card-customer", items: [] })
// Payment fails
const result = yield* order.ChargePayment(100).pipe(
Effect.either
)
expect(Either.isLeft(result)).toBe(true)
// Entity still alive, state intact
const state = yield* order.GetState()
expect(state.status).toBe("created") // Not changed
expect(state.paymentId).toBeUndefined() // No payment recorded
// Can retry with different amount or fix card
// Entity didn't restart, in-memory state preserved
}))| Scenario | Temporal | Effect Cluster |
|---|---|---|
| Worker dies mid-activity | Activity retries automatically | Message retries with MessageStorage |
| Worker dies between steps | Workflow resumes at last completed step | Entity restarts, must load state from DB |
| Application error | Workflow handles typed error | Entity handles typed error, keeps running |
| State persistence | Automatic via event history | Manual via DB, event store, or snapshots |
| Replay mechanism | Re-run workflow, skip completed activities | Re-process pending messages OR replay events |
describe("Entity Durability Checklist", () => {
// 1. State survives restart
it.effect("persists state to storage")
it.effect("loads state on startup")
it.effect("survives entity termination")
// 2. Operations are idempotent
it.effect("handles duplicate requests safely")
it.effect("uses idempotency keys for external calls")
// 3. Error handling doesn't corrupt state
it.effect("rolls back on failure")
it.effect("entity survives application errors")
// 4. Concurrent access is safe
it.effect("handles concurrent messages correctly")
it.effect("maintains consistency under load")
})| Pattern | When to Use | State Location | Complexity |
|---|---|---|---|
| Volatile | Caches, rate limiters | Memory only | Simple |
| External DB | Business entities | Database | Medium |
| Event Sourcing | Audit trails, CQRS | Event store | Complex |
The key insight from Erlang applies here:
Entities are ephemeral. Design for recovery. Keep important state in durable storage.