Replay2024 - Temporal Notes

The Replay2024 conference was held in Seattle on September 19 and 20, 2024.

From the Keynote
Temporal Cloud
Key Updates Announced during the Conference
Best Practices
Strategy
Advanced Techniques / Ideas
Deployment & Versioning
Additional Insights
References

From the keynote

Temporal’s goal within an organization is to let engineers sleep peacefully and keep their personal time while on call.
System failures lead to chaos:
- On-call issues slow down the release of new features.
- Stress levels among engineers are high, with 27% changing jobs due to on-call stress.
- With Temporal, no additional code is needed for recovery.

Temporal Cloud

Many companies that self-host Temporal are transitioning to Temporal Cloud, and it is worth analyzing why. Moving to Temporal Cloud fits Temporal’s philosophy of letting engineers sleep peacefully: it offers pain-free upgrades, improved monitoring, and optimized workflows, so engineers can focus on innovation rather than firefighting.

Key Updates Announced during the conference

Workflow update performance: p90 latency improved from 50 ms to 30 ms.
Worker Auto-tune: Now automatically defines memory and CPU to avoid crashes, using Kubernetes metrics for horizontal pod scaling.
Temporal Cloud: Migration to GCP (Google Cloud Platform).
- Note: Ensure SSL setup.
Temporal Nexus: Enables communication between multiple namespaces, each with its own rules.

Public Preview of Nexus: Durable RPC – allowing namespaces to communicate with each other.
Temporal Nexus Architecture

Motivations for Nexus:

Best Practices

Signals: Avoid overusing signals, as they can make the status of a workflow difficult to trace.
Child Workflows: Consider using child workflows to parallelize tasks.
Workflow Continuity: Design workflows to continue-as-new, ensuring they upgrade smoothly when replay tests succeed.
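To make the continue-as-new idea concrete, here is a minimal sketch in plain Python. It does not use the Temporal SDK: `ContinueAsNew`, `run_once`, `drive`, and the `MAX_EVENTS_PER_RUN` budget are hypothetical names invented for illustration. The point is only the shape: a run does a bounded amount of work, then hands its state to a fresh run instead of growing one long history.

```python
MAX_EVENTS_PER_RUN = 3  # hypothetical per-run budget, not a Temporal setting


class ContinueAsNew(Exception):
    """Raised by a run to ask the driver for a fresh run with carried state."""

    def __init__(self, carried_total, remaining):
        super().__init__("continue as new")
        self.carried_total = carried_total
        self.remaining = remaining


def run_once(total, items):
    """Process at most MAX_EVENTS_PER_RUN items, then hand off the rest."""
    for count, item in enumerate(items):
        if count == MAX_EVENTS_PER_RUN:
            raise ContinueAsNew(total, items[count:])
        total += item
    return total


def drive(items):
    """Stand-in for the server: start a new run each time one asks to continue."""
    total, runs = 0, 0
    while True:
        runs += 1
        try:
            return run_once(total, items), runs
        except ContinueAsNew as handoff:
            total, items = handoff.carried_total, handoff.remaining
```

Because every new run starts from explicit carried state, it is also a natural point for picking up newer workflow code, which is presumably why the notes tie it to smooth upgrades.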
Ideas for SDLC Changes

Strategy

When a quick response is needed, consider having two workflows: one for the critical part that provides the response, and another that continues handling non-critical operations at a slower pace.
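A rough sketch of that split, using plain asyncio rather than the Temporal SDK. The function names and the `order_id` example are assumptions made for illustration; in a real system each coroutine would be its own workflow.

```python
import asyncio


async def critical_path(order_id: str) -> str:
    """Fast part: only what the caller needs in order to get a response."""
    return f"accepted:{order_id}"


async def non_critical_path(order_id: str, log: list) -> None:
    """Slow part: follow-up work that nobody is waiting on."""
    await asyncio.sleep(0.05)  # stands in for slow operations
    log.append(f"post-processed:{order_id}")


async def handle_request(order_id: str, log: list, background: set) -> str:
    response = await critical_path(order_id)
    # Start the slow work without awaiting it. Keep a reference in `background`
    # so the task is not garbage-collected before it finishes.
    task = asyncio.create_task(non_critical_path(order_id, log))
    background.add(task)
    task.add_done_callback(background.discard)
    return response
```

The caller gets the response as soon as the critical part finishes, while the slow part completes on its own schedule.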

Advanced Techniques / Ideas

Parent workflows: Implement techniques to drain queues of child workflows efficiently.
- Before: Every click created a new parallel workflow, leading to multiple parallel operations.
- Now: Signals queue a child workflow but wait for the previous one to finish.
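The "Now" behavior can be sketched with a queue that a single parent drains one item at a time. This is plain asyncio, not the Temporal SDK; `parent_workflow`, `child_workflow`, and the `None` stop marker are hypothetical names used only to show the ordering.

```python
import asyncio


async def child_workflow(click_id: int, events: list) -> None:
    events.append(f"start:{click_id}")
    await asyncio.sleep(0.01)  # stands in for the child's real work
    events.append(f"end:{click_id}")


async def parent_workflow(signals: asyncio.Queue, events: list) -> None:
    """Drain queued signals one at a time; None is a hypothetical stop marker."""
    while True:
        click_id = await signals.get()
        if click_id is None:
            return
        # Awaiting here means the next child starts only after this one ends.
        await child_workflow(click_id, events)


async def demo() -> list:
    signals: asyncio.Queue = asyncio.Queue()
    events: list = []
    parent = asyncio.create_task(parent_workflow(signals, events))
    for click_id in (1, 2, 3):  # three "clicks" arrive back to back
        signals.put_nowait(click_id)
    signals.put_nowait(None)
    await parent
    return events
```

Running `asyncio.run(demo())` yields strictly alternating start/end entries, in contrast to the "Before" behavior where all three children would overlap.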
Scheduled Workflows: Schedule executions when timing is not critical, avoiding periods of heavy traffic.
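As a small illustration of deferring work away from busy periods, the helper below picks a start time outside a peak window. The 09:00 to 18:00 window is an assumption (the talk did not specify one), and `next_off_peak` is a hypothetical helper, not part of any Temporal API.

```python
from datetime import datetime, time

# Hypothetical peak window; adjust to the traffic pattern of your own system.
PEAK_START = time(9, 0)
PEAK_END = time(18, 0)


def next_off_peak(now: datetime) -> datetime:
    """Return `now` if it is outside the peak window, else the end of the window."""
    if PEAK_START <= now.time() < PEAK_END:
        return now.replace(
            hour=PEAK_END.hour, minute=PEAK_END.minute, second=0, microsecond=0
        )
    return now
```

For example, a request arriving at 10:30 would be scheduled for 18:00 the same day, while one arriving at 22:00 would run immediately.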
Compensation Workflows:
Update with start (something new to replace the use of child workflows):

Deployment & Versioning

Pinning Workflows: Introduce new workflow versions (e.g., V2 / V3).
Rainbow Deployments: Deploy multiple pods and move traffic dynamically for optimized load balancing.
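One way to picture how pinning and rainbow deployments fit together is a toy router: executions that have already started stay on the version they began with, while traffic weights only affect new executions. This is a conceptual sketch in plain Python; `VersionRouter` and its methods are hypothetical and do not correspond to Temporal's worker versioning API.

```python
import random


class VersionRouter:
    """Pins each execution to its starting version; weights apply to new ones."""

    def __init__(self, weights: dict, seed: int = 0):
        self.weights = dict(weights)  # e.g. {"v1": 1.0, "v2": 0.0}
        self.pinned = {}              # workflow_id -> version it started on
        self._rng = random.Random(seed)

    def shift_traffic(self, weights: dict) -> None:
        """Change where NEW executions go; pinned executions are unaffected."""
        self.weights = dict(weights)

    def route(self, workflow_id: str) -> str:
        if workflow_id not in self.pinned:
            versions = list(self.weights)
            chosen = self._rng.choices(
                versions, weights=[self.weights[v] for v in versions]
            )[0]
            self.pinned[workflow_id] = chosen
        return self.pinned[workflow_id]
```

With this shape, an old version can be retired once no pinned executions remain on it.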

Additional Insights

Workflow ID Naming: Java-based workflows have a naming limit of 256 characters.
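A defensive way to stay under such a limit is to build IDs through a helper that shortens overly long ones. The sketch below takes the 256 figure from the note above and assumes it counts characters; `build_workflow_id` is a hypothetical helper, not an SDK function.

```python
import hashlib

MAX_WORKFLOW_ID_LENGTH = 256  # limit as reported in these notes


def build_workflow_id(*parts: str, limit: int = MAX_WORKFLOW_ID_LENGTH) -> str:
    """Join parts with '-'; if too long, truncate and append a short hash."""
    candidate = "-".join(parts)
    if len(candidate) <= limit:
        return candidate
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()[:16]
    # Keep a readable prefix; the digest makes collisions between
    # different long inputs very unlikely.
    return f"{candidate[: limit - len(digest) - 1]}-{digest}"
```

Short IDs pass through unchanged, so existing naming schemes keep working.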
On self-hosted deployments: Check SSL certificate expiration dates.
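A minimal sketch of such a check using Python's standard `ssl` module. It works on the `notAfter` string of a certificate (the format returned in the dictionary from `SSLSocket.getpeercert()`); the 30-day warning threshold and the function names are assumptions chosen for illustration.

```python
import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now_epoch: float) -> float:
    """Days between `now_epoch` and a certificate's notAfter timestamp."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now_epoch) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now_epoch: float, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` (hypothetical default)."""
    return days_until_expiry(not_after, now_epoch) < warn_days
```

In practice `now_epoch` would come from `time.time()`, and the result could feed an alert well before the certificate lapses.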
Implement HPA with Kubernetes
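For intuition about what an HPA does, the Kubernetes documentation describes its core calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below reproduces only that formula with min/max clamping; it ignores tolerance and stabilization behavior, and the choice of metric (for example worker CPU or task backlog) is left as an assumption.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Core HPA scaling formula, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 workers running at twice the target metric scale to 8, while 4 workers at half the target scale down to 2.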
Info about Breaking Changes

References:

All screenshots were taken during the conference and are not my own. I highly recommend reviewing the talks once they are published and congratulating the engineers behind them.

jonymusky/replayconf-2024.md