THEORY: Little's Law and Applying Back Pressure When Overloaded

Applying Back Pressure When Overloaded

[...]

Let’s assume we have asynchronous transaction services fronted by input and output queues, or similar FIFO structures. If we want the system to meet a response-time quality-of-service (QOS) guarantee, then we need to consider the following three variables:

  1. The time taken for individual transactions on a thread
  2. The number of threads in a pool that can execute transactions in parallel
  3. The length of the input queue to set the maximum acceptable latency
max latency  = (transaction time / number of threads) * queue length
queue length = max latency / (transaction time / number of threads)

By allowing the queue to be unbounded, the latency will continue to increase. So if we want to set a maximum response time, then we need to limit the queue length.
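
As a quick sanity check of the formula above, here is a worked example; the numbers are illustrative, not taken from the article:

##
# Sizing a bounded input queue from the formula above.
# The numbers are illustrative, not taken from the article.
##
transaction_time_ms = 5      # time one thread spends on a single transaction
number_of_threads   = 10     # worker threads executing transactions in parallel
max_latency_ms      = 2_000  # response-time budget we want to guarantee

# queue_length = max latency / (transaction time / number of threads)
queue_length = int(max_latency_ms / (transaction_time_ms / number_of_threads))

print(queue_length)  # 4000 items; beyond that, new work must block or be rejected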

By bounding the input queue we block the thread receiving network packets, which applies back pressure upstream. If the network protocol is TCP, similar back pressure is applied via the filling of network buffers on the sender. This process can repeat all the way back via the gateway to the customer. For each service we need to configure the queues so that they do their part in achieving the required quality-of-service for the end-to-end customer experience.
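
A minimal sketch of that idea with plain Python threads, reusing the queue size computed above (all names are illustrative): a bounded queue.Queue blocks the receiving thread on put() once it is full, which is exactly the back pressure described above.

##
# Bounded input queue in front of a worker pool. Once the queue is full,
# put() blocks the receiving thread and back pressure propagates upstream
# (with TCP, the sender's network buffers fill up next).
##
import queue
import threading

input_queue = queue.Queue(maxsize=4000)  # the bound computed from the latency budget

def handle(request):
    ...  # process a single transaction

def worker():
    while True:
        request = input_queue.get()
        try:
            handle(request)
        finally:
            input_queue.task_done()

def on_network_packet(request):
    # Blocks instead of buffering without limit, keeping latency within budget.
    input_queue.put(request)

for _ in range(10):
    threading.Thread(target=worker, daemon=True).start()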

One of the biggest wins I often find is to reduce the time taken to process an individual transaction. This helps in both the best- and worst-case scenarios.

[...]

rponte commented Dec 31, 2024

  • ⭐️ I'm not feeling the async pressure
    • So why is back pressure all of a sudden a topic to discuss when we wrote thread-based software for years and it did not seem to come up? A combination of many factors, some of which just make it easy to shoot yourself in the foot.

##
# Service-side using semaphores to implement some backpressure (queueing) along with 
# the service's API exposing its actual state
##
from hypothetical_asyncio.sync import Semaphore, Service

semaphore = Semaphore(200)

class RequestHandlerService(Service):
    async def handle(self, request):
        await semaphore.acquire()
        try:
            return generate_response(request)
        finally:
            semaphore.release()

    @property
    def is_ready(self):
        return semaphore.tokens_available()

##
# Caller-side evaluating if the service is overloaded so that it can give up earlier instead of waiting and
# piling up calls infinitely
##
request_handler = RequestHandlerService()
if not request_handler.is_ready:
    response = Response(status_code=503)
else:
    response = await request_handler.handle(request)
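
The snippet above uses the article's hypothetical API (hypothetical_asyncio does not exist). A roughly equivalent, runnable sketch with the real asyncio primitives could look like this; the class name is reused from above and generate_response is still assumed to exist:

##
# Same idea with the real asyncio API. asyncio.Semaphore has no
# tokens_available(), but locked() tells us whether acquiring now would wait.
##
import asyncio

class RequestHandlerService:
    def __init__(self, max_concurrency=200):
        self._semaphore = asyncio.Semaphore(max_concurrency)

    async def handle(self, request):
        async with self._semaphore:            # at most max_concurrency in flight
            return generate_response(request)  # assumed, as in the snippet above

    @property
    def is_ready(self):
        # False while all permits are taken, so callers can shed load early.
        return not self._semaphore.locked()

The caller-side check stays the same: consult is_ready first and answer with a 503 instead of queueing up behind an already saturated service.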

rponte commented Jan 2, 2025

⭐️ Articles about handling overloaded systems

These are the best articles I have read about handling overloaded systems and back-pressure mechanisms. Fred Hebert wrote most of them or referenced many of them in his own articles.

rponte commented Jan 8, 2025

⭐️ Retry strategies and their impact on overloaded systems

⭐️⭐️ Good Retry, Bad Retry: An Incident Story

This article is gold! It shows how some retry techniques might overload a system through a DIDACTIC and well-written story. It covers techniques such as:

  1. Simple retry;
  2. Retry with backoff;
  3. Retry with backoff and jitter (see the sketch after this list);
  4. Retry circuit breaker: The service client completely disables retries if the percentage of service errors exceeds a certain threshold (for example, 10%). As soon as the percentage of errors within an arbitrary minute drops below the threshold, retries are resumed. If the service experiences problems, it won’t receive any additional load from retries;
  5. Retry budget (or adaptive retry): Retries are always allowed, but within a budget, for example, no more than 10% of the number of successful requests. In case of service problems, it can receive no more than 10% of additional traffic;
  6. Retry + Circuit breaker(threshold=10%);
  7. Retry + Circuit breaker(threshold=50%);
  8. Retry + Deadline propagation;
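
A minimal sketch of retry with exponential backoff and full jitter (technique 3); the attempt count, delays, and exception type are illustrative placeholders:

##
# Retry with exponential backoff and full jitter. The parameters and the
# TransientError type are illustrative placeholders.
##
import random
import time

class TransientError(Exception):
    """Whatever errors the caller considers retriable."""

def call_with_backoff_and_jitter(call, max_attempts=3, base_delay=0.1, max_delay=5.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: stop adding load
            # Exponential backoff capped at max_delay, with full jitter so that
            # many clients do not retry in synchronized waves.
            time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1))))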

Both options (retry circuit breaker and retry budget) guarantee that, in case of service problems, clients will add no more than n% of additional load to the service.

[...] it’s necessary to differentiate between scenarios when the service is healthy and when it’s experiencing problems. If the service is healthy, it can be retried because errors might be transient. If the service is having issues, retries should be stopped or minimized.

The percentage of retries can be calculated locally without complicating the system with global statistics synchronization.

Ben conducted a simulation: for long-lived clients, local statistics behave identically to global ones, and exponential backoff doesn’t significantly impact amplification.

Based on these findings, Ben decided to propose a new postmortem action item: implementing a retry budget with a 10% limit, in addition to the existing exponential backoff. There’s no need for global statistics synchronization — a local token bucket should be enough.
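
A minimal sketch of that action item: a purely local token bucket where each success earns a fraction of a token and each retry spends a whole one, capping retries at roughly 10% of successful requests. The names and numbers are illustrative, and TransientError is the placeholder from the sketch above:

##
# Local retry budget (token bucket): retries are allowed only while the bucket
# has at least one whole token. No global statistics are needed.
##
class RetryTokenBucket:
    def __init__(self, ratio=0.1, capacity=10.0):
        self.ratio = ratio        # each success earns 10% of a token
        self.capacity = capacity
        self.tokens = capacity

    def record_success(self):
        self.tokens = min(self.capacity, self.tokens + self.ratio)

    def try_spend(self):
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = RetryTokenBucket()

def call_with_retry_budget(call):
    try:
        response = call()
        bucket.record_success()
        return response
    except TransientError:
        if not bucket.try_spend():
            raise        # budget exhausted: fail fast instead of piling on more load
        return call()    # one budgeted retry (combine with backoff + jitter above)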

References worth reading

Annotations (translated from pt_BR)

This article is PERFECT, good grief! 🤩🤩🤩
https://medium.com/yandex/good-retry-bad-retry-an-incident-story-648072d3cee6

The article is about how retries can overload your system and how to deal with that.

Summary of the summary:

We already know retries are dangerous. Where it gets interesting is how retry strategies make an overloaded system worse.

The article tests a few retry strategies in a few scenarios through simulations. Since this is a summary of a long article, here we go...

The retry + backoff + jitter strategy works very well for systems that are still considered healthy, that is, systems facing a temporary overload or a partial and, above all, short outage that causes transient errors. It is not much help during long overloads (network partition, application crash, or a high error rate), because it only postpones the load, increasing the application's recovery time. Put bluntly, we can infer that if the overload lasts longer than the retrying clients are willing to wait, the retries are only making things worse!

In contrast, adaptive retry (retry token bucket) and the retry circuit breaker (the breaker works at the retry level; it is not a complement to the retry) handle long overloads or outages, and short ones as well, although with a lower success rate for short overloads compared to backoff + jitter. During a long overload, both strategies reduce the load on the application A LOT, down to a small percentage of the original load, allowing the application to recover faster, which is exactly what you want during an outage.

Another point is that retry + backoff + jitter works very well for mitigating (reducing or eliminating) system overload in more stable scenarios (usually a closed system), that is, scenarios with long-lived clients or a limited number of clients and/or serial/sequential execution of requests, such as background jobs polling the system or a queue. The retry token bucket and retry circuit breaker strategies, in turn, are ideal for scenarios with no control over the number of clients (unbounded clients), for example at the edges of the system where you do not control the users or clients. Here the important thing is to be aware that in this kind of scenario (usually an open system) there will always be new clients sending new requests (the "first try", the very first request), regardless of whether other users (or threads) are backing off in the meantime.

The author did a great job combining Marc Brooker's several resilience articles and using his simulator to validate the hypotheses. It turned out simply AWESOME!

(I follow Marc's writing, but I confess I had to reread his articles to refresh my memory and connect the dots better, and good grief, it is really good!)

rponte commented Jan 8, 2025

  • ⭐️ Google SRE Book: Handling Overload
    • In a majority of cases (although certainly not in all), we've found that simply using CPU consumption as the signal for provisioning works well, for the following reasons:

      • In platforms with garbage collection, memory pressure naturally translates into increased CPU consumption.
      • In other platforms, it's possible to provision the remaining resources in such a way that they're very unlikely to run out before CPU runs out.
    • Our larger services tend to be deep stacks of systems, which may in turn have dependencies on each other. In this architecture, requests should only be retried at the layer immediately above the layer that is rejecting them. When we decide that a given request can't be served and shouldn't be retried, we use an "overloaded; don't retry" error and thus avoid a combinatorial retry explosion.
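
One way to picture that last point is to model the overload signal as a distinct, explicitly non-retriable error so that only the layer immediately above the rejecting one ever retries. The exception and function names below are illustrative, not from the book:

##
# "Overloaded; don't retry" as a distinct error type. Transient failures are
# retried only by the immediate caller; overload errors are propagated as-is.
##
class RetriableError(Exception):
    """Transient failure: the immediate caller may retry once."""

class OverloadedDoNotRetry(Exception):
    """The callee is shedding load: pass it up without retrying."""

def call_next_layer(layer, request):
    try:
        return layer.handle(request)
    except RetriableError:
        return layer.handle(request)   # retried only at this layer
    except OverloadedDoNotRetry:
        raise                          # never retried, so retries don't multiply per layer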

  • ⭐️ Google SRE Book: Addressing Cascading Failures
    • A cascading failure is a failure that grows over time as a result of positive feedback.

    • Limit retries per request. Don’t retry a given request indefinitely.

    • Consider having a server-wide retry budget. For example, only allow 60 retries per minute in a process, and if the retry budget is exceeded, don’t retry; just fail the request. [...]

    • Think about the service holistically and decide if you really need to perform retries at a given level. In particular, avoid amplifying retries by issuing retries at multiple levels: [...]

    • Use clear response codes and consider how different failure modes should be handled. For example, separate retriable and nonretriable error conditions. Don’t retry permanent errors or malformed requests in a client, because neither will ever succeed. Return a specific status when overloaded so that clients and other layers back off and do not retry.

    • If handling a request is performed over multiple stages (e.g., there are a few callbacks and RPC calls), the server should check the deadline left at each stage before attempting to perform any more work on the request. For example, if a request is split into parsing, backend request, and processing stages, it may make sense to check that there is enough time left to handle the request before each stage.
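
A minimal sketch of that per-stage deadline check; the stage functions and names are placeholders:

##
# Deadline propagation: before each stage, check how much of the request's
# time budget is left and give up early instead of doing doomed work.
##
import time

class DeadlineExceeded(Exception):
    pass

def check_deadline(deadline, stage):
    if time.monotonic() >= deadline:
        raise DeadlineExceeded(f"no time left before stage: {stage}")

def parse(request): ...              # placeholder stages
def call_backend(parsed): ...
def process(backend_response): ...

def handle_request(request, timeout_seconds=1.0):
    deadline = time.monotonic() + timeout_seconds

    check_deadline(deadline, "parse")
    parsed = parse(request)

    check_deadline(deadline, "backend request")
    backend_response = call_backend(parsed)

    check_deadline(deadline, "processing")
    return process(backend_response)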

rponte commented Mar 21, 2025

YouTube | ScyllaDB: Resilient Design Using Queue Theory: This talk discusses backpressure, load shedding, and how to optimize latency and throughput.

rafaelpontezup commented

The #1 rule of scalable systems is to avoid congestion collapse - by @jamesacowling
https://x.com/jamesacowling/status/1934991944234770461


A good metaphor for congestion collapse is to imagine you're a barista at a coffee shop that just got popular. The cashier keeps taking orders and stacking them up higher and higher but you can't make coffees any faster. [...] - by @jamesacowling
https://x.com/jamesacowling/status/1935812480254787819
