THEORY: Little's Law and Applying Back Pressure When Overloaded

Applying Back Pressure When Overloaded

[...]

Let’s assume we have asynchronous transaction services fronted by input and output queues, or similar FIFO structures. If we want the system to meet a response-time quality-of-service (QOS) guarantee, then we need to consider the following three variables:

  1. The time taken for individual transactions on a thread
  2. The number of threads in a pool that can execute transactions in parallel
  3. The length of the input queue to set the maximum acceptable latency
max latency  = (transaction time / number of threads) * queue length
queue length = max latency / (transaction time / number of threads)
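
As a quick sanity check of the formula, here is a small Java sketch; the figures (5 ms transactions, 10 worker threads, a 100 ms latency target) are made up for illustration:

```java
public class QueueSizing {
    public static void main(String[] args) {
        double transactionTimeMs = 5.0;   // avg time per transaction on one thread
        int numberOfThreads = 10;         // workers draining the input queue in parallel
        double maxLatencyMs = 100.0;      // response-time QOS target

        // queue length = max latency / (transaction time / number of threads)
        double queueLength = maxLatencyMs / (transactionTimeMs / numberOfThreads);
        System.out.printf("Bound the input queue at ~%.0f entries%n", queueLength);
        // => 200 entries: any longer and queued work alone exceeds the latency target
    }
}
```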

If the queue is allowed to be unbounded, latency will continue to increase. So if we want to set a maximum response time, we need to limit the queue length.

By bounding the input queue we block the thread receiving network packets, which applies back pressure upstream. If the network protocol is TCP, similar back pressure is applied to the sender via the filling of network buffers. This process can repeat all the way back via the gateway to the customer. For each service we need to configure the queues so that they do their part in achieving the required quality-of-service for the end-to-end customer experience.
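
As a minimal sketch of that mechanism, here is a bounded input queue in Java whose blocking `put()` is what propagates the pressure. The capacity of 200 reuses the sizing example above, and the `Request` type is hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedIngress {
    // Bounded input queue, sized from the QOS formula above.
    private final BlockingQueue<Request> inputQueue = new ArrayBlockingQueue<>(200);

    // Called by the thread receiving network packets. When the queue is full,
    // put() blocks; the reader stops draining the socket, the kernel's TCP
    // buffers fill up, and the sender's send window closes. Back pressure
    // has now propagated one hop upstream.
    void onRequest(Request request) throws InterruptedException {
        inputQueue.put(request);
    }

    record Request(byte[] payload) {}
}
```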

One of the biggest wins I often find is to reduce the time taken to process an individual transaction. This helps in both the best-case and worst-case scenarios.

[...]

rponte commented Dec 8, 2025

Twitter thread about Cascading Failure and Backpressure Fundamentals

I liked this thread because it covers two useful patterns on the consumer side for handling backpressure.


However, it didn't discuss system overload in general. But in another tweet, he gives an excellent perspective on the (main) trade-off that drives the choice between back pressure (blocking on input) and load shedding (dropping data on the floor):

> When a service is overwhelmed, we have exactly two choices: preserve data or preserve latency.
>
> We either build backpressure to queue and risk collapse, or we shed load and risk data loss.
>
> Which side of this trade-off does your system live on, and can you defend why?
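
Both sides of that trade-off can be expressed against the same bounded queue. A hypothetical Java sketch of the two policies:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class OverloadPolicy {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(200);

    // Preserve data: block the producer until space frees up (back pressure).
    // Nothing is lost, but latency grows and queues upstream risk collapse.
    void preserveData(String message) throws InterruptedException {
        queue.put(message);
    }

    // Preserve latency: wait briefly, then drop the message (load shedding).
    // The caller stays fast, but the data is gone.
    boolean preserveLatency(String message) throws InterruptedException {
        boolean accepted = queue.offer(message, 10, TimeUnit.MILLISECONDS);
        if (!accepted) {
            // shed: increment a metric, return 503 to the client, etc.
        }
        return accepted;
    }
}
```

The 10 ms grace period in the shedding path is an arbitrary knob: raise it and you drift toward back pressure; drop it to zero and you shed at the first sign of contention.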
