Applying Back Pressure When Overloaded
[...]
Let’s assume we have asynchronous transaction services fronted by input and output queues, or similar FIFO structures. If we want the system to meet a response-time quality-of-service (QoS) guarantee, then we need to consider the following three variables:
- The time taken for individual transactions on a thread
- The number of threads in a pool that can execute transactions in parallel
- The length of the input queue, which sets the maximum acceptable latency
max latency = (transaction time / number of threads) * queue length
queue length = max latency / (transaction time / number of threads)
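As a worked example, the two formulas above can be sketched in Python. The function names and the figures in the usage note are hypothetical, chosen only for illustration; the model also assumes every transaction takes the same time, which real services rarely do.

```python
import math


def worst_case_latency_ms(transaction_time_ms, threads, queue_length):
    """max latency = (transaction time / number of threads) * queue length"""
    return (transaction_time_ms / threads) * queue_length


def bounded_queue_length(max_latency_ms, transaction_time_ms, threads):
    """queue length = max latency / (transaction time / number of threads)

    Rearranged as max_latency * threads / transaction_time and rounded
    down, so the resulting bound never exceeds the latency budget.
    """
    return math.floor(max_latency_ms * threads / transaction_time_ms)
```

With illustrative figures of 10 ms per transaction, 4 threads, and a 500 ms latency budget, `bounded_queue_length(500, 10, 4)` gives a queue length of 200.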
If we allow the queue to be unbounded, latency will continue to increase as the queue grows. So if we want to set a maximum response time, we need to limit the queue length.
By bounding the input queue we block the thread receiving network packets once the queue is full, which applies back pressure upstream. If the network protocol is TCP, similar back pressure is applied to the sender as the network buffers fill. This process can repeat all the way back via the gateway to the customer. For each service we need to configure the queues so that they do their part in achieving the required quality-of-service for the end-to-end customer experience.
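The blocking behaviour described above can be sketched with Python's standard `queue.Queue`, whose `put` blocks while a bounded queue is full. This is a minimal illustration rather than a production design: `run_service`, its parameters, and the use of `time.sleep` to stand in for a transaction are all assumptions, and the loop calling `put` plays the role of the thread receiving network packets.

```python
import queue
import threading
import time


def run_service(queue_length, threads, transaction_time_s, requests):
    """Feed `requests` items through a bounded queue to a worker pool.

    Returns the processed items and the total time the receiving loop
    spent inside put(), which includes time blocked on a full queue.
    """
    inbox = queue.Queue(maxsize=queue_length)  # bounded input queue
    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            request = inbox.get()
            if request is None:  # sentinel: shut this worker down
                return
            time.sleep(transaction_time_s)  # simulated transaction
            with lock:
                processed.append(request)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()

    blocked_s = 0.0
    for request in range(requests):
        start = time.monotonic()
        inbox.put(request)  # blocks while the queue is full
        blocked_s += time.monotonic() - start

    for _ in pool:
        inbox.put(None)  # one sentinel per worker
    for t in pool:
        t.join()
    return processed, blocked_s
```

Calling `run_service(queue_length=2, threads=2, transaction_time_s=0.01, requests=20)` processes all 20 requests, but the receiving loop spends most of its time blocked in `put`. In a real service that stall is what stops the receiver draining the socket, so the TCP buffers fill and the sender slows down in turn.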
One of the biggest wins I often find is to reduce the time taken to process individual transactions. This helps in both the best- and worst-case scenarios.
[...]
https://www.infoq.com/articles/Java-Thread-Pool-Performance-Tuning