Tested processor performance in the following setup:
Sender sends up to 4k messages per second in batches. Message size is 12 bytes.
Messages are received by the consumer (via the Spring Boot binder), which
- forwards the payload of each message to another queue in the same namespace,
- then completes the message.

The consumer uses two different connections for queue1 and queue2.
We also run a receiver on queue2 so that it does not overflow and to simulate more load on the Service Bus instance.
We monitor the following consumer metrics:
- throughput: number of received (forwarded and completed) messages per second
- CPU utilization %, normalized per core (i.e. 50% on 2 cores is comparable to 100% on 1 core)
- max memory usage
In all scenarios:
- more messages are sent than the consumer is able to process.
- the Service Bus namespace is not overloaded (CPU/memory remain under 80%).
- the sender, consumer, and receiver run in different containers of a k8s pod.
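As a rough illustration of the sender's pacing, sustaining the target rate in batches comes down to simple arithmetic (the batch size below is an assumption for illustration; the original sender's batch size is not stated):

```java
public class SenderPacing {
    public static void main(String[] args) {
        // From the setup: up to 4000 messages/second, 12 bytes each.
        int targetRatePerSecond = 4000;
        int messageSizeBytes = 12;
        int batchSize = 400; // assumed batch size, not from the original setup

        // One batch every 100 ms sustains 4000 msg/s.
        long intervalMillis = 1000L * batchSize / targetRatePerSecond;
        // Payload throughput is modest: 4000 * 12 bytes = 48 KB/s,
        // so the load is message-count-bound, not bandwidth-bound.
        int payloadBytesPerSecond = targetRatePerSecond * messageSizeBytes;

        System.out.println(intervalMillis + " ms between batches, "
                + payloadBytesPerSecond + " payload bytes/s");
    }
}
```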
Consumer implementation:

```java
@Autowired
private ServiceBusSenderClient senderClient;

@Bean
public Consumer<Message<String>> consume() {
    return message -> {
        Checkpointer checkpointer = (Checkpointer) message.getHeaders().get(CHECKPOINTER);
        ServiceBusReceivedMessageContext messageContext =
            (ServiceBusReceivedMessageContext) message.getHeaders().get("azure_service_bus_received_message_context");
        ServiceBusReceivedMessage msg = messageContext.getMessage();
        try {
            checkMessage(msg);
            senderClient.sendMessage(new ServiceBusMessage(message.getPayload()));
            checkpoint(checkpointer.success(), msg).block();
        } catch (Exception ex) {
            checkpoint(checkpointer.failure(), msg).block();
            throw ex;
        }
    };
}
```
Other variations tried (no noticeable effect on performance):
- using a topic + 1 subscription
- using the producer binder
Configuration:

```properties
spring.cloud.azure.servicebus.connection-string=${SERVICEBUS_CONNECTION_STRING}
spring.cloud.azure.servicebus.namespace=${SERVICEBUS_NAMESPACE}
spring.cloud.azure.servicebus.entity-type=queue
spring.cloud.azure.servicebus.producer.entity-name=${FORWARD_QUEUE_NAME}
spring.cloud.stream.bindings.consume-in-0.destination=${SERVICEBUS_QUEUE_NAME}
spring.cloud.stream.servicebus.bindings.supply-out-0.producer.entity-type=queue
spring.cloud.stream.servicebus.bindings.consume-in-0.consumer.auto-complete=false
spring.cloud.stream.servicebus.bindings.consume-in-0.consumer.entity-type=queue
spring.cloud.stream.servicebus.bindings.consume-in-0.consumer.prefetch-count=0
spring.cloud.stream.servicebus.bindings.consume-in-0.consumer.max-concurrent-calls=${PROCESSOR_CONCURRENCY}
spring.cloud.function.definition=consume
spring.cloud.azure.retry.mode=exponential
spring.cloud.azure.retry.exponential.max-retries=5
spring.cloud.azure.retry.exponential.base-delay=PT2S
spring.cloud.azure.retry.exponential.max-delay=PT60S
spring.cloud.azure.resource-manager.enabled=false
```
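The retry settings above describe an exponential backoff: up to 5 retries, starting at 2 seconds and capped at 60 seconds. As a sketch of the resulting schedule (assuming the delay doubles per attempt; the exact formula, including any jitter, is up to the SDK):

```java
import java.time.Duration;

public class RetrySchedule {
    public static void main(String[] args) {
        Duration baseDelay = Duration.ofSeconds(2);  // PT2S
        Duration maxDelay = Duration.ofSeconds(60);  // PT60S
        int maxRetries = 5;

        // Assumed doubling schedule: 2s, 4s, 8s, 16s, 32s (all under the 60s cap).
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Duration delay = baseDelay.multipliedBy(1L << attempt);
            if (delay.compareTo(maxDelay) > 0) {
                delay = maxDelay;
            }
            System.out.println("retry " + (attempt + 1) + ": " + delay.getSeconds() + "s");
        }
    }
}
```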
```yaml
resources:
  requests:
    memory: "2Gi" # parameter
    cpu: "2"      # parameter
  limits:
    memory: "2Gi" # parameter
    cpu: "2"      # parameter
```
Tested parameter ranges:
- azure-messaging-servicebus version: 7.15.0-beta.4 or 7.15.0-beta.5
- processor concurrency: from 8 to 256
- reactor thread pool size (`-Dreactor.schedulers.defaultBoundedElasticSize`): default (20 threads), up to 260 threads
- CPU requests/limits: 1 or 2 cores
- memory requests/limits: 2Gi or 6Gi
Tested with 7.15.0-beta.5, 2 cores, 2Gi.
Version | Concurrency | Threads | Throughput, messages per second | Max CPU per core, % | Max memory usage, MB |
---|---|---|---|---|---|
7.15.0-beta.5 | 8 | 20 | ~120 | ~12% | ~450 MB |
7.15.0-beta.5 | 96 | 100 | ~1350 | ~55% | ~900 MB |
7.15.0-beta.5 | 128 | 140 | ~1650 | ~57% | ~900 MB |
7.15.0-beta.5 | 256 | 260 | ~2400 | ~87% | ~1200 MB |
Throughput grows with concurrency (and resource utilization) while resources are available.
Version | Concurrency | Threads | Throughput, messages per second | Max CPU per core, % | Max memory usage, MB |
---|---|---|---|---|---|
7.15.0-beta.4 | 256 | 260 | ~1350 | ~97% | ~2000 MB |
7.15.0-beta.5 | 256 | 260 | ~2700 | ~90% | ~2200 MB |
Version 7.15.0-beta.5 shows a 2x throughput improvement compared to 7.15.0-beta.4.
Version | Concurrency | Threads | Throughput, messages per second | Max CPU per core, % | Max memory usage, MB |
---|---|---|---|---|---|
7.15.0-beta.4 | 256 | 260 | ~480 | ~95% | ~1100 MB |
7.15.0-beta.5 | 256 | 260 | ~2400 | ~87% | ~1200 MB |
With a 2Gi memory limit, beta.5 shows a 5x improvement over beta.4.
Tested with 2 cores, 6Gi memory limit
Version | Concurrency | Threads | Throughput, messages per second | Max CPU per core, % | Max memory usage, MB |
---|---|---|---|---|---|
7.15.0-beta.5 | 96 | 20 | ~250 | ~12% | ~800 MB |
7.15.0-beta.5 | 96 | 100 | ~1350 | ~56% | ~1000 MB |
Configuring the Reactor bounded-elastic thread pool size (`reactor.schedulers.defaultBoundedElasticSize`) to slightly exceed the concurrency results in a corresponding throughput increase and better resource utilization.
Concurrency should be increased along with the thread pool size.
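Besides the `-D` JVM flag, the pool size can be set programmatically, as long as it happens before any Reactor scheduler is initialized (Reactor reads the `reactor.schedulers.defaultBoundedElasticSize` system property once, in the `Schedulers` static initializer). A sketch, where the +12 headroom over concurrency is an illustrative choice mirroring the 128 → 140 pairing from the tests:

```java
public class ReactorPoolSizing {
    public static void main(String[] args) {
        int concurrency = 128; // matches max-concurrent-calls

        // Keep the pool slightly larger than concurrency (e.g. 128 -> 140);
        // the +12 headroom is an assumption for illustration.
        int poolSize = concurrency + 12;

        // Must run before the first use of Schedulers.boundedElastic():
        // the property is read only once.
        System.setProperty("reactor.schedulers.defaultBoundedElasticSize",
                Integer.toString(poolSize));

        System.out.println(System.getProperty("reactor.schedulers.defaultBoundedElasticSize"));
    }
}
```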
Tested with 2Gi memory request/limit.
Version | Cores | Concurrency | Throughput, messages per second | Max CPU per core, % | Max memory usage, MB |
---|---|---|---|---|---|
7.15.0-beta.5 | 1 | 128 | ~1350 | ~90% | ~700 MB |
7.15.0-beta.5 | 2 | 128 | ~1800 | ~60% | ~900 MB |
Tested with 2-core CPU request/limit.
Version | Memory | Concurrency | Throughput, messages per second | Max CPU per core, % | Max memory usage, MB |
---|---|---|---|---|---|
7.15.0-beta.5 | 2Gi | 256 | ~2400 | ~90% | ~1000 MB |
7.15.0-beta.5 | 6Gi | 256 | ~2700 | ~90% | ~2200 MB |