kafka producer - consumer - broker tuning
1.Producer:
1.request.required.acks=[0,1,all/-1] - 0: no acknowledgement, but very fast; 1: acknowledged after the leader has written the message; all/-1: acknowledged after the message is replicated to the in-sync replicas. (The newer Java producer calls this property acks.)
2.use Async producer - use a callback for the acknowledgement, using property producer.type=async
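3.Batching data - send multiple messages together (applies in async mode).
batch.num.messages - number of messages to collect into one batch
queue.buffering.max.ms - maximum time to buffer messages before sending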
4.Compression for large files - gzip and snappy are supported.
Very large files can be stored in a shared location, with only the file path sent by the Kafka producer.
5.timeouts/retry settings - the defaults may be longer than needed, so tune them to the use case, using property request.timeout.ms
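
The batching idea in point 3 can be sketched without a Kafka client at all. The class below is a toy, in-memory stand-in and not part of any Kafka library: BatchingBuffer, its send callable and the clock parameter are hypothetical names, and the two limits only play the roles of batch.num.messages and queue.buffering.max.ms. The values used are arbitrary.

```python
import time


class BatchingBuffer:
    """Toy producer-side buffer: ships messages in batches, not one by one."""

    def __init__(self, send, max_messages=200, max_wait_ms=5000,
                 clock=time.monotonic):
        self._send = send                        # callable that ships one batch (a list)
        self._max_messages = max_messages        # role of batch.num.messages
        self._max_wait_s = max_wait_ms / 1000.0  # role of queue.buffering.max.ms
        self._clock = clock                      # injectable so tests can fake time
        self._pending = []
        self._oldest = None                      # arrival time of the oldest buffered message

    def add(self, message):
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(message)
        self.poll()

    def poll(self):
        """Flush if the batch is full or the oldest message has waited too long."""
        if not self._pending:
            return
        full = len(self._pending) >= self._max_messages
        expired = self._clock() - self._oldest >= self._max_wait_s
        if full or expired:
            self.flush()

    def flush(self):
        """Ship whatever is buffered, regardless of size or age."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)


# Usage: seven messages with a batch size of three.
sent = []
buf = BatchingBuffer(sent.append, max_messages=3, max_wait_ms=60_000)
for i in range(7):
    buf.add(f"msg-{i}")
buf.flush()  # ship the leftover partial batch
# sent now holds three batches, of sizes 3, 3 and 1
```

Larger limits mean fewer, bigger sends (better throughput) at the cost of messages sitting longer in the buffer, which is the trade-off the two properties above control.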
2.Brokers:
1. choose a high partition count - a consumer group cannot have more active consumers than partitions.
2. try 1 partition per physical disk, to avoid an IO bottleneck.
3. load balance partitions - use the tool -
kafka-reassign-partitions.sh --generate (the plan), --execute, --verify.
some parameters to tune:
num.io.threads - at least as many threads as there are disks.
log.flush.interval.messages / log.flush.interval.ms - a higher interval increases speed, but risks data loss if the server crashes.
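3.Consumers:
1.have as many consumers in a group as there are partitions.
2.scale the consumers so they keep up with the rate at which producers write.
3.adding more consumers to a group (up to the number of partitions) improves throughput; adding another consumer group does not, because each group reads all of the data independently.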
4.checkpoint interval
replica.high.watermark.checkpoint.interval.ms - a higher value improves performance, because checkpointing happens less often, at a slight risk of data loss.
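
Why points 1 and 3 cap the useful group size at the partition count can be shown with a small sketch. This is a simplified round-robin split, not Kafka's actual assignor; assign_partitions and the consumer names are made up for illustration.

```python
def assign_partitions(num_partitions, consumers):
    """Spread partition ids over the consumers of one group, round-robin."""
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment


# 4 partitions, 2 consumers: each consumer owns two partitions.
balanced = assign_partitions(4, ["c1", "c2"])

# 4 partitions, 6 consumers: the last two consumers get nothing to read.
oversized = assign_partitions(4, ["c1", "c2", "c3", "c4", "c5", "c6"])
idle = [c for c, parts in oversized.items() if not parts]
```

Each partition has exactly one owner within the group, so consumers beyond the partition count sit idle.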
4.Pipeline performance:
The other systems in the pipeline, for example HDFS when the data is written there, should perform as well as Kafka.
extra:
Kafka vs other messaging systems:
1.Decoupling of producer and consumer - data is persisted, so consumers can go down or run periodically (like ETL jobs) without losing messages.
2.Data is not stored per consumer like in a queue - data is saved once, and any number of consumers can read it independently, each with its own offset.
3.Replication is built in by default, not reserved for specialized cases with complicated configuration.
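
Points 1 and 2 can be pictured with a tiny in-memory model: one stored copy of the data, plus one read position per consumer group. This is only a sketch of the idea, assuming a single partition and no persistence; TopicLog and the group names "etl" and "audit" are hypothetical.

```python
class TopicLog:
    """Append-only log that is stored once and read by many groups."""

    def __init__(self):
        self._records = []   # the single stored copy of the data
        self._offsets = {}   # group name -> next offset that group will read

    def append(self, record):
        """Producer side: add a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Consumer side: read from this group's own offset and advance it."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for event in ["a", "b", "c"]:
    log.append(event)

first_etl = log.poll("etl", max_records=2)  # "etl" reads a and b
first_audit = log.poll("audit")             # "audit" starts from the beginning
log.append("d")                             # producer keeps writing meanwhile
second_etl = log.poll("etl")                # "etl" resumes where it stopped
```

Reading does not remove anything from the log, so a group that was offline simply continues from its saved offset when it comes back.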