Kafka benchmarks are typically run using a single producer and consumer against a single topic, with the producer and consumer driven at close to maximum write/read speed. In the real world, a Kafka cluster more often serves many lower-throughput producers and consumers. Ansible allows for a benchmarking method that sets up any number of topics along with many producers and consumers.
Ansible playbooks allow us to run a number of tasks against a distributed set of clients both synchronously and asynchronously.
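The plays below target two inventory groups: a single tools host where one-off commands run, and a kafka_connect group where the load is generated. A minimal inventory sketch might look like the following (the hostnames are assumptions; per the notes further down, the test environment had one tools host and 20 kafka_connect hosts):

    all:
      children:
        tools:
          hosts:
            tools-1:                    # hypothetical hostname; a single host, so setup commands run once
        kafka_connect:
          hosts:
            kafka-connect-[01:20]:      # hypothetical host range; these hosts run the producers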
Before we can run tests, we need topics to test against. This play sets up a number of topics with various partition configurations:
- name: Setup
  hosts: tools
  tags:
    - setup
  vars:
    bootstrap: 172.20.10.101:9092
    topicname_prefix: perftest
    replicas: 3
    retention: 300000 # 5 mins
    topics_partitions: [5, 10, 20, 30, 50]
    topic_count: "{{ range(1, 30 + 1) | list }}" # create 30 topics for each
  tasks:
    - name: list existing topics
      shell:
        cmd: >-
          kafka-topics --bootstrap-server {{ bootstrap }} --list
      register: topiclist
    - name: create topic if it doesn't exist
      shell:
        cmd: >-
          kafka-topics --bootstrap-server {{ bootstrap }}
          --topic {{ topicname_prefix }}-{{ item.0 }}-{{ item.1 }}
          --create --partitions {{ item.0 }}
          --replication-factor {{ replicas }}
          --config retention.ms={{ retention }}
          --config min.insync.replicas=2
      loop: "{{ topics_partitions|product(topic_count)|list }}"
      when: "'-'.join((topicname_prefix, item.0|string, item.1|string)) not in topiclist.stdout"
Some notes on the above play:
- This play runs on the tools host, a single host defined in the inventory, so these commands are only run once.
- The play is idempotent, meaning it can be run multiple times with the same result: it checks for the existence of each topic and won't attempt to recreate it.
- The loop uses Ansible's product filter to create a Cartesian product of the topic partitions and topic counts: [[5, 1], [5, 2], [5, 3], ..., [50, 29], [50, 30]]
- We use this loop to create 30 topics with 5 partitions, 30 topics with 10 partitions, and so on.
- The when conditional on the create task checks whether each topic name is already present in the topiclist output registered by the first task; the sketch after this list shows the names it builds.
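To sanity-check the topic names that the when conditional tests against the registered topiclist, a throwaway debug task (a sketch, not part of the original play) can print them without creating anything:

    - name: preview generated topic names
      debug:
        msg: "{{ '-'.join((topicname_prefix, item.0|string, item.1|string)) }}"
      loop: "{{ topics_partitions|product(topic_count)|list }}"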
Now that we have topics, let’s produce some data to them:
- name: Tests
  strategy: free
  hosts:
    - kafka_connect
  gather_facts: no
  vars:
    bootstrap: 172.20.10.101:9092
    topicname_prefix: perftest
    throughput: 5000
    record_size: 512
    num_records: 10000000
    num_producers: 10
    acks: all
    topics_partitions: [5, 10, 20, 30, 50]
    topic_count: "{{ range(1, 30 + 1) | list }}"
    topics: []
  tasks:
    - name: set list of topics
      set_fact:
        topics: "{{ topics + [topicname_prefix + '-' + item.0|string + '-' + item.1|string] }}"
      loop: "{{ topics_partitions|product(topic_count)|list }}"
    - name: producer test
      async: 5184000
      poll: 0
      register: producer_output
      command: >-
        kafka-producer-perf-test
        --topic {{ topics | random }}
        --producer-props bootstrap.servers={{ bootstrap }}
        acks={{ acks }}
        client.id={{ inventory_hostname }}-producer-{{ item }}
        linger.ms=10
        --record-size {{ record_size }}
        --throughput {{ throughput }}
        --num-records {{ num_records }}
      loop: "{{ range(1, num_producers + 1) | list }}"
Notes on this play:
- The execution strategy is free (set at the top of the play), which means Ansible won't make hosts wait for each other between tasks; each host runs through the play as fast as it can.
- We build the list of topics in the set_fact task using the same Cartesian product approach.
- The producer command uses the random filter to select a topic at random from that list.
- The producer test runs on every host in the specified group, in this case kafka_connect, and the loop on the producer test task creates a number of producers equal to num_producers on each host.
- Producer throughput is determined by the throughput and record_size variables: 512 bytes * 5,000 records per second is about 2.5MB/s per producer. There were 20 kafka_connect hosts and num_producers per host is set to 10, so this producer test will try to write ~500MB/s to Kafka.
- This play doesn't wait for the producers to complete (see the poll: 0 entry in the Ansible docs on asynchronous actions); it simply launches them and moves on, and we monitor the results via Control Center and via JMX with Prometheus and Grafana. The play could be modified to poll the asynchronous jobs and collect their output back to the tools host once they finish, as sketched below.
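For example, a follow-up task pair (a minimal sketch using Ansible's async_status module against the producer_output variable registered above) could poll each background job until it finishes and then print the perf-test output:

    - name: wait for producer tests to finish
      async_status:
        jid: "{{ item.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 720   # with delay: 10 this polls for up to two hours
      delay: 10
      loop: "{{ producer_output.results }}"

    - name: show perf-test output
      debug:
        # stdout_lines is populated for command tasks once the job completes
        msg: "{{ job_result.results | map(attribute='stdout_lines') | list }}"

Note that retries and delay would need tuning to the expected test duration; with throughput capped at 5,000 records per second, 10,000,000 records take a little over 30 minutes per producer.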