https://www.confluent.io/blog/monitor-kafka-clusters-with-prometheus-grafana-and-confluent/
This document describes setting up the monitoring stack using Ansible as much as possible. Prometheus and Grafana have been tested to work in an air-gapped (no internet access) environment, providing the binaries from the official sites and using the playbooks from GitHub user [0x0I](https://github.com/O1ahmad). The Prometheus Node exporter playbook, provided by [Cloud Alchemy](https://github.com/cloudalchemy), has not been tested in an air-gapped environment; that does not mean it cannot work.
Prometheus scrapes metrics from an HTTP endpoint that needs to be exposed on the target hosts. This endpoint is exposed by the jmxexporter Java agent, which can be enabled with CP-Ansible by adding the following configuration to the inventory host file.
#### Monitoring Configuration ####
jmxexporter_enabled: true
jmxexporter_url_remote: false
jmxexporter_jar_url: ~/jmx/jmx_prometheus_javaagent-0.17.2.jar
The jmx-exporter jar file can be downloaded from Maven Central:
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.17.2/jmx_prometheus_javaagent-0.17.2.jar
The configuration for the exporter is already provided by cp-ansible and is also available in jmx-monitoring-stacks, under shared-assets/jmx-exporter.
Each service can be configured with a different port for the exporter endpoint; these are the default values per component:
- ZooKeeper: 8079
- Brokers: 8080
- Schema Registry: 8078
- Connect: 8077
- KSQL: 8076
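These ports can be overridden in the same inventory host file. A sketch is below; the `<component>_jmxexporter_port` variable names are an assumption, so verify them against your cp-ansible version:

```yaml
# Port overrides in the CP-Ansible inventory (sketch; the
# <component>_jmxexporter_port variable names are an assumption,
# verify them against your cp-ansible version)
zookeeper_jmxexporter_port: 8079
kafka_broker_jmxexporter_port: 8080
schema_registry_jmxexporter_port: 8078
kafka_connect_jmxexporter_port: 8077
ksql_jmxexporter_port: 8076
```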
After running the CP-Ansible playbook, you can test the endpoint using:
curl http://<component-host>:<component-port>
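To spot-check all the default endpoints in one pass, a small shell loop can build the target list. The dfederico-demo-* host names below are the placeholder names used later in this document; substitute your own:

```shell
#!/bin/sh
# Build the default scrape target list from the component/port table above.
# The dfederico-demo-* host names are placeholders; substitute your own.
targets=""
for entry in zk-0:8079 broker-0:8080 sr-0:8078 connect-0:8077 ksql-0:8076; do
  targets="$targets dfederico-demo-$entry"
done
for t in $targets; do
  echo "would check: curl -s http://$t"
  # On a host with network access to the cluster, uncomment:
  # curl -s --max-time 5 "http://$t" | head -n 3
done
```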
Use the jmx-monitoring-stacks playbook (under jmxexporter-prometheus-grafana/cp-ansible) to create the scrape target information for Prometheus.
From the cp-ansible directory mentioned above, run:
ansible-playbook -i inventory.yml prometheus-config.yml -e env=environment-name
ansible-playbook -i ~/inventories/sasl-rbac-env1.yml prometheus-config.yml -e env=primary -e node_exporter_enabled=true
ansible-playbook -i ~/inventories/sasl-rbac-env2.yml prometheus-config.yml -e env=dr -e node_exporter_enabled=true
The files generated in the examples above contain the scrape_configs for Prometheus; they must be copied into the Prometheus playbook configuration in the next step.
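The generated content looks roughly like this (a sketch based on the full inventory shown later in this document: one job per component, with targets labelled by the env value passed on the command line):

```yaml
# Sketch of generated scrape_configs (one job per component)
scrape_configs:
  - job_name: "kafka-broker"
    static_configs:
      - targets:
          - "dfederico-demo-broker-0:8080"
        labels:
          env: "primary"
```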
The next playbooks, which install Prometheus and Grafana, depend on a playbook that prepares systemd processes. It can be installed as:
ansible-galaxy role install 0x0i.systemd
Or, for an air-gapped Ansible host, use a two-step process:
- Download and package the playbook on a host with access to GitHub:
git clone https://github.com/0x0I/ansible-role-systemd
tar -czvf 0x0i.systemd ansible-role-systemd
- Copy the file 0x0i.systemd to the Ansible controller host and install the role:
ansible-galaxy role install 0x0i.systemd
The binaries for Prometheus are available here: https://prometheus.io/download/#prometheus
The default execution of the playbook downloads the binary from the above URL. For air-gapped environments, it is assumed that the archive has been downloaded to a ~/downloads folder (change the inventory file below).
The playbooks for this step are provided by https://github.com/0x0I/ansible-role-prometheus
For the latest version of the playbook, download and package from GitHub:
git clone https://github.com/0x0I/ansible-role-prometheus.git
tar -czvf 0x0i.prometheus ansible-role-prometheus
Install using ansible-galaxy:
ansible-galaxy role install 0x0i.prometheus
This is a wrapper playbook to launch the install role:
## File: install-prometheus.yml
---
- name: Installing Prometheus on hosted machine
  hosts: prometheus
  gather_facts: true
  tasks:
    - name: Create temp dir for binary
      file:
        path: "/tmp/prometheus"
        state: directory
        mode: "0755"
    - name: Copy prometheus binary
      copy:
        src: "{{ prometheus_local_binary }}"
        dest: "{{ prometheus_remote_binary }}"
        mode: "0644"
    - import_role:
        name: 0x0i.prometheus
      vars:
        archive_url: "file://{{ prometheus_remote_binary }}"
        archive_checksum: ''
This inventory file will contain the scraping targets configuration prepared in the "Prepare scraping targets for the environment" section.
# prometheus-inventory.yml
---
all:
  vars:
    ansible_connection: ssh
    ansible_user: dfederico
    ansible_become: true
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
    ansible_python_interpreter: /usr/bin/python3
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
    prometheus_archive_name: prometheus-2.37.6.linux-amd64
    prometheus_local_binary: "~/downloads/{{ prometheus_archive_name }}.tar.gz"
    prometheus_remote_binary: "/tmp/prometheus/{{ prometheus_archive_name }}.tar.gz"
    prometheus_config:
      scrape_configs:
        - job_name: "zookeeper"
          static_configs:
            - targets:
                - "dfederico-demo-zk-0:8079"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-zk-0:8079"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'
        - job_name: "kafka-broker"
          static_configs:
            - targets:
                - "dfederico-demo-broker-0:8080"
                - "dfederico-demo-broker-1:8080"
                - "dfederico-demo-broker-2:8080"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-broker-0:8080"
                - "dfederico-demo-dr-broker-1:8080"
                - "dfederico-demo-dr-broker-2:8080"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'
        - job_name: "schema-registry"
          static_configs:
            - targets:
                - "dfederico-demo-sr-0:8078"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-sr-0:8078"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'
        - job_name: "kafka-connect"
          static_configs:
            - targets:
                - "dfederico-demo-connect-0:8077"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-connect-0:8077"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'
prometheus:
  hosts:
    dfederico-demo-extra-0:
The above example sets up two environments; the job names match the services as they will be used in Grafana, and each set of targets is labelled with an env tag.
ansible-playbook -i prometheus-inventory.yml install-prometheus.yml
On any or all hosts (or using Ansible), check that the Prometheus process is running:
sudo systemctl status prometheus
ansible -i prometheus-inventory.yml prometheus -m shell -a "systemctl status prometheus.service"
You can check systemd events with journalctl:
journalctl -f -u prometheus.service
Other commands:
systemctl cat prometheus.service
Default config file: /etc/prometheus/prometheus.yml
The query endpoint defaults to / (or /status) on port 9090.
Open a web browser to port 9090 and go to the /targets endpoint to confirm all the scraping works.
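From the Prometheus UI (or its HTTP API), the built-in up metric is another quick way to confirm scrape health; for example, this query returns any target that is currently failing:

```promql
up == 0
```

Filtering by the env label added earlier, e.g. `up{env="dr"}`, lets you confirm each environment separately.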
The binaries for Grafana are available here: https://grafana.com/grafana/download/9.5.3?edition=oss
The default execution of the playbook downloads the binary from the above URL. For air-gapped environments, it is assumed that the archive has been downloaded to a ~/downloads folder (change the inventory file below).
The playbooks for this step are provided by https://github.com/0x0I/ansible-role-grafana
For the latest version of the playbook, download and package from GitHub:
git clone https://github.com/0x0I/ansible-role-grafana
tar -czvf 0x0i.grafana ansible-role-grafana
Install using ansible-galaxy:
ansible-galaxy role install 0x0i.grafana
This is a wrapper playbook to launch the install role:
## File: install-grafana.yml
---
- name: Installing Grafana on hosted machine
  hosts: grafana
  gather_facts: true
  tasks:
    - name: Create temp dir for binary
      file:
        path: "/tmp/grafana"
        state: directory
        mode: "0755"
    - name: Copy grafana binary
      copy:
        src: "{{ grafana_local_binary }}"
        dest: "{{ grafana_remote_binary }}"
        mode: "0644"
    - import_role:
        name: 0x0i.grafana
      vars:
        archive_url: "file://{{ grafana_remote_binary }}"
        archive_checksum: ''
ansible-playbook -i ~/inventories/gcp-sandbox/sasl-rbac-env1.yml install-grafana.yml
(Optional) This can be merged with the prometheus-inventory.yml
# grafana-inventory.yml
---
all:
  vars:
    ansible_connection: ssh
    ansible_user: dfederico
    ansible_become: true
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
    ansible_python_interpreter: /usr/bin/python3
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
    grafana_archive_name: grafana-9.4.3.linux-amd64
    grafana_local_binary: "~/downloads/{{ grafana_archive_name }}.tar.gz"
    grafana_remote_binary: "/tmp/grafana/{{ grafana_archive_name }}.tar.gz"
    grafana_config:
      # section [security]
      security:
        admin_user: admin
        admin_password: admin-secret
grafana:
  hosts:
    dfederico-demo-extra-0:
Note: change the admin user and password above; they are used to access the web GUI.
ansible-playbook -i grafana-inventory.yml install-grafana.yml
On any or all hosts (or using Ansible), check that the Grafana process is running:
sudo systemctl status grafana.service
ansible -i grafana-inventory.yml grafana -m shell -a "systemctl status grafana.service"
You can check systemd events with journalctl:
journalctl -f -u grafana.service
Other commands:
systemctl cat grafana.service
Open the Grafana UI on port 3000 using a web browser and authenticate using the configured user.
Create a Prometheus datasource, usually http://localhost:9090; connectivity is tested when saving the configuration.
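Alternatively, the datasource can be provisioned from a file instead of through the UI. This is a sketch using Grafana's standard datasource provisioning format; the file path and URL are assumptions for this setup:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (sketch)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```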
Import each dashboard from the jmx-monitoring-stacks repository (confluentinc on GitHub), under the jmxexporter-prometheus-grafana/assets/grafana/provisioning/dashboards folder.
This is an optional component for hardware and OS metrics exposed by *NIX kernels, since most environments already have a monitoring agent for node resources (CPU, memory, disk, etc.). Note: the playbook provided by Cloud Alchemy has not been tested in an air-gapped environment.
First, install the playbook from Cloud Alchemy (recommended by https://github.com/prometheus/node_exporter ):
ansible-galaxy install cloudalchemy.node_exporter
In an air-gapped environment, clone the repository from GitHub on a host with internet access, tar the folder, and ship it to the air-gapped Ansible controller:
git clone https://github.com/cloudalchemy/ansible-node-exporter.git
tar -czvf cloudalchemy.node_exporter ansible-node-exporter
The above creates a compressed file named cloudalchemy.node_exporter, as expected by the role name. On the Ansible controller host, install the package with ansible-galaxy:
ansible-galaxy role install cloudalchemy.node_exporter
This installs the role locally (usually at ~/.ansible/roles); you can check the installed roles using:
ansible-galaxy role list
## File: node-exporter-install.yml
- hosts: all
  pre_tasks:
    - name: Create node_exporter cert dir
      file:
        path: "/etc/node_exporter"
        state: directory
        owner: node-exp
        group: node-exp
    - name: Copy certificate
      copy:
        src: ~/inventories/ssl/generated/server-demo.pem
        dest: /etc/node_exporter/tls.pem
        mode: "0640"
        owner: "node-exp"
        group: "node-exp"
    - name: Copy certificate-Key
      copy:
        src: ~/inventories/ssl/generated/server-demo-key.pem
        dest: /etc/node_exporter/tls.key
        mode: "0640"
        owner: "node-exp"
        group: "node-exp"
  roles:
    - cloudalchemy.node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.pem
      key_file: /etc/node_exporter/tls.key
NOTE: This needs a "base" run to create the user first; on a second run it will copy the files and re-configure.
ansible-playbook -i ~/inventories/host-env1.yml node-exporter-install.yml
On any or all hosts (or using Ansible), check that the node-exporter process is running:
sudo systemctl status node_exporter.service
ansible -i ~/inventories/sasl-rbac-env1.yml all -m shell -a "systemctl status node_exporter.service"
You can check systemd events with journalctl:
journalctl -f -u node_exporter.service
Other commands:
systemctl cat node_exporter.service
Query the node-exporter default /metrics endpoint (port 9100):
curl -k https://broker-1:9100/metrics
curl --cacert ~/inventories/ssl/generated/CAcert.pem https://broker-1:9100/metrics
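To have Prometheus scrape these node-exporter endpoints over TLS, a job like the following can be added to the prometheus_config prepared earlier. This is a sketch: the ca_file path assumes the CAcert.pem above has been copied to the Prometheus host, and the target list should cover all hosts running node-exporter.

```yaml
# Additional scrape job for node-exporter over TLS (sketch)
- job_name: "node-exporter"
  scheme: https
  tls_config:
    ca_file: /etc/prometheus/CAcert.pem
  static_configs:
    - targets:
        - "broker-1:9100"
      labels:
        env: "primary"
```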