
Install Prometheus-Grafana pack

https://www.confluent.io/blog/monitor-kafka-clusters-with-prometheus-grafana-and-confluent/

Note on Ansible Playbooks and Binaries

This document is prepared to set up the monitoring stack using Ansible as much as possible. Prometheus and Grafana have been tested to work in an air-gap (no internet access) environment, providing the binaries from the official site and using the playbooks from GitHub user 0x0I (https://github.com/O1ahmad). The Prometheus Node Exporter playbook, provided by Cloud Alchemy (https://github.com/cloudalchemy), has not been tested in an air-gap environment, which does not mean it could not work.


Set up the JMX Exporter on the platform services (using CP-Ansible)

Prometheus scrapes metrics from an HTTP endpoint that needs to be exposed on the target hosts. This endpoint is exposed by a jmxexporter Java agent that can be enabled using CP-Ansible by adding the following configuration to the inventory host file.

    #### Monitoring Configuration ####
    jmxexporter_enabled: true
    jmxexporter_url_remote: false
    jmxexporter_jar_url: ~/jmx/jmx_prometheus_javaagent-0.17.2.jar

The jmx-exporter jar file can be downloaded from Maven Central:

wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.17.2/jmx_prometheus_javaagent-0.17.2.jar

The configuration for the exporter is already provided by cp-ansible and is also available in the jmx-monitoring-stacks repository, under shared-assets/jmx-exporter.

Check the exporter is running

Each service can be configured with a different port for the exporter endpoint; these are the default values per component:

  • Zookeepers 8079
  • Brokers 8080
  • SchemaRegistry 8078
  • Connect 8077
  • KSQL 8076

After running the CP-Ansible playbook you can test the endpoint using:

curl http://<component-host>:<component-port>
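For example, against one of the brokers (the host name below is taken from the demo inventory used later in this document; adjust it to your environment):

# A healthy endpoint returns plain-text Prometheus metrics; jvm_* lines are typically present with the default agent setup
curl -s http://dfederico-demo-broker-0:8080 | grep '^jvm_' | head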

Prepare scrape targets for the environment

Use the jmx-monitoring-stacks playbook (under jmxexporter-prometheus-grafana/cp-ansible) to create the scrape target information for Prometheus.

From the cp-ansible directory mentioned above:

ansible-playbook -i inventory.yml prometheus-config.yml -e env=<environment-name>

ansible-playbook -i ~/inventories/sasl-rbac-env1.yml prometheus-config.yml -e env=primary -e node_exporter_enabled=true

ansible-playbook -i ~/inventories/sasl-rbac-env2.yml prometheus-config.yml -e env=dr -e node_exporter_enabled=true

The files generated in the examples above contain the scrape_configs for Prometheus and must be copied into the Prometheus playbook configuration in the next step.


Install Prometheus and Grafana Playbook dependency

The next playbooks, for installing Prometheus and Grafana, have a dependency on a playbook that prepares the systemd services. It can be installed as:

ansible-galaxy role install 0x0i.systemd

Or, for an air-gap Ansible host, use a two-step process:

  1. Download and package the playbook on a host with access to GitHub
git clone https://github.com/0x0I/ansible-role-systemd
tar -czvf 0x0i.systemd ansible-role-systemd
  2. Copy the file 0x0i.systemd to the Ansible controller host and install the role
ansible-galaxy role install 0x0i.systemd
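Either way, you can verify that the role is visible to Ansible before moving on:

# List locally installed roles (typically under ~/.ansible/roles) and filter for the systemd role
ansible-galaxy role list | grep -i systemd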

Install Prometheus

The binaries for Prometheus are available here: https://prometheus.io/download/#prometheus. The default execution of the playbook downloads the binary from the above URL; for air-gap environments, it is assumed that the archive has been downloaded to a ~/downloads folder (change the inventory file below).
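For example, on a host with internet access, the Linux amd64 archive matching the inventory below could be fetched as follows (the release URL pattern and version are assumptions; use whatever matches prometheus_archive_name):

# Download the Prometheus release archive into ~/downloads
wget -P ~/downloads https://github.com/prometheus/prometheus/releases/download/v2.37.6/prometheus-2.37.6.linux-amd64.tar.gz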

Provision Prometheus Playbook

The playbooks for this step are provided by https://github.com/0x0I/ansible-role-prometheus

For the latest version of the playbook, download and package it from GitHub:

git clone https://github.com/0x0I/ansible-role-prometheus.git
tar -czvf 0x0i.prometheus ansible-role-prometheus

Install using ansible-galaxy

ansible-galaxy role install 0x0I.prometheus

Prepare an installation playbook

This is a wrapper playbook to launch the install role:

## File: install-prometheus.yml
---
- name: Installing Prometheus on hosted machine
  hosts: prometheus
  gather_facts: true

  tasks:
    - name: Create temp dir for binary
      file:
        path: "/tmp/prometheus"
        state: directory
        mode: "0755"   # directories need the execute bit to be traversable

    - name: Copy prometheus binary
      copy:
        src: "{{ prometheus_local_binary }}"
        dest: "{{ prometheus_remote_binary }}"
        mode: "0644"

    - import_role:
        name: 0x0i.prometheus
      vars:
        archive_url: "file://{{ prometheus_remote_binary }}"
        archive_checksum: ''  

Prepare an Inventory file with the service configuration

This inventory file will contain the scrape targets configuration prepared in the "Prepare scrape targets for the environment" section.

# prometheus-inventory.yml
---
all:
  vars:
    ansible_connection: ssh
    ansible_user: dfederico
    ansible_become: true
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
    ansible_python_interpreter: /usr/bin/python3
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'

    prometheus_archive_name: prometheus-2.37.6.linux-amd64
    prometheus_local_binary: "~/downloads/{{ prometheus_archive_name }}.tar.gz"
    prometheus_remote_binary: "/tmp/prometheus/{{ prometheus_archive_name }}.tar.gz"

    prometheus_config:
      scrape_configs:
        - job_name: "zookeeper"
          static_configs:
            - targets:
                - "dfederico-demo-zk-0:8079"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-zk-0:8079"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'

        - job_name: "kafka-broker"
          static_configs:
            - targets:
                - "dfederico-demo-broker-0:8080"
                - "dfederico-demo-broker-1:8080"
                - "dfederico-demo-broker-2:8080"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-broker-0:8080"
                - "dfederico-demo-dr-broker-1:8080"
                - "dfederico-demo-dr-broker-2:8080"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'

        - job_name: "schema-registry"
          static_configs:
            - targets:
                - "dfederico-demo-sr-0:8078"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-sr-0:8078"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'

        - job_name: "kafka-connect"
          static_configs:
            - targets:
                - "dfederico-demo-connect-0:8077"
              labels:
                env: "primary"
            - targets:
                - "dfederico-demo-dr-connect-0:8077"
              labels:
                env: "dr"
          relabel_configs:
            - source_labels: [__address__]
              target_label: instance
              regex: '([^:]+)(:[0-9]+)?'
              replacement: '${1}'

prometheus:
  hosts:
    dfederico-demo-extra-0:

The above example sets up two environments; the job names match the services as they will be used in Grafana, and the targets of each environment are labelled with an env tag.

Run the playbook

ansible-playbook -i prometheus-inventory.yml install-prometheus.yml

Check

On any or all hosts (or using Ansible), check that the prometheus process is running:

sudo systemctl status prometheus

ansible -i prometheus-inventory.yml prometheus -m shell -a "systemctl status prometheus.service"

You can check the systemd events with journalctl or similar:

journalctl -f -u prometheus.service

Other commands:

systemctl cat prometheus.service

Default config file: /etc/prometheus/prometheus.yml

Query the default / or /status endpoint (port 9090).

Open a web browser to the /targets endpoint on port 9090 to confirm that all the scrape targets are up.
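The same checks can be scripted against the Prometheus HTTP API; a minimal sketch, assuming the Prometheus host from the inventory above and that jq is available:

# List scrape targets and their health
curl -s http://dfederico-demo-extra-0:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, env: .labels.env, health: .health}'

# Run a quick query using the env label added by the scrape config
curl -s -G http://dfederico-demo-extra-0:9090/api/v1/query --data-urlencode 'query=up{env="primary"}'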


Install Grafana

The binaries for Grafana are available here: https://grafana.com/grafana/download/9.5.3?edition=oss. The default execution of the playbook downloads the binary from the above URL; for air-gap environments, it is assumed that the archive has been downloaded to a ~/downloads folder (change the inventory file below).
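For example, the standalone Linux tarball can be pre-downloaded on a host with internet access (the URL pattern is an assumption; the version must match grafana_archive_name in the inventory below):

# Download the Grafana OSS tarball into ~/downloads
wget -P ~/downloads https://dl.grafana.com/oss/release/grafana-9.4.3.linux-amd64.tar.gz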

Provision Grafana Playbook

The playbooks for this step are provided by https://github.com/0x0I/ansible-role-grafana

For the latest version of the playbook, download and package it from GitHub:

git clone https://github.com/0x0I/ansible-role-grafana
tar -czvf 0x0i.grafana ansible-role-grafana

Install using ansible-galaxy

ansible-galaxy role install 0x0i.grafana

Prepare an installation playbook

This is a wrapper playbook to launch the install role:

## File: install-grafana.yml
---
- name: Installing Grafana on hosted machine
  hosts: grafana
  gather_facts: true

  tasks:
    - name: Create temp dir for binary
      file:
        path: "/tmp/grafana"
        state: directory
        mode: "0755"   # directories need the execute bit to be traversable

    - name: Copy grafana binary
      copy:
        src: "{{ grafana_local_binary }}"
        dest: "{{ grafana_remote_binary }}"
        mode: "0644"

    - import_role:
        name: 0x0i.grafana
      vars:
        archive_url: "file://{{ grafana_remote_binary }}"
        archive_checksum: ''  

Run the playbook against the host inventory file

ansible-playbook -i ~/inventories/gcp-sandbox/sasl-rbac-env1.yml install-grafana.yml

Prepare an Inventory file with the service configuration

(Optional) This can be merged with the prometheus-inventory.yml

# grafana-inventory.yml
---
all:
  vars:
    ansible_connection: ssh
    ansible_user: dfederico
    ansible_become: true
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
    ansible_python_interpreter: /usr/bin/python3
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'

    grafana_archive_name: grafana-9.4.3.linux-amd64
    grafana_local_binary: "~/downloads/{{ grafana_archive_name }}.tar.gz"
    grafana_remote_binary: "/tmp/grafana/{{ grafana_archive_name }}.tar.gz"

    grafana_config:
    # section [security]
      security:
        admin_user: admin
        admin_password: admin-secret

grafana:
  hosts:
    dfederico-demo-extra-0:

Note (or change) the admin user and password above; they are used to access the web GUI.

Run the playbook

ansible-playbook -i grafana-inventory.yml install-grafana.yml

Check

On any or all hosts (or using Ansible), check that the grafana process is running:

sudo systemctl status grafana.service

ansible -i grafana-inventory.yml grafana -m shell -a "systemctl status grafana.service"

You can check the systemd events with journalctl or similar:

journalctl -f -u grafana.service

Other commands:

systemctl cat grafana.service

Open Grafana and set up the Prometheus datasource

Open the Grafana UI on port 3000 using a web browser and authenticate using the configured user.


Create a Prometheus data source, usually http://localhost:9090; connectivity is tested when saving the configuration.
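If you prefer to script this step, the Grafana HTTP API can create the data source; a minimal sketch, using the admin credentials and host from the inventories above:

# Create a Prometheus data source through the Grafana API
curl -s -u admin:admin-secret -H "Content-Type: application/json" \
  -X POST http://dfederico-demo-extra-0:3000/api/datasources \
  -d '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"http://localhost:9090","isDefault":true}'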


Import Confluent Platform Dashboards

Import each dashboard from the jmx-monitoring-stacks repository in the ConfluentInc GitHub organization, under the jmxexporter-prometheus-grafana/assets/grafana/provisioning/dashboards folder.
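The dashboards can also be imported through the Grafana API; a minimal sketch, assuming the dashboard JSON files have been copied locally and that jq is available (the file name is only an example):

# Wrap a provisioning dashboard JSON in an import payload (dropping any hard-coded id) and post it to Grafana
jq '{dashboard: (. + {id: null}), overwrite: true}' kafka-cluster.json | \
  curl -s -u admin:admin-secret -H "Content-Type: application/json" \
    -X POST http://dfederico-demo-extra-0:3000/api/dashboards/db -d @-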


(Optional) Install Prometheus Node Exporter

This is an optional component for hardware and OS metrics exposed by *NIX kernels, since most environments already have a monitoring agent for node resources (CPU, memory, disk, etc.). Note: the playbook provided by Cloud Alchemy has not been tested in an air-gap environment.

Provision Node-Exporter Playbook

First, install the playbooks from Cloud Alchemy (recommended by https://github.com/prometheus/node_exporter):

ansible-galaxy install cloudalchemy.node_exporter

In an air-gap environment, clone the repository from GitHub on a host with internet access, tar the folder, and ship it to the air-gapped Ansible controller:

git clone https://github.com/cloudalchemy/ansible-node-exporter.git
tar -czvf cloudalchemy.node_exporter ansible-node-exporter

The above creates a compressed file named cloudalchemy.node_exporter, as expected by the role name. On the Ansible controller host, install the package with ansible-galaxy:

ansible-galaxy role install cloudalchemy.node_exporter

This installs the role locally (usually at ~/.ansible/roles); you can check the installed roles using:

ansible-galaxy role list

Prepare an installation playbook

## File: node-exporter-install.yml
- hosts: all
  pre_tasks:
    - name: Create node_exporter cert dir
      file:
        path: "/etc/node_exporter"
        state: directory
        owner: node-exp
        group: node-exp

    - name: Copy certificate
      copy:
        src: ~/inventories/ssl/generated/server-demo.pem
        dest: /etc/node_exporter/tls.pem
        mode: "0640"
        owner: "node-exp"
        group: "node-exp"

    - name: Copy certificate-Key
      copy:
        src: ~/inventories/ssl/generated/server-demo-key.pem
        dest: /etc/node_exporter/tls.key
        mode: "0640"
        owner: "node-exp"
        group: "node-exp"

  roles:
    - cloudalchemy.node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.pem
      key_file: /etc/node_exporter/tls.key

Run the playbook against the host inventory file

NOTE: This needs a "base" run to create the node-exp user first; on a second run it will copy the files and re-configure the service.

ansible-playbook -i ~/inventories/host-env1.yml node-exporter-install.yml

Check

On any or all hosts (or using Ansible), check that the node_exporter process is running:

sudo systemctl status node_exporter.service

ansible -i ~/inventories/sasl-rbac-env1.yml all -m shell -a "systemctl status node_exporter.service"

You can check the systemd events with journalctl or similar:

journalctl -f -u node_exporter.service

Other commands:

systemctl cat node_exporter.service

Query the node-exporter default /metrics endpoint (port 9100):

curl -k https://broker-1:9100/metrics

curl --cacert ~/inventories/ssl/generated/CAcert.pem https://broker-1:9100/metrics
