@oetiker
Created July 4, 2025 14:12
Concise CheckMK 2.3+ Agent Plugin Writers Guide

CheckMK Plugin Development Guide

This guide contains the essential information needed to develop CheckMK plugins without referring to extensive documentation. Use this as a reference for future plugin development.

Plugin Architecture Overview

CheckMK plugins consist of two main components that run in different locations:

Agent Plugins (Host-side)

  • Location: /usr/lib/check_mk_agent/plugins/ on monitored hosts
  • Purpose: Collect data from the monitored system
  • Execution: Run when CheckMK agent is queried
  • Output: Structured data in CheckMK section format
  • Requirements: Must be executable, typically run as root
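
The section format named above is a header line followed by data rows. A minimal sketch, assuming a hypothetical helper `format_section` and a made-up section name `demo`; `sep(124)` announces that columns are separated by ASCII character 124, the pipe:

```python
from typing import Iterable, List


def format_section(name: str, rows: Iterable[List[str]], sep: str = "|") -> List[str]:
    """Build the lines of one agent section (hypothetical helper, not part of CheckMK)."""
    # The header carries the section name and the column separator as an ASCII code.
    lines = [f"<<<{name}:sep({ord(sep)})>>>"]
    for row in rows:
        lines.append(sep.join(row))
    return lines


if __name__ == "__main__":
    for line in format_section("demo", [["disk0", '{"metrics":{"temp":41}}']]):
        print(line)
```

Building the lines as a list before printing makes the output easy to inspect in a unit test.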

Check Plugins (Server-side)

  • Location: ~/local/lib/python3/cmk_addons/plugins/{plugin_family}/ on CheckMK server
  • Purpose: Process agent data, discover services, perform checks
  • Components: Parse function, discovery function, check function
  • API: Uses CheckMK's agent_based API v2
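
The parse step can be tried without a CheckMK site. In this sketch, `split_section` is a hypothetical stand-in for what the server does with a `sep(124)` section (every line is split on `|`), and `parse_rows` is an illustrative parse function. A pipe inside the JSON payload would be split as well, so the sketch rejoins all columns after the first before decoding:

```python
import json
from typing import Any, Dict, List

# Same shape as the StringTable type that parse functions receive
StringTable = List[List[str]]


def split_section(lines: List[str], sep: str = "|") -> StringTable:
    """Simplified stand-in for the server-side column splitting (assumption)."""
    return [line.split(sep) for line in lines]


def parse_rows(string_table: StringTable) -> Dict[str, Dict[str, Any]]:
    """Map item_id to decoded JSON, marking undecodable rows with an error entry."""
    section: Dict[str, Dict[str, Any]] = {}
    for row in string_table:
        if len(row) < 2:
            continue  # no payload column
        item_id = row[0]
        payload = "|".join(row[1:])  # undo splits that hit a pipe inside the JSON
        try:
            section[item_id] = json.loads(payload)
        except json.JSONDecodeError:
            section[item_id] = {"error": "Invalid JSON data"}
    return section
```

Keeping the parse function free of CheckMK imports apart from the type alias is what makes this kind of offline test possible.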

Directory Structure

{plugin_name}/                          # Plugin family name
├── agent_based/
│   └── {plugin_name}.py               # Check plugin (main logic)
├── server_side_calls/
│   └── {plugin_name}.py               # Special agent / active check call configuration (optional)
├── rulesets/
│   └── {plugin_name}.py               # Web GUI configuration (thresholds, etc.)
└── graphing/
    └── {plugin_name}.py               # Metrics and graph definitions

agents/
└── plugins/
    └── {plugin_name}                   # Agent plugin (executable)

~/local/lib/check_mk/base/cee/plugins/bakery/
└── {plugin_name}.py                    # Bakery plugin (optional, commercial editions only)

Agent Plugin Template

#!/usr/bin/env python3
"""
CheckMK Agent Plugin for {Purpose}
Collects {type of data} from {source}
This runs on the monitored host
"""

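import json
import subprocess
from typing import Dict, List

def parse_output(stdout: str) -> List[Dict]:
    """Convert raw command output into item dicts (placeholder, adapt to your command)"""
    # Each dict needs 'id' and a truthy 'can_be_monitored';
    # 'name' and 'metrics' are optional (see main() below)
    return []

def collect_data() -> List[Dict]:
    """Collect data from the system"""
    try:
        # Your data collection logic here
        result = subprocess.run(
            ['your_command', '--options'],
            capture_output=True,
            text=True,
            timeout=30
        )
    except (subprocess.TimeoutExpired, FileNotFoundError, PermissionError):
        # Command hung, is not installed, or may not be executed
        return []

    if result.returncode != 0:
        return []

    # Parse and return structured data
    return parse_output(result.stdout)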

def main():
    """Main function to collect and output data"""
    data = collect_data()
    
    if not data:
        # Don't output section if no valid data
        return
    
    print("<<<{section_name}:sep(124)>>>")
    
    for item in data:
        # Filter out items that can't be monitored
        if not item.get('can_be_monitored'):
            continue
            
        # Output: item_id|json_data
        output_data = {
            'item_id': item['id'],
            'friendly_name': item.get('name', item['id']),
            'metrics': item.get('metrics', {}),
            # Include any other relevant data
        }
        
        print(f"{item['id']}|{json.dumps(output_data, separators=(',', ':'))}")

if __name__ == "__main__":
    main()
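
The template leaves the parsing of the command output open. As an illustration only, assuming a hypothetical command that prints one `id name value` triple per line, a parser that produces the dictionaries `main()` expects might look like this:

```python
from typing import Dict, List


def parse_output(stdout: str) -> List[Dict]:
    """Turn 'id name value' lines into item dicts (the input format is an assumption)."""
    items = []
    for line in stdout.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        item_id, name, raw_value = parts
        try:
            value = float(raw_value)
        except ValueError:
            continue  # a non-numeric value cannot be turned into a metric
        items.append({
            "id": item_id,
            "name": name,
            "metrics": {"value": value},
            "can_be_monitored": True,
        })
    return items
```

Dropping unparseable lines here, rather than in the check plugin, keeps the agent output limited to items that can actually be monitored.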

Check Plugin Template

#!/usr/bin/env python3
"""
CheckMK Check Plugin for {Purpose}
Processes {type of data} and generates metrics
This runs on the CheckMK server
"""

import json
from typing import Any, Dict, Mapping

from cmk.agent_based.v2 import (
    AgentSection,
    CheckPlugin,
    CheckResult,
    DiscoveryResult,
    Metric,
    Result,
    Service,
    State,
    StringTable,
    render,  # For formatting values (bytes, time, etc.)
)

Section = Dict[str, Dict[str, Any]]

def _create_friendly_description(item_data: Dict[str, Any]) -> str:
    """Create user-friendly service description"""
    name = item_data.get('friendly_name', item_data.get('item_id', 'Unknown'))
    # Add any additional identifying information
    return name

def parse_{plugin_name}(string_table: StringTable) -> Section:
    """Parse agent output into structured data"""
    section = {}
    
    for line in string_table:
        if len(line) < 2:
            continue
            
        item_id = line[0]
        
        if line[1] == "ERROR":
            error_msg = line[2] if len(line) > 2 else "Unknown error"
            section[item_id] = {"error": error_msg}
            continue
        
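        try:
            # Rejoin the remaining columns: a "|" inside the JSON payload
            # would otherwise have been split off by sep(124)
            data = json.loads("|".join(line[1:]))
            section[item_id] = data
        except json.JSONDecodeError:
            section[item_id] = {"error": "Invalid JSON data"}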
    
    return section

def discover_{plugin_name}(section: Section) -> DiscoveryResult:
    """Discover services for items that can be monitored"""
    for item_id, data in section.items():
        if "error" not in data:
            yield Service(item=item_id)

def check_{plugin_name}(item: str, params: Mapping[str, Any], section: Section) -> CheckResult:
    """Check function with metrics and thresholds"""
    if item not in section:
        yield Result(state=State.UNKNOWN, summary=f"Item {item} not found")
        return
    
    data = section[item]
    
    if "error" in data:
        yield Result(state=State.CRIT, summary=f"Error: {data['error']}")
        return
    
    # Create friendly description
    item_desc = _create_friendly_description(data)
    
    # Extract metrics
    metrics = data.get("metrics", {})
    if not metrics:
        yield Result(state=State.UNKNOWN, summary=f"{item_desc}: No metrics available")
        return
    
    # Generate metrics
    for metric_name, value in metrics.items():
        yield Metric(metric_name, value)
    
    # Apply thresholds and determine state
    state = State.OK
    summary_parts = []
    
    # Example threshold checking
    if "critical_metric" in metrics:
        critical_value = metrics["critical_metric"]
        if "critical_threshold" in params:
            warn, crit = params["critical_threshold"]
            if critical_value >= crit:
                state = State.CRIT
            elif critical_value >= warn:
                state = State.WARN
        
        summary_parts.append(f"Critical metric: {critical_value}")
    
    if not summary_parts:
        summary_parts.append("All metrics OK")
    
    summary = f"{item_desc}: {', '.join(summary_parts)}"
    yield Result(state=state, summary=summary)

# Register components
agent_section_{plugin_name} = AgentSection(
    name="{plugin_name}",
    parse_function=parse_{plugin_name},
)

check_plugin_{plugin_name} = CheckPlugin(
    name="{plugin_name}",
    service_name="{Service Name} %s",
    discovery_function=discover_{plugin_name},
    check_function=check_{plugin_name},
    check_ruleset_name="{plugin_name}",  # Links to rulesets
    check_default_parameters={},
)
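
The threshold logic in the check function is easier to test once it is pulled out into a pure function. A sketch, using a local `State` enum as a stand-in for `cmk.agent_based.v2.State` so that it runs outside a CheckMK site; `evaluate_levels` and `worst` are hypothetical helper names:

```python
from enum import IntEnum
from typing import Optional, Tuple


class State(IntEnum):
    """Stand-in for cmk.agent_based.v2.State with the same numeric values."""
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


def evaluate_levels(value: float, levels: Optional[Tuple[float, float]]) -> State:
    """Compare a value against upper (warn, crit) levels; no levels means OK."""
    if levels is None:
        return State.OK
    warn, crit = levels
    if value >= crit:
        return State.CRIT
    if value >= warn:
        return State.WARN
    return State.OK


def worst(*states: State) -> State:
    """Pick the most severe state; CRIT outranks UNKNOWN, which outranks WARN and OK."""
    order = [State.OK, State.WARN, State.UNKNOWN, State.CRIT]
    return max(states, key=order.index)
```

With several metrics, each one can be evaluated separately and the results combined through `worst`, so that a later metric in OK state cannot hide an earlier WARN or CRIT.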

Rulesets Template (Web GUI Configuration)

#!/usr/bin/env python3
"""
CheckMK Web GUI configuration for {Plugin Name}
Defines rule sets and parameter forms
"""

from cmk.gui.i18n import _
from cmk.gui.plugins.wato.utils import (
    CheckParameterRulespecWithItem,
    RulespecGroupCheckParametersApplications,
    rulespec_registry,
)
from cmk.gui.valuespec import (
    Dictionary,
    Float,
    TextInput,
    Tuple,
)

def _item_spec_{plugin_name}():
    """Item selector; the check plugin above discovers one service per item"""
    return TextInput(title=_("{Item Display Name}"))

def _parameter_valuespec_{plugin_name}():
    """Value specification for thresholds"""
    return Dictionary(
        title=_("{Plugin Display Name} Thresholds"),
        help=_("Configure thresholds for {plugin description}."),
        elements=[
            ("critical_threshold", 
             Tuple(
                 title=_("Critical Metric Threshold"),
                 help=_("Warning and critical levels for the critical metric"),
                 elements=[
                     Float(title=_("Warning at"), default_value=80.0),
                     Float(title=_("Critical at"), default_value=90.0),
                 ],
             )
            ),
            # Add more threshold configurations as needed
        ]
    )

rulespec_registry.register(
    CheckParameterRulespecWithItem(
        check_group_name="{plugin_name}",
        group=RulespecGroupCheckParametersApplications,  # or another RulespecGroup class
        item_spec=_item_spec_{plugin_name},
        match_type="dict",
        parameter_valuespec=_parameter_valuespec_{plugin_name},
        title=lambda: _("{Plugin Display Name} Monitoring"),
    )
)

Graphing Template

#!/usr/bin/env python3
"""
CheckMK Metrics and Graph Definitions for {Plugin Name}
"""

from cmk.gui.i18n import _
from cmk.gui.plugins.metrics import (
    graph_info,
    metric_info,
    perfometer_info,
)

# Metric definitions
metric_info["your_metric_name"] = {
    "title": _("Your Metric Display Name"),
    "unit": "count",  # or "bytes", "percent", "1/s", etc.
    "color": "11/a",  # Color scheme
}

# Graph definitions
graph_info["your_graph_name"] = {
    "title": _("Your Graph Title"),
    "metrics": [
        ("your_metric_name", "area"),  # or "line"
    ],
    "optional_metrics": [
        "your_metric_name",
    ],
    "consolidation_function": "max",
    "range": (0, None),
}

# Perfometer for service overview
perfometer_info.append({
    "type": "linear",
    "segments": ["your_metric_name"],
    "total": 100.0,
})

Bakery Plugin Template (CheckMK 2.3+)

#!/usr/bin/env python3
"""
CheckMK Bakery Plugin for {Plugin Name}
Automatically distributes the agent plugin to monitored hosts
Requires the Agent Bakery (commercial editions only)

Two files are involved. Import paths and file locations are given as of
CheckMK 2.3 and should be verified against the version in use.
"""

# --- File 1: rulesets/{plugin_name}_bakery.py (rule shown in the GUI) ---

from cmk.rulesets.v1 import Help, Title
from cmk.rulesets.v1.form_specs import (
    BooleanChoice,
    DefaultValue,
    DictElement,
    Dictionary,
    Integer,
    validators,
)
from cmk.rulesets.v1.rule_specs import AgentConfig, Topic

def _parameter_form_{plugin_name}_bakery() -> Dictionary:
    """Configuration interface for agent plugin"""
    return Dictionary(
        help_text=Help("This plugin monitors {description}."),
        elements={
            "enabled": DictElement(
                required=True,
                parameter_form=BooleanChoice(
                    title=Title("Enable {plugin name} monitoring"),
                    prefill=DefaultValue(True),
                ),
            ),
            "timeout": DictElement(
                parameter_form=Integer(
                    title=Title("Command timeout (seconds)"),
                    prefill=DefaultValue(30),
                    custom_validate=(
                        validators.NumberInRange(min_value=10, max_value=300),
                    ),
                ),
            ),
        },
    )

# Register the rule (objects are discovered by their rule_spec_ prefix)
rule_spec_{plugin_name}_bakery = AgentConfig(
    name="{plugin_name}",
    title=Title("{Plugin Display Name} (Linux)"),
    topic=Topic.GENERAL,
    parameter_form=_parameter_form_{plugin_name}_bakery,
)

# --- File 2: ~/local/lib/check_mk/base/cee/plugins/bakery/{plugin_name}.py ---

from pathlib import Path
from typing import Any, Dict

from cmk.base.cee.plugins.bakery.bakery_api.v1 import (
    FileGenerator,
    OS,
    Plugin,
    PluginConfig,
    register,
)

def get_{plugin_name}_files(conf: Dict[str, Any]) -> FileGenerator:
    """Yield the files to bake into the agent package"""
    if not conf.get("enabled", True):
        return

    # The source path is relative to the site's agents/plugins/ directory
    yield Plugin(base_os=OS.LINUX, source=Path("{plugin_name}"))

    # Ship the timeout as a config file; the agent plugin has to read it itself
    yield PluginConfig(
        base_os=OS.LINUX,
        lines=[f"timeout={conf.get('timeout', 30)}"],
        target=Path("{plugin_name}.cfg"),
        include_header=True,
    )

register.bakery_plugin(
    name="{plugin_name}",
    files_function=get_{plugin_name}_files,
)
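
A configurable timeout can reach the agent plugin through a small config file, so the plugin source itself never has to be edited per host. The sketch below assumes a hypothetical file `my_plugin.cfg` containing a line such as `timeout=60`, located in the directory named by the agent's `MK_CONFDIR` environment variable, with `/etc/check_mk` as an assumed fallback; `read_timeout` is an illustrative helper name:

```python
import os
from pathlib import Path


def read_timeout(config_file: Path, default: int = 30) -> int:
    """Return the value of a 'timeout=NN' line, or the default if absent or invalid."""
    try:
        text = config_file.read_text()
    except OSError:
        return default  # missing or unreadable file
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "timeout":
            try:
                return int(value.strip())
            except ValueError:
                return default
    return default


if __name__ == "__main__":
    # MK_CONFDIR is exported by the agent when it runs plugins
    conf_dir = Path(os.environ.get("MK_CONFDIR", "/etc/check_mk"))
    print(read_timeout(conf_dir / "my_plugin.cfg"))
```

Falling back to the default on any problem keeps the plugin working on hosts where no config file has been deployed.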

Key Best Practices

Agent Plugin Guidelines

  1. Filter data: Only output items that can be meaningfully monitored
  2. Handle errors gracefully: Don't crash on missing commands or permissions
  3. Use timeouts: Prevent hanging on slow commands
  4. JSON output: Use compact JSON format for structured data
  5. Friendly names: Include model, serial, or other identifying information
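
Guidelines 2 and 3 can be combined in one small wrapper. A sketch, with `run_command` as a hypothetical helper name; it returns `None` for every failure mode so the caller can simply skip the section:

```python
import subprocess
import sys
from typing import List, Optional


def run_command(cmd: List[str], timeout: float = 30.0) -> Optional[str]:
    """Return stdout of cmd, or None if it is missing, not permitted, fails, or times out."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, FileNotFoundError, PermissionError):
        return None
    if result.returncode != 0:
        return None
    return result.stdout


if __name__ == "__main__":
    # sys.executable stands in for the real data-collection command
    print(run_command([sys.executable, "-c", "print('ok')"]))
```

The timeout should stay well below the agent's own query timeout, otherwise one slow command delays the whole agent output.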

Check Plugin Guidelines

  1. Friendly descriptions: Create human-readable service names and summaries
  2. Error handling: Handle missing data and malformed input gracefully
  3. Meaningful metrics: Generate metrics that are useful for monitoring and graphing
  4. Threshold support: Allow configuration via web GUI parameters
  5. State logic: Use appropriate WARNING/CRITICAL states based on thresholds

Development Process

  1. Start with agent plugin: Get data collection working first
  2. Test agent output: Verify JSON format and content
  3. Build check plugin: Parse data, create services, generate metrics
  4. Add web GUI: Create threshold configuration interface
  5. Add graphs: Define metrics and visualizations
  6. Add bakery plugin: Enable automatic distribution (optional)
  7. Document filtering: Explain what gets monitored and why
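
Step 2 can be partly automated. A sketch of a checker for captured agent plugin output, assuming the `item_id|json` row format used in this guide and section names made of lowercase letters, digits, and underscores; `validate_agent_output` is a hypothetical helper:

```python
import json
import re
from typing import List

# Section header such as <<<demo>>> or <<<demo:sep(124)>>>
HEADER = re.compile(r"^<<<[a-z0-9_]+(:sep\(\d+\))?>>>$")


def validate_agent_output(lines: List[str]) -> List[str]:
    """Return a list of problems found; an empty list means the output looks well-formed."""
    if not lines or not HEADER.match(lines[0]):
        return ["first line is not a section header"]
    problems = []
    for number, line in enumerate(lines[1:], start=2):
        item_id, sep, payload = line.partition("|")
        if not sep or not item_id:
            problems.append(f"line {number}: expected 'item_id|json'")
            continue
        try:
            json.loads(payload)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
    return problems
```

Feeding it the lines printed by the agent plugin on a test host gives a quick pass/fail signal before the check plugin is written.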

Installation

  • Server: cp -r {plugin_family} ~/local/lib/python3/cmk_addons/plugins/
  • Hosts: sudo install -m 755 agents/plugins/{plugin_name} /usr/lib/check_mk_agent/plugins/
  • Restart: cmk -R (after changing rulesets or graphing files, also run omd restart apache)

This guide provides the essential structure and patterns for developing CheckMK plugins efficiently.
