kylebrandt/README.md

Last active March 14, 2025 17:02

Star (0) You must be signed in to star a gist
Fork (0) You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/kylebrandt/baf00606e7a2dd4158721ba7092f12b6.js"></script>
Save kylebrandt/baf00606e7a2dd4158721ba7092f12b6 to your computer and use it in GitHub Desktop.

Download ZIP

SQL Expressions and Full Long (Written by ChatGPT)

Raw

README.md

Why `*_full_long` Formats Are Essential for Handling Label Sparsity

Overview

In Grafana, data coming from many sources includes sparse labels — meaning some series have label keys that others do not.
This is common in systems like Prometheus, OpenTelemetry, or SQL queries with GROUP BY.

However, traditional long formats (like time_series_long or numeric_long) do not preserve label sparsity correctly, leading to broken behaviors in visualization, alerts, and expressions.

The Problem: Flattening Label Keys Forces Sparsity Loss

When converting from Wide or Multi to a Long format:

All observed label keys are turned into string fields
Rows where a label is not present get filled with "", "unknown", or another default

This means:

You can no longer distinguish between "label" not present and "label" = '' (empty string)

User-Facing Consequences

🔹 1. Visualization: Unexpected Series Explosion

Before:

cpu{host="a"}  
cpu{host="a",int="eth0"}

After flattening:

cpu{host="a",int=""}  
cpu{host="a",int="eth0"}

Impact:

The panel renders 2 series instead of 1 (unexpectedly)
Legend shows int="" which user never wrote
Confusing UI and data interpretation

🔹 2. Alerting: Duplicate or Missed Alerts

Flattened rows cause:

Phantom series
Duplicate alert instances (one per padded label set)
Broken alert silencing due to label mismatch

Example:

Alert for cpu{host="a"} fires twice:
- cpu{host="a",int=""}
- cpu{host="a",int="eth0"}

🔹 3. Transformations: Broken Grouping and Aggregation

Transformations like Group By or Reduce rely on exact label identity.

Flattened labels:

Cause unrelated series to group together
Or cause real series to fragment into multiple groups

Example:

GROUP BY host,int

Includes series with "int" = "" that shouldn’t be there.

🔹 4. Legends and Labels: UI Clutter

Before: Only real label sets are shown.

After flattening:

Legend shows int="", int="eth0", etc.
Panel becomes noisy
User loses confidence in data or query logic

🔹 5. SQL Expressions: Phantom Rows and Wrong Groups

SQL expects well-formed rows. When flattening inserts defaults like "":

SQL results include phantom rows
GROUP BY logic fails silently
Alert expressions or formulas break

✅ Solution: Use `time_series_full_long` / `numeric_full_long`

These formats:

Keep label keys as columns (like Long)
Use null when a label was not set
Include __metric__ field to preserve metric identity
Support round-trip back to Wide/Multi

Example:

| __metric__ | host | int   | value |
|------------|------|-------|--------|
| cpu        | a    | null  | 1.0
| cpu        | a    | eth0  | 2.0

🟢 Benefits to Users

Without `*_full_long`	With `*_full_long`
❌ Mystery series in legend	✅ Only true series appear
❌ Duplicate alerts	✅ One alert per real series
❌ Broken reduce / group transforms	✅ Accurate results
❌ Confusing SQL results	✅ Clean, expected rows
❌ Phantom data in UI	✅ Sparse labels preserved as-is

Summary

Label sparsity is a fundamental property of dimensional time series data.
If not handled correctly, it silently breaks behavior in:

Visualizations
Alerts
SQL expressions
Transformations
Legends and query interfaces

The *_full_long formats solve this problem with:

Explicit metric identity
Null-aware label handling
Row-normalized, SQL-friendly structure

This ensures correctness, clarity, and round-trip safety throughout the Grafana pipeline.

Raw

example_spec.md

Full Long Formats: `time_series_full_long` and `numeric_full_long`

Status: Draft
Version: 0.1

Overview

This document defines two new data formats in the Data Plane Contract:

time_series_full_long (Kind: Time Series)
numeric_full_long (Kind: Numeric)

These formats extend the "Long" family of data representations with a fully normalized row-based structure that supports accurate label sparsity, explicit metric identity, and safe round-trip conversion from Wide and Multi formats.

Time Series Full Long Format (`time_series_full_long`)

Kind: Time Series
Format: FullLong
Version: 0.1

This is a fully normalized long format for time series data. Each row represents a single data point: one metric, at one timestamp, with a complete set of dimension values.

Properties

Must have exactly one Frame
Must have exactly one Time field (first field of type Time)
Must include a __metric__ field (Type: String) with the metric name
Must include one Value field (numeric or boolean)
- Field name defaults to value, or is specified by Frame.Meta.Custom.valueField
May include any number of String fields representing label keys (dimensions)
Label fields must be present, and null means the label was not set for that row
Metric name and labels together form the identity of the series

Remainder Data

Any additional time fields after the first
Any additional Frames with a different or missing type
Additional metadata fields are allowed (e.g., __type__, __frame__)

Example

| Time       | __metric__ | host | int   | value |
|------------|------------|------|-------|--------|
| 2022-01-01 | cpu        | a    | null  | 1.0    |
| 2022-01-01 | cpu        | a    | eth0  | 2.0    |

Numeric Full Long Format (`numeric_full_long`)

Kind: Numeric
Format: FullLong
Version: 0.1

This format is used for representing individual numeric values with dimensions (labels). Each row represents one metric with its associated dimension values.

Properties

Must have exactly one Frame
Must include a __metric__ field (Type: String)
Must include one Value field (numeric or boolean)
- Field name defaults to value, or is specified by Frame.Meta.Custom.valueField
May include any number of String fields representing dimensions
Label fields must always be present across all rows
- null values indicate the label was not set

Remainder Data

Any time fields (time is not used in Numeric kind)
Any additional frames with a different or missing type
Optional metadata fields allowed (e.g., __type__, metricTypeMap)

Example

| __metric__ | host | int   | value |
|------------|------|-------|--------|
| cpu        | a    | null  | 1.0    |
| cpu        | a    | eth0  | 2.0    |
| mem        | a    | null  | 256.0  |

Motivation and Benefits

These formats exist to:

Enable SQL expressions and row-based processing
Support accurate alerting and visualization post-SQL
Preserve metric identity and label sparsity
Support clean round-trip conversion to/from Wide and Multi formats
Avoid ambiguity in series grouping and transformation

They are designed to work safely with:

SQL queries and expressions
Alert rule evaluation and grouping
Table transformations
Graph, Stat, Table, and Alert visualization panels

Future Considerations

Consider supporting a __type__ column or metadata to track value types in mixed-metric rows
Tooling should allow easy conversion between Long and FullLong variants
Consumers should default to interpreting null as “label not set” and exclude it from grouping unless explicitly configured

kylebrandt/README.md

Why *_full_long Formats Are Essential for Handling Label Sparsity

Overview

The Problem: Flattening Label Keys Forces Sparsity Loss

User-Facing Consequences

🔹 1. Visualization: Unexpected Series Explosion

🔹 2. Alerting: Duplicate or Missed Alerts

🔹 3. Transformations: Broken Grouping and Aggregation

🔹 4. Legends and Labels: UI Clutter

🔹 5. SQL Expressions: Phantom Rows and Wrong Groups

✅ Solution: Use time_series_full_long / numeric_full_long

🟢 Benefits to Users

Summary

Full Long Formats: time_series_full_long and numeric_full_long

Overview

Time Series Full Long Format (time_series_full_long)

Properties

Remainder Data

Example

Numeric Full Long Format (numeric_full_long)

Properties

Remainder Data

Example

Motivation and Benefits

Future Considerations

Why `*_full_long` Formats Are Essential for Handling Label Sparsity

✅ Solution: Use `time_series_full_long` / `numeric_full_long`

Full Long Formats: `time_series_full_long` and `numeric_full_long`

Time Series Full Long Format (`time_series_full_long`)

Numeric Full Long Format (`numeric_full_long`)