Currently our SQL plugins derive things from the special names of the columns. Instead we can use more conventional Long and Wide formats and derive the format from the schema. This will result in a more conventional SQL experience for users. The amount of code plugin authors need to create will be reduced.
Taking a MySQL Example:
SELECT
$__timeGroup(createdAt,'$summarize') as time_sec,
avg(value) as value,
hostname as metric
FROM
grafana_metric
WHERE
$__timeFilter(createdAt) AND
measurement = 'logins.count' AND
hostname IN($host)
GROUP BY 1, 3
ORDER BY 1
Which becomes:
SELECT
UNIX_TIMESTAMP(createdAt) DIV 60 * 60 as time_sec,
avg(value) as value,
hostname as metric
FROM
grafana_metric
WHERE
createdAt BETWEEN FROM_UNIXTIME(1580908377) AND FROM_UNIXTIME(1580985308) AND
measurement = 'logins.count' AND
hostname IN('10.1.100.1','10.1.100.10','server1','server2')
GROUP BY 1, 3
ORDER BY 1
Produces Long Format Data:
"Time";"Metric";"Value"
"2020-02-06 10:35:00";"server2";"445.36"
"2020-02-06 10:35:00";"server1";"445.85"
"2020-02-06 10:35:00";"10.1.100.10";"445.64"
"2020-02-06 10:35:00";"10.1.100.1";"445.59"
"2020-02-06 10:34:00";"server2";"446.28"
"2020-02-06 10:34:00";"server1";"446.25"
"2020-02-06 10:34:00";"10.1.100.10";"446.56"
"2020-02-06 10:34:00";"10.1.100.1";"446.72"
In order to minimally impact existing queries, we can update timeGroup to select time column. Users will still have to update when not using macros. Will also need to make sure we can always select into time objects from our all of our sql datasources to have a proper schema.
So the SQL query becomes the following (having sql create time):
SELECT
FROM_UNIXTIME(UNIX_TIMESTAMP(createdAt) DIV 60 * 60) as time_sec,
avg(value) as value,
hostname as metric
FROM
grafana_metric
WHERE
createdAt BETWEEN FROM_UNIXTIME(1580908377) AND FROM_UNIXTIME(1580985308) AND
measurement = 'logins.count' AND
hostname IN('10.1.100.1','10.1.100.10','server1','server2')
GROUP BY 1, 3
ORDER BY 1
- Both:
- Have Time column
- Have one or more Numeric Field
- Long
- Will have String Fields (in Result)
- Time will be repeated (Not detectable in schema, may be sparse)
- Have no Labels on the Dataframe's Fields.
- Wide
- Will Not have string Fields
- Time (should) not be repeated
Reference from azure data explorer datasource: https://github.com/grafana/azure-data-explorer-datasource#time-series-queries
- Has one Time column.
- Number type columns are "Metrics". The name of the column is the name of the metric.
- String type columns are Key/Value pair identifiers (a.k.a labels, tags). The column name is the identifer key, the value in each row is identifier Value.
- Each unique time series in the response is the unique combination of metric and Key/Value pair identifiers.
- Has one Time column.
- Has one or more Number value columns. The Unique name of the column is the Metric.
- If the column has a Labels property, they become Key/Value pair identifiers. And the Name does not have to be unique if Name+Labels is unique.
- Alerting and GEL will need same interpretation ability.
- There are some other types to support. Both those involve nested arrays most likely and dataframe does have those yet.
- Wide vs Long is determined by format of query result, not format of data in database.
- There may be cases where there is a second time column, and maybe that should be treated as a graph-able value.
- We should not force these schemas to always be time series. Assumption is the context is one of time series.
- A reference on wide vs long: https://docs.timescale.com/latest/introduction/data-model