How to backfill Prometheus metrics into a Postgres TimescaleDB instance

Description

A method of populating data into TimescaleDB (via Promscale) for metrics that Prometheus doesn't scrape directly.
This can be used when you need to backfill metrics that are only generated every few hours, or at the end of each day.

Method #1: Run the inserts directly on the Postgres database


Create the cpu_total metric table, in other words the table that will hold every series of the "cpu_total" metric. In Prometheus terms, a "series" is a metric identified by a name and an optional set of key-value pairs called "labels"; for example, cpu_total{namespace="production", node="pinky"} is one series of the cpu_total metric. See: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

1. Create the metrics table

To create the "table" for a given metric, this is the format of the command which must be executed

SELECT _prom_catalog.get_or_create_metric_table_name('[METRIC_NAME]');

For example, to create a table for a metric called 'cpu_total', you would execute the following:

SELECT _prom_catalog.get_or_create_metric_table_name('cpu_total');
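
This creates the metric's data table under the prom_data schema (here, prom_data.cpu_total, which the inserts below write to) and returns the table name.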

2. Finalize the creation of the metrics table

CALL _prom_catalog.finalize_metric_creation(); -- finalize the creation of the new metric(s) (this function essentially manages concurrency)

3. Insert the data points

Here's the general format of the insert statement:

INSERT INTO prom_data.[METRIC_NAME]
  SELECT timestamptz '2020-11-04 19:36:01', [DATA_POINT_VALUE], _prom_catalog.get_or_create_series_id('{"__name__": "[METRIC_NAME]", "label1":"value1", "label2": "value2"}');

An insert for a real metric might look like this:

INSERT INTO prom_data.cpu_total
  SELECT timestamptz '2020-11-04 19:36:01', 90.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"production", "node": "pinky", "new_tag_2":"bar"}');
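
The three selected values correspond to the metric table's time, value, and series_id columns (in that order), which is why no explicit column list is needed.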

or like this if inserting multiple data points for improved efficiency:

INSERT INTO prom_data.cpu_total (time, value, series_id) VALUES
  (timestamptz '2020-11-04 19:36:01', 91.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"dev", "node": "pinky", "new_tag_2":"foo"}')),
  (timestamptz '2020-11-04 19:37:01', 92.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"production", "node": "jersy", "new_tag_2":"bar"}')),
  (timestamptz '2020-11-04 19:38:01', 93.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"production", "node": "web", "new_tag_2":"baz"}')),
  (timestamptz '2020-11-04 19:39:01', 94.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"production", "node": "app", "new_tag_2":"boo"}'));

Finally, here's a complete example, from start to finish, of what your backfill script might look like:

\set ECHO all
\set ON_ERROR_STOP 1

SELECT _prom_catalog.get_or_create_metric_table_name('cpu_total');

CALL _prom_catalog.finalize_metric_creation(); -- finalize the creation of the new metric(s) (this function essentially manages concurrency)

INSERT INTO prom_data.cpu_total (time, value, series_id) VALUES
  (timestamptz '2020-11-04 19:36:01', 91.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"dev", "node": "pinky", "new_tag_2":"foo"}')),
  (timestamptz '2020-11-04 19:37:01', 92.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"production", "node": "jersy", "new_tag_2":"bar"}')),
  (timestamptz '2020-11-04 19:38:01', 93.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"production", "node": "web", "new_tag_2":"baz"}')),
  (timestamptz '2020-11-04 19:39:01', 94.0, _prom_catalog.get_or_create_series_id('{"__name__": "cpu_total", "namespace":"production", "node": "app", "new_tag_2":"boo"}'));
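
You can save this script to a file, for example backfill_cpu_total.sql (a filename chosen here for illustration), and run it with psql -f backfill_cpu_total.sql against the database that the Promscale connector is using.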

The sample insert commands above are adapted from the test data in the Promscale GitHub project.

Method #2: Post the metrics directly to the Promscale connector in the required protobuf format


The Promscale connector exposes a remote-write HTTP endpoint that receives metrics encoded with a specific protobuf definition and compressed with snappy. The protobuf definitions are available here:

https://github.com/prometheus/prometheus/blob/master/prompb/remote.proto
https://github.com/prometheus/prometheus/blob/master/prompb/types.proto
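
As a rough illustration (not part of the original gist), here is a minimal Go sketch that builds a remote-write request with those prompb types, compresses it with snappy, and posts it to the connector. The endpoint URL http://localhost:9201/write, the sample timestamp, and the labels are assumptions for this example (Promscale's default port and the Prometheus remote-write header conventions); adjust them for your deployment.

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	// One series ("cpu_total" plus its labels) with a single sample.
	// Labels are sorted by name, as the remote-write protocol expects.
	writeReq := &prompb.WriteRequest{
		Timeseries: []prompb.TimeSeries{{
			Labels: []prompb.Label{
				{Name: "__name__", Value: "cpu_total"},
				{Name: "namespace", Value: "production"},
				{Name: "node", Value: "pinky"},
			},
			Samples: []prompb.Sample{{
				Value: 90.0,
				// Milliseconds since epoch; when backfilling, use the
				// historical timestamp of the sample.
				Timestamp: time.Date(2020, 11, 4, 19, 36, 1, 0, time.UTC).UnixNano() / int64(time.Millisecond),
			}},
		}},
	}

	// Serialize with protobuf, then compress with snappy, as the
	// remote-write protocol requires.
	raw, err := writeReq.Marshal()
	if err != nil {
		panic(err)
	}
	compressed := snappy.Encode(nil, raw)

	// ASSUMPTION: the Promscale connector listens on localhost:9201 and
	// accepts writes on /write; change this URL for your setup.
	req, err := http.NewRequest(http.MethodPost, "http://localhost:9201/write", bytes.NewReader(compressed))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/x-protobuf")
	req.Header.Set("Content-Encoding", "snappy")
	req.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("response status:", resp.Status)
}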
