Skip to content

Instantly share code, notes, and snippets.

@tomru
Last active January 10, 2020 23:11
Show Gist options
  • Save tomru/4989b51ceef9f1a2ca01aff220800a65 to your computer and use it in GitHub Desktop.
Save tomru/4989b51ceef9f1a2ca01aff220800a65 to your computer and use it in GitHub Desktop.
Helpers to write Luftdaten CSVs to your InfluxDB

⚠️ This has been moved to https://github.com/tomru/cram-luftdaten ⚠️

Write Luftdaten CSV to a InfluxDB

Helpers to write CSV files from https://www.madavi.de/sensor/csvfiles.php to your InfluxDb.

You can use this to fill up your local InfluxDB with previous data or fill data gaps in your influxdb.

How does it work

  • get the relevant CSVs from https://www.madavi.de/sensor/csvfiles.php
  • delete all lines that you already have in your DB (you must keep the csv header line)
  • export settings as environment vars in case they differ from the defaults (INFLUXDB_HOST, INFLUXDB_PORT, INFLUXDB_DATABASE, SENSOR_ID)
  • run cat file1.csv [file2.csv ...] | ./to_line_protocol.py | ./to_influx_db.sh

I should advise you to backup the DB first 😛

#!/usr/bin/env bash
#
# TODO: explain usage
set -e
HOST=${INFLUXDB_HOST:-localhost}
PORT=${INFLUXDB_PORT:-8086}
DATABASE=${INFLUXDB_DATABASE:-feinstaub}
PRECISION=s
SRC_FILE=${1:-/dev/stdin}
curl --include -X POST \
"http://${HOST}:${PORT}/write?db=${DATABASE}&precision=${PRECISION}" \
--data-binary @${SRC_FILE}
#!/usr/bin/env python3
""" Converts CSV exports from https://www.madavi.de/sensor/csvfiles.php to InfluxDB LineProtocol
Note: timestamps are in seconds, therefore precision "s" needs to be set
when writing, see https://docs.influxdata.com/influxdb/v1.2/tools/api/#write
Settings:
Please set
* SENSOR_ID
* INFLUXDB_DATABASE
in your environment
CSV file spec:
Semiconlos are used as delimiters. First line are the header columns.
Generally the column name is used as a DB field name, but there are some legacy
exceptions like "temperature" instead of "Temp" as listed in the following table.
| DB Field name | DB field | CSV Column | CSV Format
|--------------------|----------|--------------------|--------------------------
| none | time | Time | 2017/05/19 00:00:11 (UTC)
| ? | ? | durP1 |
| ? | ? | ratioP1 |
| ? | ? | P1 |
| ? | ? | durP2 |
| ? | ? | ratioP2 |
| ? | ? | P2 |
| SDS_P1 | field | SDS_P1 | 10.36
| SDS_P2 | field | SDS_P2 | 9.50
| PMS_P1 | field | PMS_P1 |
| PMS_P2 | field | PMS_P2 |
| temperature | field | Temp | 16.00
| humidity | field | Humidity | 79.00
| BMP_temperature | field | BMP_temperature |
| BMP_pressure | field | BMP_pressure |
| BME280_temperature | field | BME280_temperature |
| BME280_humidity | field | BME280_humidity |
| BME280_pressure | field | BME280_pressure |
| samples | field | Samples | 828799
| min_micro | field | Min_cycle | 172
| max_micro | field | Max_cycle | 25198
| signal | field | Signal | -91
| node | tag | N/A filename? | e.g. esp8266-16229960
"""
import os
import sys
import csv
from datetime import datetime, timedelta
def get_timestamp(timestr):
"""Converts CSV time value to a UTC timestamp in seconds"""
naive_dt = datetime.strptime(timestr, "%Y/%m/%d %H:%M:%S")
utc = (naive_dt - datetime(1970, 1, 1)) / timedelta(seconds=1)
return int(utc)
SENSOR_ID = os.environ.get("SENSOR_ID", "16229960")
DATABASE = os.environ.get("INFLUXDB_DATABASE", "sensors")
NODE = "esp8266-" + SENSOR_ID
NAME_MAP = {
"Humidity": "humidity",
"Max_cycle": "max_micro",
"Samples": "samples",
"Min_cycle": "min_micro",
"Signal": "signal",
"Temp": "temperature",
}
READER = csv.DictReader(sys.stdin, delimiter=";")
for row in READER:
# error out on legacy format until it's clear what that format is
if row["Time"] == "time":
raise Exception(
"Looks like a legacy format not supported yet. Send the file to the author please."
)
# catch multiple column headers
if row["Time"] == "Time":
continue
measurements = []
for header, value in row.items():
if header == "Time" or not value:
continue
measurements.append("{0}={1}".format(NAME_MAP.get(header, header), value))
values = {
"database": DATABASE,
"node": NODE,
"measurements": ",".join(measurements),
"time": get_timestamp(row["Time"]),
}
print("{database},node={node} {measurements} {time}".format(**values))
@tomru
Copy link
Author

tomru commented Jan 10, 2020

@hristo-stoyanov

  • ignoring multiple csv headers now when using the cat file1.csv file2.csv | approach
  • throwing an error when the old format is provided

Feel free to try again.

@tomru
Copy link
Author

tomru commented Jan 10, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment