@tomru
Last active January 10, 2020 23:11
Helpers to write Luftdaten CSVs to your InfluxDB

⚠️ This has been moved to https://github.com/tomru/cram-luftdaten ⚠️

Write Luftdaten CSVs to InfluxDB

Helpers to write CSV files from https://www.madavi.de/sensor/csvfiles.php to your InfluxDB.

You can use this to backfill your local InfluxDB with historical data or to fill data gaps in your database.

How does it work?

  • get the relevant CSVs from https://www.madavi.de/sensor/csvfiles.php
  • delete all lines that you already have in your DB; you must keep the CSV header line (see the trimming sketch after this list)
  • export settings as environment vars in case they differ from the defaults (INFLUXDB_HOST, INFLUXDB_PORT, INFLUXDB_DATABASE, SENSOR_ID)
  • run cat file1.csv [file2.csv ...] | ./to_line_protocol.py | ./to_influx_db.sh
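
For example, to keep only the header plus rows at or after a cut-off date, something like this awk one-liner can be used (filename and date are made-up examples; the fixed-width Time format makes plain string comparison safe):

# keep the CSV header (line 1) plus all rows at or after the cut-off date
awk -F';' 'NR == 1 || $1 >= "2020/01/05"' data-esp8266-16229960-2020-01-05.csv > trimmed.csv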

I'd advise you to back up the DB first 😛
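
For illustration, a complete run could look like this (host, port, database, and the CSV filename are made-up example values):

export INFLUXDB_HOST=localhost
export INFLUXDB_PORT=8086
export INFLUXDB_DATABASE=feinstaub
export SENSOR_ID=16229960
cat data-esp8266-16229960-2020-01-05.csv | ./to_line_protocol.py | ./to_influx_db.sh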

to_influx_db.sh:

#!/usr/bin/env bash
#
# Usage: ./to_influx_db.sh [FILE]
#
# Reads InfluxDB line protocol from FILE (or stdin if no argument is given)
# and POSTs it to the configured InfluxDB /write endpoint.
set -e

HOST=${INFLUXDB_HOST:-localhost}
PORT=${INFLUXDB_PORT:-8086}
DATABASE=${INFLUXDB_DATABASE:-feinstaub}
PRECISION=s
SRC_FILE=${1:-/dev/stdin}

curl --include -X POST \
    "http://${HOST}:${PORT}/write?db=${DATABASE}&precision=${PRECISION}" \
    --data-binary "@${SRC_FILE}"
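
On success, InfluxDB 1.x answers the /write request with HTTP 204 No Content; since the script passes --include, curl prints the response status line and headers, so failed writes are easy to spot. Two equivalent invocations (points.txt is a made-up file of line protocol):

./to_influx_db.sh points.txt
cat points.txt | ./to_influx_db.sh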

to_line_protocol.py:

#!/usr/bin/env python3
"""Converts CSV exports from https://www.madavi.de/sensor/csvfiles.php to InfluxDB line protocol.

Note: timestamps are in seconds, therefore precision "s" needs to be set
when writing, see https://docs.influxdata.com/influxdb/v1.2/tools/api/#write

Settings:
Please set
* SENSOR_ID
* INFLUXDB_DATABASE
in your environment.

CSV file spec:
Semicolons are used as delimiters. The first line contains the header columns.
Generally the column name is used as the DB field name, but there are some legacy
exceptions like "temperature" instead of "Temp", as listed in the following table.

| DB field name      | DB field | CSV column         | CSV format
|--------------------|----------|--------------------|--------------------------
| none               | time     | Time               | 2017/05/19 00:00:11 (UTC)
| ?                  | ?        | durP1              |
| ?                  | ?        | ratioP1            |
| ?                  | ?        | P1                 |
| ?                  | ?        | durP2              |
| ?                  | ?        | ratioP2            |
| ?                  | ?        | P2                 |
| SDS_P1             | field    | SDS_P1             | 10.36
| SDS_P2             | field    | SDS_P2             | 9.50
| PMS_P1             | field    | PMS_P1             |
| PMS_P2             | field    | PMS_P2             |
| temperature        | field    | Temp               | 16.00
| humidity           | field    | Humidity           | 79.00
| BMP_temperature    | field    | BMP_temperature    |
| BMP_pressure       | field    | BMP_pressure       |
| BME280_temperature | field    | BME280_temperature |
| BME280_humidity    | field    | BME280_humidity    |
| BME280_pressure    | field    | BME280_pressure    |
| samples            | field    | Samples            | 828799
| min_micro          | field    | Min_cycle          | 172
| max_micro          | field    | Max_cycle          | 25198
| signal             | field    | Signal             | -91
| node               | tag      | N/A filename?      | e.g. esp8266-16229960
"""
import os
import sys
import csv
from datetime import datetime, timedelta


def get_timestamp(timestr):
    """Converts a CSV time value to a UTC timestamp in seconds"""
    naive_dt = datetime.strptime(timestr, "%Y/%m/%d %H:%M:%S")
    utc = (naive_dt - datetime(1970, 1, 1)) / timedelta(seconds=1)
    return int(utc)


SENSOR_ID = os.environ.get("SENSOR_ID", "16229960")
DATABASE = os.environ.get("INFLUXDB_DATABASE", "sensors")
NODE = "esp8266-" + SENSOR_ID

NAME_MAP = {
    "Humidity": "humidity",
    "Max_cycle": "max_micro",
    "Samples": "samples",
    "Min_cycle": "min_micro",
    "Signal": "signal",
    "Temp": "temperature",
}

READER = csv.DictReader(sys.stdin, delimiter=";")

for row in READER:
    # error out on the legacy format until it's clear what that format is
    if row["Time"] == "time":
        raise Exception(
            "Looks like a legacy format not supported yet. Send the file to the author please."
        )

    # skip repeated header lines from concatenated CSV files
    if row["Time"] == "Time":
        continue

    measurements = []
    for header, value in row.items():
        if header == "Time" or not value:
            continue
        measurements.append("{0}={1}".format(NAME_MAP.get(header, header), value))

    values = {
        "database": DATABASE,
        "node": NODE,
        "measurements": ",".join(measurements),
        "time": get_timestamp(row["Time"]),
    }
    print("{database},node={node} {measurements} {time}".format(**values))
@hristo-stoyanov

Can you tell me the syntax for multiple files? Is there a limit on the number of files?

@tomru (Author) commented Jan 10, 2020

Just realised the README is not up to date, as you currently can't specify multiple CSV files: each file has headers, and I guess that might confuse the CSV parser. So do cat file.csv | ./to_line_protocol.py | ./to_influx_db.sh for each file you have for now.

Will have a look at whether it's simple to fix the multiple-file case soonish...
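
In the meantime a simple per-file loop does the trick (the glob is a made-up example):

for f in data-esp8266-16229960-*.csv; do
    cat "$f" | ./to_line_protocol.py | ./to_influx_db.sh
done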

@hristo-stoyanov

> Just realised the README is not up to date, as you currently can't specify multiple CSV files: each file has headers, and I guess that might confuse the CSV parser. So do cat file.csv | ./to_line_protocol.py | ./to_influx_db.sh for each file you have for now.
>
> Will have a look at whether it's simple to fix the multiple-file case soonish...

Thank you! I had success with the new archive files, but now I get a parse error for files from 2018 (data-esp8266-2326109-2018-06-14.csv, for example). I have old archives from 2018-05 onwards and am trying to import them.

@tomru (Author) commented Jan 10, 2020

@hristo-stoyanov just tried data-esp8266-2326109-2018-06-14.csv from madavi.de and it worked for me. Did you recently download these files?

As I stated in the comment here, I seem to have had the same problem with some old files (I just forgot). Not sure if all the files on madavi.de have been updated at some point to have the new format.

Best if you compare the column names from old files with the version on madavi.de first.

@tomru (Author) commented Jan 10, 2020

@hristo-stoyanov

  • multiple CSV headers are now ignored when using the cat file1.csv file2.csv | approach
  • an error is now thrown when the old (legacy) format is provided
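
With the header handling in place, concatenating several files in one go should now work, e.g. (the glob is a made-up example):

cat data-esp8266-16229960-*.csv | ./to_line_protocol.py | ./to_influx_db.sh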

Feel free to try again.
