@Paraphraser
Last active February 2, 2023 03:32
On IOTstack, InfluxDB 1.8 & `INFLUXDB_DATA_INDEX_VERSION`

Updated 2022-05-19

  • Additional observations since reverting to in-memory indexing.

the trigger question

On March 12 2022 I noticed a post on Discord by Ukkopahis saying:

I was thinking maybe INFLUXDB_DATA_INDEX_VERSION=tsi1 should be added as default? (if this is what you are talking about)

Until then I hadn't really given too much thought to the question "how much RAM is InfluxDB using?" The post sent me on a little voyage of discovery.

the environment

A few facts to set the scene:

  1. My IOTstack platform is a 4GB Raspberry Pi 4 Model B Rev 1.1 running Raspbian GNU/Linux 10 (buster) from a 480GB SSD, as a 32-bit OS with 64-bit kernel. And, yes, I should upgrade!

  2. df -H reports 3% utilisation of the SSD.

  3. sudo du -sh IOTstack/volumes reports 1.1GB.

  4. sudo du -sh IOTstack/volumes/influxdb reports 719MB.

  5. The largest database (in terms of rows) is a grid-power logger which gains a new row every 10 seconds. The oldest entry is in April 2018 and it currently has 12.8 million rows.

  6. The whole arrangement is standard no-frills MING:

    • sensors log via MQTT to Mosquitto
    • Node-RED subscribes to Mosquitto topics and formats for insertion into InfluxDB
    • Grafana displays charts based on what is stored in InfluxDB.
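To make the pipeline concrete, here's a sketch of the formatting step in shell terms. The topic name and payload are invented for illustration, and Node-RED actually does this in a JavaScript function node, but the shape of the transformation — MQTT topic plus raw reading in, an InfluxDB point out — is the same:

```shell
#!/usr/bin/env bash
# Sketch only: illustrative topic and payload, not taken from my flows.

# a sensor publishes a raw reading on an MQTT topic, e.g.:
TOPIC="house/grid/power"
PAYLOAD="1234.5"

# Node-RED's job is to turn that into a point for InfluxDB; expressed
# in line-protocol terms the equivalent record would be:
MEASUREMENT="${TOPIC##*/}"            # take the last topic component
printf '%s value=%s\n' "$MEASUREMENT" "$PAYLOAD"
```

That prints `power value=1234.5`.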

data acquisition

After running docker stats for a few days and watching what happened to memory utilisation when I restarted the container, I set up this small shell script:

#!/usr/bin/env bash

date
docker stats --no-stream influxdb

and hooked it to a crontab entry firing it every hour:

0    */1  *    *    *    log_influx_ram >>./Logs/influx_ram.log 2>&1
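Nothing fancy is needed to turn that log back into data. Assuming the log format produced by the script above (a date line followed by the docker stats output), an awk filter along these lines extracts a CSV of timestamp and MEM %. The sample lines are invented stand-ins for real output:

```shell
#!/usr/bin/env bash
# turn the hourly log (date line + docker stats output) into CSV
parse_log() {
  awk '
    /^CONTAINER/ { next }                              # skip the stats header row
    $2 == "influxdb" { print "\"" ts "\"," $7; next }  # data row: field 7 is MEM %
    NF { ts = $0 }                                     # anything else is the date line
  '
}

# sample input standing in for real log content:
parse_log <<'EOF'
Sat 14 May 2022 03:00:01 AEST
CONTAINER ID   NAME       CPU %     MEM USAGE / LIMIT     MEM %     NET I/O         BLOCK I/O       PIDS
0123456789ab   influxdb   1.23%     85.2MiB / 3.704GiB    2.25%     1.2MB / 648kB   12MB / 0B       15
EOF
```

Run against the real log (`parse_log <./Logs/influx_ram.log`), that yields one CSV row per hour.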

data analysis

We're the better part of 2 months down the track so it's time for some analysis.

[Chart: InfluxDB 1.8 Memory Utilisation]

The chart is a bit busy so let me break it down:

  1. The X axis is time. The Y axis is the "MEM %" value reported for InfluxDB by docker stats.

  2. The shaded area marked "A" is the observed behaviour while the environment variable INFLUXDB_DATA_INDEX_VERSION was omitted. In other words, the default of in-memory indexing was in force. During that time, I would occasionally restart the container by hand. The typical pattern was memory utilisation slowly growing over time into the 8..10% range, falling back to the 1..2% range after a restart.

  3. On March 29 I added INFLUXDB_DATA_INDEX_VERSION=tsi1. That is in force for the two areas marked "B" and "C".

  4. Above the shaded areas are two time ranges:

    • Prior to April 24, I would occasionally restart the container by hand. You can see that memory climbs into the sub-20% area, falling back to the 1..2% range after any restart.

    • On April 24, I added the crontab entry:

       30   3    *    *    *    docker-compose -f ./IOTstack/docker-compose.yml restart influxdb >>./Logs/influx_ram.log 2>&1
      

      That does a better job of keeping memory utilisation below about 7%, at least for the remainder of the area marked B.

  5. On May 1st I tried to delete some extraneous data that had made its way into one of the databases, courtesy of insufficient care taken when debugging a sketch. Influx would not let me delete the series because I had a mixture of index types. A Discord post by Ukkopahis the next day included the hint:

    This won't migrate existing shards

    but at that time, I wasn't aware of this problem. There seem to be a few ways to migrate the shards. I just went with a script from IOTstackBackup:

    $ iotstack_reload_influxdb

    Thus, the area marked:

    • "A" is exclusively in-memory indexing;

    • "B" is a mixture of indexing types:

      • existing data continues to use in-memory indexing, while
      • newly-ingested data uses tsi1 indexing.
    • "C" is exclusively tsi1 indexing.

  6. For the area marked "C", the daily cron-job restarting the Influx container is still firing but memory utilisation is never below about 12% while climbing into the low 20% range during the course of each day.

  7. After publishing the first version of this gist on May 13, I reverted to in-memory indexing and reloaded the databases again. This is the area marked "D". The daily cron-job is still firing.
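As an aside, InfluxDB 1.x also ships influx_inspect buildtsi for converting existing in-memory shards to tsi1 in place. I went with iotstack_reload_influxdb instead, but for completeness this sketch prints (rather than runs) what I understand the invocation would look like under IOTstack. The data and WAL paths are the container defaults and are assumptions — check your own layout before trying anything like this:

```shell
#!/usr/bin/env bash
# Sketch only: print the migration commands rather than run them.
# The data/wal paths are InfluxDB 1.8's container defaults (assumption).
DATADIR="/var/lib/influxdb/data"
WALDIR="/var/lib/influxdb/wal"

PLAN=$(cat <<EOF
docker-compose -f ./IOTstack/docker-compose.yml stop influxdb
docker-compose -f ./IOTstack/docker-compose.yml run --rm influxdb \
  influx_inspect buildtsi -datadir $DATADIR -waldir $WALDIR
docker-compose -f ./IOTstack/docker-compose.yml start influxdb
EOF
)
echo "$PLAN"
```

The important part is stopping the daemon first: buildtsi works on the shard files directly and must not race a running InfluxDB.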

conclusions

What conclusions do I draw from this?

  1. On the face of it, INFLUXDB_DATA_INDEX_VERSION=tsi1 results in worse memory utilisation. It looks to me like I'd be better off removing that option and doing another iotstack_reload_influxdb.

  2. A daily restart via a cron-job certainly has the effect of keeping memory utilisation under control. I should keep that running.

  3. My perception is that the traces in "D" are higher, on average, than the right hand end of "B". That perception is confirmed by a two-independent-sample t-test (equal variances). I don't quite know what to make of that but it's still clear, to me, that in-memory indexing with a daily kick-in-the-pants from cron is as good a way as any for keeping InfluxDB 1.8 memory utilisation down.

Does my experience generalise or is it likely to be something to do with the size/structure of my databases? Honestly, I have no idea.

But, getting back to the trigger question of whether INFLUXDB_DATA_INDEX_VERSION=tsi1 should be the IOTstack default, absent some other explanation of the behaviour I've discussed above, I'd be putting my vote in the box marked "no".

Comment from @Paraphraser (author):

And, to complete the picture, in case anyone wants to construct their own graphs, I use this script (named log_influx_ram):

#!/usr/bin/env bash

# timestamp in dd/mm/yyyy hh:mm:ss form
TIMESTAMP=$(date "+%d/%m/%Y %H:%M:%S")
# one-shot stats for the influxdb container (header row + data row)
STATISTIC=$(docker stats --no-stream influxdb)
# the unquoted expansion collapses the two rows into one space-separated
# line, in which MEM % is the 23rd field
MEMPC=$(echo $STATISTIC | cut -d " " -f 23)
echo "\"$TIMESTAMP\",$MEMPC"

Note:

  • getting RAM figures out of docker stats depends on adding the following to your /boot/cmdline.txt (reboot needed):

     cgroup_memory=1 cgroup_enable=memory
    

The script writes to stdout. You can test it by just running it:

$ log_influx_ram
"02/02/2023 14:25:54",2.01%
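If the magic number 23 looks arbitrary: the unquoted $STATISTIC collapses the two-line docker stats output (header plus data row) into a single space-separated line, and MEM % lands in field 23. You can verify that with a canned sample (the values below are invented):

```shell
#!/usr/bin/env bash
# why field 23? the sample stands in for real
# `docker stats --no-stream influxdb` output (invented values)
STATISTIC='CONTAINER ID   NAME       CPU %     MEM USAGE / LIMIT     MEM %     NET I/O         BLOCK I/O       PIDS
0123456789ab   influxdb   1.23%     85.2MiB / 3.704GiB    2.25%     1.2MB / 648kB   12MB / 0B       15'

# unquoted expansion squeezes all whitespace (including the newline)
# down to single spaces, so cut sees one long line of fields
MEMPC=$(echo $STATISTIC | cut -d " " -f 23)
echo "$MEMPC"
```

That prints `2.25%`. Recent versions of docker also support a format template — `docker stats --no-stream --format "{{.MemPerc}}" influxdb` — which avoids the field counting entirely.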

I drive the script with this crontab entry that fires every hour on the hour:

# log influxdb ram usage every hour
0    */1  *    *    *    log_influx_ram >>./Logs/influx_ram.csv 2>&1

The script is in my PATH but you could also use an absolute path.

Let it run for a few days then open the CSV in something like Excel and away you go.
