Telegraf¶
This document discusses an IOTstack-specific version of Telegraf built on top of influxdata/influxdata-docker/telegraf using a Dockerfile.
The purpose of the Dockerfile is to:
- tailor the default configuration to be IOTstack-ready; and
- enable the container to perform self-repair if essential elements of the persistent storage area disappear.
References¶
- influxdata Telegraf home
- GitHub: influxdata/influxdata-docker/telegraf
- DockerHub: influxdata Telegraf
Significant directories and files¶
~/IOTstack
├── .templates
│   └── telegraf
│       ├── Dockerfile ❶
│       ├── entrypoint.sh ❷
│       ├── iotstack_defaults
│       │   ├── additions ❸
│       │   └── auto_include ❹
│       └── service.yml ❺
├── services
│   └── telegraf
│       └── service.yml ❻
├── docker-compose.yml
└── volumes
    └── telegraf ❼
        ├── additions ❽
        ├── telegraf-reference.conf ➒
        └── telegraf.conf ➓
- ❶ The Dockerfile used to customise Telegraf for IOTstack.
- ❷ A replacement for the telegraf container script of the same name, extended to handle container self-repair.
- ❸ The additions folder. See Applying optional additions.
- ❹ The auto_include folder. Additions automatically applied to telegraf.conf. See Automatic includes to telegraf.conf.
- ❺ The template service definition.
- ❻ The working service definition (only relevant to old-menu, copied from ❺).
- ❼ The persistent storage area for the telegraf container.
- ❽ A working copy of the additions folder (copied from ❸). See Applying optional additions.
- ➒ The reference configuration file. See Changing Telegraf's configuration.
- ➓ The active configuration file. A subset of ➒ altered to support communication with InfluxDB running in a container in the same IOTstack instance.
Everything in the persistent storage area ❼:
- will be replaced if it is not present when the container starts; but
- will never be overwritten if altered by you.
How Telegraf gets built for IOTstack¶
IOTstack menu¶
When you select Telegraf in the IOTstack menu, the template service definition is copied into the Compose file.
Under old-menu, it is also copied to the working service definition, but that copy is not otherwise used.
IOTstack first run¶
On a first install of IOTstack, you run the menu, choose your containers, and are told to do this:
$ cd ~/IOTstack
$ docker-compose up -d
See also the Migration considerations (below).
docker-compose reads the Compose file. When it arrives at the telegraf fragment, it finds:
telegraf:
  container_name: telegraf
  build: ./.templates/telegraf/.
  …
The build statement tells docker-compose to look for:
~/IOTstack/.templates/telegraf/Dockerfile
The Dockerfile is in the .templates directory because it is intended to be a common build for all IOTstack users. This is different to the arrangement for Node-RED where the Dockerfile is in the services directory because it is how each individual IOTstack user's version of Node-RED is customised.
The Dockerfile begins with:
FROM telegraf:latest
If you need to pin to a particular version of Telegraf, the Dockerfile is the place to do it. See Telegraf version pinning.
The FROM statement tells the build process to pull down the base image from DockerHub.
It is a base image in the sense that it never actually runs as a container on your Raspberry Pi.
The remaining instructions in the Dockerfile customise the base image to produce a local image. The customisations are:
- Add the rsync package. This helps the container perform self-repair.
- Copy the default configuration file that comes with the DockerHub image (so it will be available as a fully-commented reference for the user) and make it read-only.
- Make a working version of the default configuration file from which comment lines and blank lines have been removed.
- Patch the working version to support communications with InfluxDB running in another container in the same IOTstack instance.
- Replace entrypoint.sh with a version which:
  - calls rsync to perform self-repair if telegraf.conf goes missing; and
  - enforces root:root ownership in ~/IOTstack/volumes/telegraf.
The local image is instantiated to become your running container.
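As an illustration of the "working version" customisation described above, the following is a minimal sketch of stripping comment lines and blank lines from a configuration file. The function name strip_config is hypothetical, and the commands actually used in the IOTstack Dockerfile may differ.

```shell
# Hypothetical helper (not taken from the IOTstack Dockerfile): print only
# the "active" lines of a Telegraf-style configuration file.
#   usage: strip_config FILE
strip_config() {
  # -v inverts the match; the pattern covers blank lines and lines whose
  # first non-blank character is '#'
  grep -v -E '^[[:space:]]*(#|$)' "$1"
}
```

Running strip_config against a copy of telegraf-reference.conf should leave only section headers and option assignments, which is the general shape of the working configuration file.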
When you run the docker images command after Telegraf has been built, you may see two rows for Telegraf:
$ docker images
REPOSITORY          TAG       IMAGE ID       CREATED       SIZE
iotstack_telegraf   latest    59861b7fe9ed   2 hours ago   292MB
telegraf            latest    a721ac170fad   3 days ago    273MB
- telegraf is the base image; and
- iotstack_telegraf is the local image.
You may see the same pattern in Portainer, which reports the base image as "unused". You should not remove the base image, even though it appears to be unused.
Whether you see one or two rows depends on the version of docker-compose you are using and how your version of docker-compose builds local images.
Migration considerations¶
Under the original IOTstack implementation of Telegraf (just "as it comes" from DockerHub), the service definition expected telegraf.conf to be at:
~/IOTstack/services/telegraf/telegraf.conf
Under this implementation of Telegraf, the configuration file has moved to:
~/IOTstack/volumes/telegraf/telegraf.conf
The change of location is one of the things that allows self-repair to work properly.
With one exception, all prior and current versions of the default configuration file are identical in terms of their semantics.
In other words, once you strip away comments and blank lines, and remove any "active" configuration options that simply repeat their default setting, you get the same subset of "active" configuration options. The default configuration file supplied with gcgarner/IOTstack is available here if you wish to refer to it.
The exception is [[inputs.mqtt_consumer]] which is now provided as an optional addition. If your existing Telegraf configuration depends on that input, you will need to apply it. See Applying optional additions.
Logging¶
You can inspect Telegraf's log by:
$ docker logs telegraf
These logs are ephemeral and will disappear when your Telegraf container is rebuilt.
log message: database "telegraf" creation failed¶
The following log message can be misleading:
W! [outputs.influxdb] When writing to [http://influxdb:8086]: database "telegraf" creation failed: Post "http://influxdb:8086/query": dial tcp 172.30.0.9:8086: connect: connection refused
If InfluxDB is not running when Telegraf starts, the depends_on: clause in Telegraf's service definition tells Docker to start InfluxDB (and Mosquitto) before starting Telegraf. Although it can launch the InfluxDB container first, Docker has no way of knowing when the influxd process running inside the InfluxDB container will start listening to port 8086.
What this error message usually means is that Telegraf has tried to communicate with InfluxDB before the latter is ready to accept connections. Telegraf typically retries after a short delay and is then able to communicate with InfluxDB.
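If you want to confirm for yourself that InfluxDB has become reachable, a retry loop like the one below is one way to do it. This is a minimal sketch: the retry function is hypothetical, it is not how Telegraf itself retries, and the example in the comment assumes InfluxDB 1.x with port 8086 published on the host.

```shell
# Hypothetical helper: run a command until it succeeds, up to a maximum
# number of attempts, sleeping between attempts.
#   usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1          # give up: every attempt failed
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example (assumption: InfluxDB 1.x answering on the host's port 8086):
#   retry 10 2 curl -sf -o /dev/null http://localhost:8086/ping
```

If the example command succeeds, InfluxDB is accepting connections and the earlier "connection refused" warning can usually be disregarded.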
Changing Telegraf's configuration¶
The first time you launch the Telegraf container, the following structure will be created in the persistent storage area:
~/IOTstack/volumes/telegraf
├── [drwxr-xr-x root ] additions
│   └── [-rw-r--r-- root ] inputs.mqtt_consumer.conf
├── [-rw-r--r-- root ] telegraf.conf
└── [-r--r--r-- root ] telegraf-reference.conf
The file:
- telegraf-reference.conf:
  - is a reference copy of the default configuration file that ships with the base image for Telegraf when it is downloaded from DockerHub. It is nearly 9000 lines long and is mostly comments.
  - is not used by Telegraf but will be replaced if you delete it.
  - is marked "read-only" (even for root) as a reminder that it is only for your reference. Any changes you make will be ignored.
- telegraf.conf:
  - is created by removing all comment lines and blank lines from telegraf-reference.conf, leaving only the "active" configuration options, and then adding options necessary for IOTstack.
  - is less than 30 lines and is significantly easier to understand than telegraf-reference.conf.
- inputs.mqtt_consumer.conf: see Applying optional additions below.
The intention of this structure is that you:
- search telegraf-reference.conf to find the configuration option you need;
- read the comments to understand what the option does and how to use it; and then
- import the option into the correct section of telegraf.conf.
When you make a change to telegraf.conf, you activate it by restarting the container:
$ cd ~/IOTstack
$ docker-compose restart telegraf
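The search-and-import workflow described above can be partly scripted. The sketch below prints one plugin section from the reference file with a single level of commenting removed, ready for review. The function name show_section is hypothetical, and the layout it relies on (a header commented out as "# [[plugin]]", with the section ending at the next blank line) is an assumption about the reference file that you should check against your own copy.

```shell
# Hypothetical helper: print one plugin section from a reference file with a
# single level of "# " commenting removed.
#   usage: show_section FILE PLUGIN
show_section() {
  file=$1
  plugin=$2
  # Assumption: the section header appears as "# [[plugin]]" and the
  # section ends at the next blank line.
  awk -v hdr="# [[$plugin]]" '
    $0 == hdr { found = 1 }                 # start at the commented header
    found && /^[[:space:]]*$/ { exit }      # stop at the first blank line
    found { sub(/^# ?/, ""); print }        # drop one leading "# " and print
  ' "$file"
}
```

For example, show_section ~/IOTstack/volumes/telegraf/telegraf-reference.conf inputs.ping would print the inputs.ping section, if your reference file contains one in that layout. The output still needs to be read, edited and pasted into telegraf.conf by hand.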
Automatic includes to telegraf.conf¶
- inputs.docker.conf instructs Telegraf to collect metrics from Docker. Requires kernel control groups to be enabled to collect memory usage data. If not done during initial installation, enable by running (reboot required):
  $ CMDLINE="/boot/firmware/cmdline.txt" && [ -e "$CMDLINE" ] || CMDLINE="/boot/cmdline.txt"
  $ echo $(cat "$CMDLINE") cgroup_memory=1 cgroup_enable=memory | sudo tee "$CMDLINE"
- inputs.cpu_temp.conf collects CPU temperature.
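The kernel command-line change for inputs.docker.conf appends the two cgroup options every time it is run, so it can be worth checking whether they are already present. The following is a minimal sketch of such a check; the function name has_cgroup_memory is hypothetical.

```shell
# Hypothetical helper: succeed only if a kernel command-line file already
# contains both cgroup options.
#   usage: has_cgroup_memory FILE
has_cgroup_memory() {
  # -F matches fixed strings, -w requires whole-word matches, -q is quiet
  grep -q -F -w 'cgroup_memory=1' "$1" &&
    grep -q -F -w 'cgroup_enable=memory' "$1"
}
```

With CMDLINE set as in the first command above, has_cgroup_memory "$CMDLINE" && echo "already enabled" reports whether the second command can be skipped.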
Applying optional additions¶
The additions folder (see Significant directories and files) is a mechanism for additional IOTstack-ready configuration options to be provided for Telegraf.
Currently there is one addition:
- inputs.mqtt_consumer.conf, which formed part of the gcgarner/IOTstack telegraf configuration and instructs Telegraf to subscribe to a metric feed from the Mosquitto broker. This assumes, of course, that something is publishing those metrics.
Using inputs.mqtt_consumer.conf as the example, applying that addition to your Telegraf configuration file involves:
$ cd ~/IOTstack/volumes/telegraf
$ grep -v "^#" additions/inputs.mqtt_consumer.conf | sudo tee -a telegraf.conf >/dev/null
$ cd ~/IOTstack
$ docker-compose restart telegraf
The grep strips comment lines and the sudo tee is a safe way of appending the result to telegraf.conf. The restart causes Telegraf to notice the change.
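Because the append above is not idempotent, running it twice would add the same input twice. The sketch below guards against that by checking for a marker string first. The function name append_once is hypothetical, and the marker [[inputs.mqtt_consumer]] assumes that the addition file contains that section header.

```shell
# Hypothetical helper: append the non-comment lines of an addition to a
# configuration file, unless the file already contains the marker string.
#   usage: append_once ADDITION_FILE CONFIG_FILE MARKER
append_once() {
  addition=$1
  config=$2
  marker=$3
  if grep -q -F "$marker" "$config"; then
    echo "already present: $marker" >&2
    return 0
  fi
  # Same filter as the manual command: drop lines starting with '#',
  # then append what remains to the configuration file
  grep -v '^#' "$addition" | tee -a "$config" >/dev/null
}
```

Since telegraf.conf in the persistent storage area is owned by root, the function would need to be run from a root shell to modify that file. A container restart is still required afterwards.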
Getting a clean slate¶
Erasing the persistent storage area¶
Erasing Telegraf's persistent storage area triggers self-healing and restores known defaults:
$ cd ~/IOTstack
$ docker-compose down telegraf
$ sudo rm -rf ./volumes/telegraf
$ docker-compose up -d telegraf
Notes:
- You can also remove individual files within the persistent storage area and then trigger self-healing. For example, if you decide to edit telegraf-reference.conf and make a mess, you can restore the original version like this:
  $ cd ~/IOTstack
  $ sudo rm ./volumes/telegraf/telegraf-reference.conf
  $ docker-compose restart telegraf
- See also if downing a container doesn't work.
Resetting the InfluxDB database¶
To reset the InfluxDB database that Telegraf writes into, proceed like this:
$ cd ~/IOTstack
$ docker-compose down telegraf
$ docker exec -it influxdb influx -precision=rfc3339
> drop database telegraf
> exit
$ docker-compose up -d telegraf
In words:
- Be in the right directory.
- Stop the Telegraf container (while leaving the InfluxDB container running). See also if downing a container doesn't work.
- Launch the Influx CLI inside the InfluxDB container.
- Delete the telegraf database, and then exit the CLI.
- Start the Telegraf container. This re-creates the database automatically.
Upgrading Telegraf¶
You can update most containers like this:
$ cd ~/IOTstack
$ docker-compose pull
$ docker-compose up -d
$ docker system prune
In words:
- docker-compose pull downloads any newer images;
- docker-compose up -d causes any newly-downloaded images to be instantiated as containers (replacing the old containers); and
- the prune gets rid of the outdated images.
This strategy doesn't work when a Dockerfile is used to build a local image on top of a base image downloaded from DockerHub. The local image is what is running so there is no way for the pull to sense when a newer version becomes available.
The only way to know when an update to Telegraf is available is to check the Telegraf tags page on DockerHub.
Once a new version appears on DockerHub, you can upgrade Telegraf like this:
$ cd ~/IOTstack
$ docker-compose build --no-cache --pull telegraf
$ docker-compose up -d telegraf
$ docker system prune
$ docker system prune
Breaking it down into parts:
- build causes the named container to be rebuilt;
- --no-cache tells the Dockerfile process that it must not take any shortcuts. It really must rebuild the local image;
- --pull tells the Dockerfile process to actually check with DockerHub to see if there is a later version of the base image and, if so, to download it before starting the build;
- telegraf is the named container argument required by the build command.
Your existing Telegraf container continues to run while the rebuild proceeds. Once the freshly-built local image is ready, the up tells docker-compose to do a new-for-old swap. There is barely any downtime for your service.
The prune is the simplest way of cleaning up. The first call removes the old local image. The second call cleans up the old base image. Whether an old base image exists depends on the version of docker-compose you are using and how your version of docker-compose builds local images.
Telegraf version pinning¶
If you need to pin Telegraf to a particular version:
- Use your favourite text editor to open the following file:
  ~/IOTstack/.templates/telegraf/Dockerfile
- Find the line:
  FROM telegraf:latest
- Replace latest with the version you wish to pin to. For example, to pin to version 1.19.3:
  FROM telegraf:1.19.3
- Save the file and tell docker-compose to rebuild the local image:
  $ cd ~/IOTstack
  $ docker-compose up -d --build telegraf
  $ docker system prune
  The new local image is built, then the new container is instantiated based on that image. The prune deletes the old local image.
Note:
- As well as preventing Docker from updating the base image, pinning will also block incoming updates to the Dockerfile from a git pull. Nothing will change until you decide to remove the pin.
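If you prefer not to edit the Dockerfile by hand, the FROM line can be rewritten with sed. The following is a minimal sketch: the function name pin_telegraf is hypothetical, and it assumes the Dockerfile contains a line beginning with FROM telegraf: as shown earlier.

```shell
# Hypothetical helper: rewrite the FROM line of a Dockerfile so that the
# telegraf base image uses the given tag.
#   usage: pin_telegraf DOCKERFILE TAG
pin_telegraf() {
  dockerfile=$1
  tag=$2
  tmp=$(mktemp) || return 1
  # Replace whatever tag follows "FROM telegraf:"; other lines pass through.
  # Writing back with cat keeps the original file's ownership and mode.
  sed "s/^FROM telegraf:.*/FROM telegraf:$tag/" "$dockerfile" > "$tmp" &&
    cat "$tmp" > "$dockerfile"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, pin_telegraf ~/IOTstack/.templates/telegraf/Dockerfile 1.19.3 sets the pin, and passing latest as the tag removes it again. The rebuild commands from the steps above are still needed for the change to take effect.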