A couple of months ago I had my first encounter with InfluxDB. I found it very interesting from the start because its key concepts differ from those used in SQL or MongoDB databases, with which I had some experience. In my opinion, the main obstacle for any programmer is a lack of helpful resources available online, and that is the very problem with InfluxDB. So, here I will attempt to make it somewhat easier to deploy InfluxDB on Kubernetes and to understand how that deployment works.
This blog will be divided into three parts:
- Introduction to InfluxDB
- Deployment and resources
- Wrap up with a few words
So, let’s get started!
About InfluxDB
To properly define InfluxDB, let's first define the data it usually stores. Time series data is a sequence of data points, typically consisting of successive measurements, over some time interval. So, InfluxDB is an open-source database optimized for storing and querying time series data. With this said, it is easy to assume that every piece of data inside InfluxDB has an exact time at which it was measured, or at least a time at which it was written into the database. InfluxDB is made to handle a high load of point writes and point reads. This makes it a very good choice if we want to set up some kind of monitoring where time precision is of great importance.
Key concepts and data elements
There are certain elements that all data inside of InfluxDB consists of. Below is a simple description of all of them.
Timestamp – the time at which our measurement was taken or written into InfluxDB
Field set – a set of key-value pairs (field_name and field_value). At least one field is required for the data we write into InfluxDB to be valid. Valid field value types are string, float, integer, and boolean.
Tag set – a set of key-value pairs (tag_name and tag_value). Unlike fields, tags are indexed, which means querying by tags is faster than querying by fields. Tags should therefore contain commonly queried data. Tag values can only be strings.
Measurement – the container in which we store the elements above. The measurement name should describe the data stored in it.
Series – a collection of points that share a measurement, tag set, and field key.
Buckets – buckets are containers for all the elements above. Each bucket has a retention policy that serves as lifecycle management. Basically, it defines the lifespan of the data inside the bucket.
Organization – consists of buckets and their users.
The next diagram shows how some of these elements are related.
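To make these elements concrete, here is what a single point could look like in InfluxDB line protocol, the text format used for writes. The measurement (temperature), tag (city), and field (value) names are made up for illustration:

```
temperature,city=Sarajevo value=21.5 1577836800000000000
```

Reading left to right: the measurement name, then the tag set, then the field set, and finally the timestamp in nanoseconds.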
Simple Go-InfluxDB application
When learning new things, I always find it easier if I have something I can run. So, I prepared a simple application that would have helped me when I was starting with InfluxDB. All the code can be found in the huseincausevic-abh repository, along with instructions on how to run it.
The application's goal is to deploy a simple API written in the Go programming language alongside InfluxDB and to ensure that communication with InfluxDB is established.
Before we begin, to follow these examples and to be able to deploy the application on Kubernetes you should have a Kubernetes cluster and Kubectl command-line tool to communicate with the cluster. Now, with that said, we can deploy our application.
Deploying InfluxDB to Kubernetes
To successfully deploy InfluxDB we have to write a couple of resources. Since InfluxDB is a database, at any point in time it has state which must be persisted. For this purpose, we use the StatefulSet Kubernetes resource. It grants unique network identifiers and stable persistent storage to all the pods defined in the manifest file, which is useful if we want to scale our application later on.
Our high-level goal is to:
- Have one InfluxDB instance running;
- Ensure that traffic is possible as soon as the InfluxDB pod is up;
- Perform periodical health checks to see if everything is running as desired/expected.
The manifest file that describes the bullets above is:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: influxdb-demo
  name: influxdb-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: influxdb-demo
  serviceName: influxdb-demo
  template:
    metadata:
      labels:
        app: influxdb-demo
    spec:
      containers:
        - image: quay.io/influxdb/influxdb:2.0.0-beta
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: api
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: influxdb-demo
          ports:
            - containerPort: 9999
              name: api
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: api
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
```
Here we said that we want to build our InfluxDB container from the quay.io/influxdb/influxdb:2.0.0-beta image hosted on Red Hat's Quay.io image registry.
The readiness and liveness probes ensure the second and third bullets, respectively. Both probes function in a similar way. The liveness probe runs periodically after the pod is marked as running and causes the pod to restart if something is not as expected. The readiness probe, on the other hand, will not restart the pod; instead, it removes the pod's endpoint from the InfluxDB Kubernetes Service.
InfluxDB has a health check defined on the path /health that tells us whether it is running correctly, so our probes base their success status on its return value. If a probe fails, we would see something like this when describing the influxdb-demo pod:
```
$ kubectl describe pod -n husein influxdb-demo-0
Name:       influxdb-demo-0
Namespace:  husein
Priority:   0
. . .
Warning  Unhealthy  58s  kubelet, .ec2.internal  Readiness probe failed: Get http://10.0.10.57:9999/health: dial tcp 10.0.10.57:9999: connect: connection
```
Now, if we take the StatefulSet manifest file and apply it to the cluster, we should be able to see that it is running:
```
husein:~/r8/r8/go-Influxdb-simple-app/k8s$ kubectl apply -f influxdb-statefulset.yaml -n husein
statefulset.apps/influxdb-demo created
husein:~/r8/r8/go-Influxdb-simple-app/k8s$ kubectl get pods -n husein | grep influxdb-demo
influxdb-demo-0   1/1   Running   0   26s
```
The next thing we need to define is the InfluxDB service, so we can communicate with the created pod.
```yaml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: influxdb-demo
  name: influxdb-demo
spec:
  type: NodePort
  ports:
    - name: api
      port: 9999
      targetPort: 9999
      nodePort: 31234
  selector:
    app: influxdb-demo
```
This service targets port 9999 of all pods that have the label app: influxdb-demo, which the pod from the previously created StatefulSet has. In the example above we also used the NodePort service type. This is usually used only for development purposes, but in this example it helps demonstrate what we have achieved so far.
Let’s apply it to the cluster so we can see.
```
husein:~/r8/r8/go-Influxdb-simple-app/k8s$ kubectl apply -f influxdb-service.yaml -n husein
service/influxdb-demo created
husein:~/r8/r8/go-Influxdb-simple-app/k8s$ kubectl get svc -n husein | grep influxdb-demo
influxdb-demo   NodePort   172.20.223.47   9999:31234/TCP   14s
```
After the service is defined, we can get a node's ExternalDNS so we can access our InfluxDB through a web browser.
```
# get node names
$ kubectl get nodes | awk '{ print $1 }'
# copy one and do
$ kubectl describe node <node-name> | grep ExternalDNS
```
After that, you should be able to see the Chronograf dashboard for the InfluxDB data. Chronograf is the user interface and administrative component of InfluxDB. We can access it at ExternalDNS:31234, the node port we defined in the service.
Go application resources
The Service resource for this application is almost identical to the one we wrote for InfluxDB, so I won't go over it here. As I said, this Go application is a simple API that only processes the data it receives and sends the processed request on to InfluxDB. This means it is a stateless application, so we will use the Deployment resource instead of the StatefulSet we used for InfluxDB.
Here is the resource definition:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-app
  labels:
    app: go-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-app
  template:
    metadata:
      labels:
        app: go-app
    spec:
      containers:
        - image: hcausevic5/go-influxdb-simple-app
          name: go-app
          imagePullPolicy: Always
          ports:
            - containerPort: 4444
          volumeMounts:
            - name: influx-creds
              mountPath: /app/influxdb
              readOnly: true
      volumes:
        - name: influx-creds
          secret:
            secretName: influxdb-auth-demo
```
It should be pretty clear what we want to achieve here. We will have one pod (replicas: 1), and that pod will run one container built from the go-influxdb-simple-app image on my Docker Hub account. The pod is selected via the label app: go-app and receives our requests on port 4444. There is also a volume mount, which I find interesting. Why would we need that?
Further InfluxDB setup
Basically, before we can do anything with our InfluxDB instance we need to run its initial setup, which defines the initial user, bucket, and organization. In return, along with a bunch of other things, we get an authentication token which we need for writes, queries, etc. Our Go code needs that token to communicate with the InfluxDB instance. Of course, it is possible to exec into the InfluxDB container or do a manual setup through Chronograf's dashboard UI, but then we would also have to change the token in the Go code manually. For that purpose, we will define another resource: a Secret. It looks something like this:
```yaml
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: influxdb-auth-demo
data:
  url: aHR0cDovL2luZmx1eGRiLWRlbW86OTk5OQ==
  username: aHVzZWlu
  password: aHVzZWluMTIz
  org: bXktb3Jn
  bucket: bXktYnVja2V0
```
This Secret contains all the data we need to successfully set up InfluxDB. But what are those hieroglyphs, you might ask? Well, Kubernetes Secrets require the values in the data field to be base64-encoded. Encoding and decoding can be performed like this:
```
$ echo -n http://influxdb-demo:9999 | base64 -w 0
aHR0cDovL2luZmx1eGRiLWRlbW86OTk5OQ==
$ echo -n aHR0cDovL2luZmx1eGRiLWRlbW86OTk5OQ== | base64 --decode
http://influxdb-demo:9999
```
We introduced this resource just for the token, and it hasn't even shown up? Well, not yet. As I already said, the token is just a piece of data we get after setting up InfluxDB, and we need it to communicate through the Go client library for InfluxDB. We don't know what its value will be, and we don't need to know, if we automate the process of setting up InfluxDB and saving the token.
For automation purposes, we will build another resource: a Job. If we look at the InfluxDB API docs, we can see that the /api/v2/setup endpoint can be used for this. The job of the Job (:D) is to set up our InfluxDB instance using the values from the previously created Secret, and then patch the Secret with the token it gets in return.
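The actual Job definition lives in the repository; to illustrate the idea, here is a rough sketch of what such a Job could look like. The container image, the inline script, and the service-account permissions are all assumptions for illustration, not the repo's exact manifest:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: influxdb-set-authentication
spec:
  template:
    spec:
      restartPolicy: OnFailure
      # Assumes a service account allowed to patch Secrets, and an image
      # (placeholder name) that ships both curl and kubectl.
      containers:
        - name: influxdb-setup
          image: my-kubectl-curl-image
          command: ["/bin/sh", "-c"]
          args:
            - |
              # Call the one-time setup endpoint with values from the Secret,
              # extract the returned token, and patch it back into the Secret.
              TOKEN=$(curl -s -XPOST http://influxdb-demo:9999/api/v2/setup \
                -d "{\"username\":\"$USER\",\"password\":\"$PASS\",\"org\":\"$ORG\",\"bucket\":\"$BUCKET\"}" \
                | sed -n 's/.*"token": *"\([^"]*\)".*/\1/p')
              kubectl patch secret influxdb-auth-demo \
                -p "{\"data\":{\"token\":\"$(echo -n $TOKEN | base64 -w 0)\"}}"
          env:
            - name: USER
              valueFrom: {secretKeyRef: {name: influxdb-auth-demo, key: username}}
            - name: PASS
              valueFrom: {secretKeyRef: {name: influxdb-auth-demo, key: password}}
            - name: ORG
              valueFrom: {secretKeyRef: {name: influxdb-auth-demo, key: org}}
            - name: BUCKET
              valueFrom: {secretKeyRef: {name: influxdb-auth-demo, key: bucket}}
```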
This Secret will be mounted into the /app/influxdb directory of the pod. The following code shows how we can extract that data in our Go code:
```go
func mountedConnectionParameters() map[string]string {
	connectionParams := make(map[string]string)
	basePath := "/app/influxdb"

	files, err := ioutil.ReadDir(basePath)
	if err != nil {
		panic(err)
	}

	for _, file := range files {
		// Skip the hidden entries Kubernetes creates inside a
		// secret mount (e.g. the ..data symlink directory).
		if strings.HasPrefix(file.Name(), ".") {
			continue
		}
		fileContent, err := ioutil.ReadFile(fmt.Sprintf("%v/%v", basePath, file.Name()))
		if err != nil {
			logrus.Errorf("Could not read file %v", file.Name())
			continue
		}
		// Each file name is a Secret key; its content is the value.
		connectionParams[file.Name()] = string(fileContent)
	}

	return connectionParams
}
```
The function above will return all the connection parameters we used to set up our InfluxDB, and with that data, we can successfully send our queries to InfluxDB using the influxdb-client-go package.
Also, since we already made the effort to automate the InfluxDB setup, we can automate the whole deployment. This is done with a bash script, deploy.sh. The script takes two parameters: the namespace (-n) and the mode (-m), which can be apply, delete, or recreate.
```
$ bash deploy.sh -n husein -m recreate
husein:~/r8/r8/go-Influxdb-simple-app/k8s$ bash deploy.sh -n husein -m apply
-------------------------------------------------------
Using mode: apply on resources...
-------------------------------------------------------
-------------------------------------------------------
Applying InfluxDB resources...
-------------------------------------------------------
secret/influxdb-auth-demo created
service/influxdb-demo created
statefulset.apps/influxdb-demo created
Waiting for InfluxDB pod to be ready...
Waiting for InfluxDB pod to be ready...
Waiting for InfluxDB pod to be ready...
Waiting for InfluxDB pod to be ready...
Waiting for InfluxDB pod to be ready...
InfluxDB pod is ready!
job.batch/influxdb-set-authentication created
service/go-app created
deployment.apps/go-app created
-------------------------------------------------------
```
Let's see if everything runs as expected. You can use Postman to send a request to ExternalDNS:APP_PORT.
As we can see, our simple Go application for writing and reading temperatures in InfluxDB works as expected.
Wrapping up – Is that it?
As for the basic InfluxDB setup, yes, that's it. Of course, we could do a lot more to upgrade our little example, but this should get you going. Our example covered only a small segment of what both of these technologies have to offer. Now, there is a common question of whether or not you should use InfluxDB. Well, it depends. There are some fields in which InfluxDB might be an excellent choice, and others where it wouldn't be. It all depends on the data we want to store: if we have time-sensitive data, or want to do some kind of monitoring, then it might be a really good choice.
Like any other database, InfluxDB has a set of applications that suits it best. There is no rule telling us we cannot use InfluxDB with data that isn't time-sensitive, but we should choose the right database for the data we are storing, not the other way around.