A couple of months ago I had my first encounter with InfluxDB. I found it interesting from the very start because its key concepts differ from those of SQL databases and MongoDB, with which I had some experience. The main obstacle for any programmer is, in my opinion, a lack of helpful resources available online, and that is exactly the problem with InfluxDB. So here I will try to make it somewhat easier to understand InfluxDB and how to deploy it on Kubernetes.

This blog will be divided into three parts:

  • Introduction to InfluxDB
  • Deployment and resources 
  • Wrap up with a few words

So, let’s get started!

About InfluxDB

To be able to properly define InfluxDB, let’s first define the data it usually stores. Time series data is a sequence of data points, typically consisting of successive measurements made over some time interval. InfluxDB is an open-source database optimized for storing time series data. With this said, it is easy to see that every piece of data inside InfluxDB has an exact time at which it was measured, or at least a time at which it was written into the database. InfluxDB is built to handle a high load of writes and reads. This makes it a very good choice if we want to set up some kind of monitoring where time precision is of great importance.

Key concepts and data elements

There are certain elements that all data inside of InfluxDB consists of. Below is a simple description of all of them.

Timestamp – the time associated with a data point: the moment the value was measured or, if no time is supplied, the moment it was written into InfluxDB

Field set – a set of key-value pairs (field_name and field_value). At least one field is required for the data we wish to save into InfluxDB to be valid. Valid types for a field value are string, float, integer, and boolean.

Tag set – a set of key-value pairs (tag_name and tag_value). Unlike fields, tags are indexed. That means querying with tags is faster than querying with fields. So, tags contain commonly queried data. Tag values can only be strings.

Measurement – the place where we store the elements above. The measurement name should describe the data stored in it.

Series – a collection of points that share a measurement, tag set, and field key.

Buckets – buckets are containers for all the elements above. Each bucket has a retention policy that serves as lifecycle management. Basically, it defines the lifespan of the data inside the bucket.

Organization – consists of buckets and their users.

The next diagram shows how some of these elements are related. 

Simple Go-InfluxDB application

When learning new things, I always find it easier if I have something I can run. So, I prepared a simple application of the kind that would have helped me when I was starting with InfluxDB. All the code can be found here: huseincausevic-abh, along with instructions on how to run it.

The goal of the application is to deploy a simple API written in the Go programming language along with InfluxDB, and to ensure that communication between the two is established.

Before we begin: to follow these examples and deploy the application on Kubernetes, you need a Kubernetes cluster and the kubectl command-line tool to communicate with it. With that said, we can deploy our application.

Deploying InfluxDB to Kubernetes

To successfully deploy InfluxDB we have to write a couple of resources. Since InfluxDB is a database, it always holds some state that must be persisted. For this purpose, we are using the StatefulSet Kubernetes resource. It gives every pod defined in the manifest file a unique, stable network identity and stable persistent storage, which is useful if we want to scale our application later on.

Our high-level goal is to:

  • Have one InfluxDB instance running;
  • Ensure that traffic is possible as soon as the InfluxDB pod is up;
  • Perform periodical health checks to see if everything is running as desired/expected.

The manifest file that describes the bullets above is:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: influxdb-demo
  name: influxdb-demo
spec:
  replicas: 1
  selector: 
    matchLabels:
      app: influxdb-demo
  serviceName: influxdb-demo
  template:
    metadata:
      labels:
        app: influxdb-demo
    spec:
      containers:
        - image:  quay.io/influxdb/influxdb:2.0.0-beta
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: api
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: influxdb-demo
          ports:
            - containerPort: 9999
              name: api
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: api
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1

Here we said that we want to run our InfluxDB container from the quay.io/influxdb/influxdb:2.0.0-beta image, which is hosted on Quay, Red Hat’s container image registry.

The readiness and liveness probes take care of the second and third bullets, respectively. Both probes work in a similar way. The liveness probe is performed periodically once the container is running, and the kubelet restarts the container if the probe keeps failing. The readiness probe, on the other hand, does not restart anything; while it fails, the pod is removed from the endpoints of the InfluxDB Kubernetes service, so no traffic is routed to it.

InfluxDB has a health check defined on the path /health that will tell us whether it is running correctly or not, so our probe success statuses are based on its return value. If this probe fails we would get something like this after describing the influxdb-demo pod:

$ kubectl describe pod -n husein influxdb-demo-0

Name:         influxdb-demo-0
Namespace:    husein
Priority:     0
. . .
  Warning  Unhealthy  58s        kubelet, .ec2.internal  Readiness probe failed: Get http://10.0.10.57:9999/health: dial tcp 10.0.10.57:9999: connect: connection 

Now if we grab the stateful set manifest file and apply it to the cluster, we should be able to see that it is running:

husein:~/r8/r8/go-Influxdb-simple-app/k8s$ kubectl apply -f influxdb-statefulset.yaml -n husein

statefulset.apps/influxdb-demo created

husein:~/r8/r8/go-Influxdb-simple-app/k8s$ kubectl get pods -n husein | grep influxdb-demo
influxdb-demo-0                                    1/1     Running     0          26s

The next thing we need to define is the InfluxDB service, so we can communicate with the created pod.

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: influxdb-demo
  name: influxdb-demo
spec:
  type: NodePort
  ports:
    - name: api
      port: 9999
      targetPort: 9999
      nodePort: 31234
  selector:
    app: influxdb-demo

This service is going to target port 9999 of all the pods that have the label app: influxdb-demo defined, which our pod in the previously created resource has. In the example above we also used the NodePort service type. This is usually only used for dev purposes, but for this example, it is good to help demonstrate what we achieved so far. 

Let’s apply it to the cluster so we can see.

husein:~/r8/r8/go-Influxdb-simple-app/k8s$ kubectl apply -f influxdb-service.yaml -n husein

service/influxdb-demo created

husein:~/r8/r8/go-Influxdb-simple-app/k8s$ kubectl get svc -n husein | grep influxdb-demo
influxdb-demo                      NodePort       172.20.223.47                                                    9999:31234/TCP   14s

After we defined the service, we can get the ExternalDNS so we can access our InfluxDB through the web browser. 

# get nodes name
$ kubectl get nodes | awk '{ print $1 }'

# copy one and do
$ kubectl describe node <node-name> | grep ExternalDNS

After that, you should be able to see the Chronograf dashboard for InfluxDB data. Chronograf is the user interface and administrative component of InfluxDB. We can access it at the URL ExternalDNS:31234, where 31234 is the nodePort defined in the service. You will be able to see this:

Great, we successfully deployed InfluxDB. Before we can continue with setting up our InfluxDB instance, let’s go over Go application resources.

Go application resources

The service resource for this application is almost identical to the one we wrote for InfluxDB. So, I won’t go over it here. As I said, this Go application is a simple API that only processes data it gets, and then sends the processed request to InfluxDB. This means this is a stateless application. For this purpose, we will use the Deployment resource instead of the StatefulSet that we used for InfluxDB. 

Here is the resource definition:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-app
  labels:
    app: go-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-app
  template:
    metadata:
      labels:
        app: go-app
    spec:
      containers:
      - image: hcausevic5/go-influxdb-simple-app
        name: go-app
        imagePullPolicy: Always
        ports:
        - containerPort: 4444
        volumeMounts:
        - name: influx-creds
          mountPath: /app/influxdb
          readOnly: true
      volumes:
      - name: influx-creds
        secret:
          secretName: influxdb-auth-demo

It should be pretty clear what we want to achieve here. We will have one pod (replicas: 1), and that pod will have one container run from the go-influxdb-simple-app image on my DockerHub account. The pod carries the label app: go-app, so the application’s service can select it, and it receives our requests on port 4444. The manifest also mounts a volume backed by a secret, which is the interesting part. Why would we need that?

Further InfluxDB setup

Basically, before we can do anything with our InfluxDB instance we need to run its setup. The setup defines the initial user, bucket, and organization. In return we get, among other things, an authentication token, which we need for writes, queries, and so on. Our Go code needs that token to communicate with the InfluxDB instance. Of course, it is possible to get into the InfluxDB container or do a manual setup with Chronograf’s dashboard UI, but then we would also have to change the token manually in the Go code. To avoid that, we will define another resource: Secret. It looks something like this:

---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: influxdb-auth-demo
data:
  url: aHR0cDovL2luZmx1eGRiLWRlbW86OTk5OQ==
  username: aHVzZWlu
  password: aHVzZWluMTIz
  org: bXktb3Jn
  bucket: bXktYnVja2V0

This secret contains all the necessary data we need to successfully set up InfluxDB. But what are those hieroglyphs, you might ask. Well, Kubernetes’ secrets require the data value field to be base64 encoded. Encoding and decoding can be performed like this:

$ echo -n http://influxdb-demo:9999 | base64 -w 0

aHR0cDovL2luZmx1eGRiLWRlbW86OTk5OQ==

$ echo -n aHR0cDovL2luZmx1eGRiLWRlbW86OTk5OQ== | base64 --decode

http://influxdb-demo:9999
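If you would rather do the same thing from Go, the standard library covers it. The sketch below uses encoding/base64 with the standard alphabet, which matches what the base64 command produces; the helper names encodeSecretValue and decodeSecretValue are made up for this example.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeSecretValue returns the base64 form expected in a Secret's data field.
func encodeSecretValue(plain string) string {
	return base64.StdEncoding.EncodeToString([]byte(plain))
}

// decodeSecretValue reverses encodeSecretValue.
func decodeSecretValue(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	encoded := encodeSecretValue("http://influxdb-demo:9999")
	fmt.Println(encoded) // aHR0cDovL2luZmx1eGRiLWRlbW86OTk5OQ==

	decoded, err := decodeSecretValue(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded) // http://influxdb-demo:9999
}
```

Keep in mind that base64 is an encoding, not encryption: anyone who can read the Secret can decode its values.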

We introduced this resource just for the token, and it hasn’t even shown up? Well, not yet. As I already said the token is just a piece of data that we get after setting up InfluxDB, which we need for communication through the Go client library for InfluxDB.  We don’t know what this value will be, and we don’t need to know if we automate the process of setting up InfluxDB and saving the token.

For automation purposes, we will be building another resource: Job. If we look up InfluxDB API docs, we can see that API Endpoint (/api/v2/setup) can be used for this. The job of the Job (:D) is to set up our InfluxDB instance using the values from the previously created secret and then patch the secret with the token value it gets in return.

This secret will be mounted into the /app/influxdb directory of the pod, with one file per secret key. The next code shows how we can extract that data in our Go code:

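import (
    "os"
    "path/filepath"
    "strings"

    "github.com/sirupsen/logrus"
)

// mountedConnectionParameters reads every file of the mounted secret and
// returns a map from file name (the secret key) to file content (its value).
func mountedConnectionParameters() map[string]string {
    connectionParams := make(map[string]string)
    basePath := "/app/influxdb"
    files, err := os.ReadDir(basePath)
    if err != nil {
        panic(err)
    }
    for _, file := range files {
        // Skip the hidden bookkeeping entries Kubernetes adds to the mount.
        if strings.HasPrefix(file.Name(), ".") {
            continue
        }
        fileContent, err := os.ReadFile(filepath.Join(basePath, file.Name()))
        if err != nil {
            logrus.Errorf("Could not read file %v: %v", file.Name(), err)
            continue // do not store an empty value for an unreadable key
        }
        connectionParams[file.Name()] = string(fileContent)
    }
    return connectionParams
}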

The function above will return all the connection parameters we used to set up our InfluxDB, and with that data, we can successfully send our queries to InfluxDB using the influxdb-client-go package.
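Because the token is patched into the secret by a separate Job, the application may start before the token is there. A small guard like the following sketch can catch that early. The function name validateConnectionParams is made up, and the key name token is an assumption: the article does not say under which key the Job stores the token, so adjust it to match your secret.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredKeys lists the secret keys this sketch expects to find.
// "token" is an assumed key name for the value patched in by the Job.
var requiredKeys = []string{"url", "org", "bucket", "token"}

// validateConnectionParams reports every required key that is missing or blank.
func validateConnectionParams(params map[string]string) error {
	var missing []string
	for _, key := range requiredKeys {
		if strings.TrimSpace(params[key]) == "" {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing connection parameters: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// Simulates the map read from the mounted secret before the Job has finished.
	params := map[string]string{
		"url":    "http://influxdb-demo:9999",
		"org":    "my-org",
		"bucket": "my-bucket",
	}
	if err := validateConnectionParams(params); err != nil {
		fmt.Println(err) // missing connection parameters: token
	}
}
```

Failing fast with a clear message is usually easier to debug than an authorization error on the first write.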

Also, since we already made some effort to automate the InfluxDB setup, we can automate the whole deployment. This is done with a bash script, deploy.sh, which takes two parameters: -n for the namespace and -m for the mode (apply, delete, or recreate).

$ bash deploy.sh -n husein -m recreate
husein:~/r8/r8/go-Influxdb-simple-app/k8s$ bash deploy.sh -n husein -m apply

-------------------------------------------------------
Using mode: apply on resources...
-------------------------------------------------------
-------------------------------------------------------
Applying InfluxDB resources...
-------------------------------------------------------
secret/influxdb-auth-demo created
service/influxdb-demo created
statefulset.apps/influxdb-demo created
Waiting for InfluxDB pod to be ready...
Waiting for InfluxDB pod to be ready...
Waiting for InfluxDB pod to be ready...
Waiting for InfluxDB pod to be ready...
Waiting for InfluxDB pod to be ready...
InfluxDB pod is ready!
job.batch/influxdb-set-authentication created
service/go-app created
deployment.apps/go-app created
-------------------------------------------------------

Let’s see if everything runs as expected. You can use Postman to send a request to ExternalDNS:APP_PORT.

As we can see, our simple Go application for writing temperatures to InfluxDB and reading them back is working as expected.

Wrapping up – Is that it?

As for the basic InfluxDB setup, yes, that’s it. Of course, we could do a lot more to improve our little example, but this should get you going. Our example covered only a small part of what both of these technologies have to offer. Now, there is the common question of whether or not you should use InfluxDB. Well, it depends. There are some fields in which InfluxDB is an excellent idea, and some in which it is not. It all depends on the data we want to store. If we have time-sensitive data, or if we want to do some kind of monitoring, then it might be a really good choice.

Like any other database, InfluxDB has a field of applications that suits it best. No rule says we cannot use InfluxDB with data that isn’t time-sensitive, but we should choose the right database for the data we are storing, not the other way around.
