Kubernetes Monitoring

25.01.2024
Reading time: 8 min
Hostman Team
Technical writer

If you've been using Kubernetes for a while, you've almost certainly thought about choosing a tool to monitor the state of your cluster nodes. This article covers exactly that: we will look at the available tools and how to work with them. We will show how to monitor Kubernetes with VictoriaMetrics and with Prometheus + Grafana, and how to avoid common problems.

Features of Kubernetes monitoring

Any Kubernetes cluster consists of worker nodes, individual servers that run your applications, managed by the Kubernetes Control Plane master server. Monitoring includes tracking the state of the master server itself and various system components. However, there are no built-in solutions for monitoring the state of the individual nodes that make up the cluster.

It's true that the Kubernetes Control Plane has a Kubelet module that provides information about the state of the API server, the main Kubernetes Control Plane node, and about CPU and RAM consumption by containers. However, collecting metrics from user applications comes with certain difficulties. Applications are updated constantly, which means their configuration files have to be updated as well; this is time-consuming and hardly feasible for large applications. That is why we have to use third-party tools, which we discuss further in this article.

Monitoring a Kubernetes cluster with VictoriaMetrics

VictoriaMetrics is a time series database (TSDB) compatible with the Kubernetes API. It works on a subscription basis: it watches the Kubernetes API and reacts to changes in pods (a pod is a set of containers sharing a common namespace).

To install the latest version of VictoriaMetrics, open the releases page on GitHub, scroll down, and select the build for your operating system. Then unpack the .gz archive, and the binary is ready to run. There are quite a few startup parameters, so we recommend reading the documentation. Now let's look at the main VictoriaMetrics tools that will help us with Kubernetes monitoring.

The relabel_configs section, a set of sequentially applied rules, is used to control which pods are monitored and how their labels look. Each rule can include the following six fields:

  • source_labels specifies the list of labels to perform operations on;

  • action is the action to be performed on the labels (e.g. replace, keep, drop);

  • modulus is the rarely used modulus for the hashmod action, needed when working with discovered targets (for example, to shard scraping across several instances);

  • regex is a regular expression to match against; the groups it captures can be used when replacing the label value;

  • separator is the separator used to join source_labels;

  • target_label is the label to which the result of the action will be written.

The above rules are used for actions with labels (e.g., they can be added, deleted, and their names and values can be changed) and filtering of detected targets.
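
For example, a rule that uses most of these fields might look like the minimal sketch below. It joins the pod name and container name (standard Kubernetes service-discovery labels) with a "/" separator and writes the result into a new label; the label name pod_container is chosen purely for illustration:

relabel_configs:
- source_labels: [__meta_kubernetes_pod_name, __meta_kubernetes_pod_container_name]
  separator: "/"
  regex: "(.+)"
  target_label: pod_container
  action: replace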

As for actions, VictoriaMetrics supports all the actions available in Prometheus (which we will discuss below). Besides the already mentioned replace, keep, and drop, these include:

  • labelmap

  • labelkeep

  • labeldrop

  • hashmod

In addition, VictoriaMetrics has its own set of unique actions:

  • replace_all

  • keep_metrics

  • drop_metrics

  • labelmap_all

  • keep_if_equal

  • drop_if_equal

That's how it works:

scrape_configs:
- job_name: k8s-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_namespace]
    action: keep
    regex: ^default$

This entry means that we only keep (scrape) pods that are in the default namespace, as specified by the expression in the regex line.
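
The VictoriaMetrics-specific actions are written the same way. As a hedged illustration (the metric names are just common cAdvisor examples), the rule below uses keep_metrics to keep only two container metrics and drop everything else:

metric_relabel_configs:
- action: keep_metrics
  regex: "container_cpu_usage_seconds_total|container_memory_working_set_bytes"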

VictoriaMetrics also supports selecting objects from the Kubernetes API according to their role, which, again, can be assigned with the TSDB's own tools. The roles can be:

  • Pod. Returns a target for each port of each container in a pod, labeled with metadata and using the pod's IP as the address.

  • Service. Objects with this role represent subscriptions to changes in specific services. For each port of a service, a target is created with an address of the form service_name.namespace.svc plus the port number.

  • Endpoints. Used to link services and pods, displaying the latter's state. In addition, Endpoints direct traffic only to currently up-and-running pods.

  • Endpointslice. Same as Endpoints, but with modified label prefix names. Endpointslice is used when the number of endpoints is large (more than a thousand) so that Kubernetes does not cut them off.

  • Node. Returns the cluster nodes themselves, together with the relevant node information.

  • Ingress. This role is used only for monitoring. It writes the host name into the address, and the path into a separate variable.

These roles are assigned using the role line (shown in the code above).
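
For instance, a minimal sketch of a scrape job that uses the node role and copies the Kubernetes node labels onto the discovered targets could look like this (the job name k8s-nodes is arbitrary):

scrape_configs:
- job_name: k8s-nodes
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)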

Also, you will probably need to filter pods to collect metrics only from selected pods. This can be implemented with another useful tool, Prometheus.

How to set up monitoring via Prometheus + Grafana

Strictly speaking, Prometheus is a whole set of utilities that often comes bundled with Grafana, a web interface that visualizes the data Prometheus collects. You can install this suite in several ways, for example with Helm. After cloning the repository, enter the following commands:

cd charts/stable/prometheus-operator
helm dependency update

And then:

helm install --name prometheus --namespace monitoring prometheus-operator

Once the installation completes, several pods appear in the monitoring namespace:

  • The Prometheus pod contains two useful plugins: config-reloader for tracking changes and reloading the prometheus.yaml configuration file, and rules-configmap-reloader for tracking changes in Prometheus rules.

  • The Alertmanager pod contains the manager itself, designed to automatically create notifications for specified rules, and config-reloader, a plugin for reloading the manager in case of changes in the configuration file.

  • In the Grafana pod, you will find the web interface that takes data for mapping from the Prometheus database and the Grafana-sc-dashboard for describing ConfigMap resources and generating JSONs.

After launching the pods, enter the following command:

kubectl port-forward prometheus-prometheus-prometheus-oper-prometheus-0 9090:9090

Open http://localhost:9090 in a browser. To view the list of ServiceMonitor objects that define which metrics are collected, enter:

kubectl get servicemonitors.monitoring.coreos.com

In the previous section, we wrote that Prometheus helps filter pods. To do this, we need to add the following directives to scrape_config.yaml:

relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
  action: keep
  regex: true
- source_labels: [__meta_kubernetes_namespace]
  action: replace
  target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
  action: replace
  target_label: kubernetes_pod_name

In addition, you will need to specify the port number by adding an annotation like this:

prometheus.io/port: <port_number>

And the path where the metrics will be fetched:

prometheus.io/path: <path>
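
Putting these together, a pod that should be scraped might be annotated roughly as in the sketch below; the pod name, image, port, and path are placeholders chosen for illustration:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
spec:
  containers:
  - name: my-app
    image: my-app:latest
    ports:
    - containerPort: 8080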

With Prometheus, monitoring almost any object, including servers, is easy, and the OS installed (Windows or Linux) does not matter. Among the parameters you can monitor are, for example, RAM, CPU, and disk space usage. As for applications, you can monitor the number of errors, the request rate, and request execution time.

You don't need additional settings to receive metrics from some services; for the rest, special programs called exporters are used. They come preconfigured, so you only need to find and install the right one from the official list of exporters. All metrics are stored in the TSDB and visualized through Grafana via an API that sends requests to the TSDB. Of course, Grafana is not the only way to visualize Prometheus metrics, but it is probably the most convenient.

Solving the single port problem

When you need to collect metrics from multiple ports with the Prometheus configuration, a problem may arise because prometheus.io/port can only return a single port. You can use different workarounds here—for example, those offered by VictoriaMetrics.

It has a tool called VictoriaMetrics Operator that uses Custom Resource Definitions to extend the Kubernetes API. This tool has extensive documentation describing all Custom Resources in detail. We are interested in VMPodScrape, which is launched as follows:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMPodScrape
metadata:
  name: vm
  namespace: monitoring
spec:
  podMetricsEndpoints:
  - port: http
  - port: backup-metrics
  selector:
    matchLabels:
      app.kubernetes.io/name: vm

Note that we have specified two ports. But you can enable multi-port collection in a different way, using the familiar rule sets:

scrape_configs:
- job_name: apiserver/0
  kubernetes_sd_configs:
  - role: endpoints
  scheme: http
  relabel_configs:
  …
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_component
    regex: apiserver
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: https
- job_name: coredns/0
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  …
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_app
    regex: coredns
  - action: keep
    source_labels:
    - __meta_kubernetes_endpoint_port_name
    regex: http-metrics

This approach of creating separate blocks for applications in the config file is also quite workable. So choose the solution you prefer. 

Prometheus vs VictoriaMetrics

Both VictoriaMetrics (VM) and Prometheus are convenient tools for working with metrics. But which one is more resource-efficient? Benchmarks run on Google Compute Engine at the end of 2020 showed that:

  • VM uses 7 times less disk space for stored data;

  • disk read bursts are 6.3 times higher on Prometheus;

  • VM uses 5.3 times less RAM (4.3 GB vs. the 23 GB Prometheus needs for stable operation).

As we can see, VictoriaMetrics allows you to significantly save on hardware.

 

