Kubernetes Dashboards

The Kubernetes * Health dashboards break down resource and performance metrics by logical entity so that you can analyze your environment in depth and identify and isolate critical issues. Each dashboard is built around the Golden Signals approach to monitoring: Latency, Traffic, Errors, and Saturation. Resource utilization metrics are oriented toward health and performance: CPU, memory, network, and storage usage by Kubernetes object. kube-state-metrics report the status or count of Kubernetes objects. By pairing kube-state-metrics with resource utilization metrics, each dashboard provides a comprehensive picture of what's happening in your Kubernetes environment.



Use Cases

Kubernetes Horizontal Pod Autoscaler

Highlights minimum, maximum, current, and desired replicas.
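
To make the relationship between these four replica values concrete, here is a minimal sketch of the scaling rule described in the upstream Kubernetes documentation: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up, and then clamped between the minimum and maximum. The function and parameter names are illustrative, and the sketch omits details such as the autoscaler's tolerance window and stabilization behavior.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Approximate the replica count a Horizontal Pod Autoscaler asks for.

    Illustrative helper only; names are assumptions, not a Sysdig or
    Kubernetes API.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the current replica count by how far the metric is from target.
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    # The result is always kept within the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% average CPU with a 60% target, bounded to 2..10.
print(desired_replicas(4, 90.0, 60.0, 2, 10))
```

When the desired value stays pinned at the maximum on the dashboard, the autoscaler has no room left to scale out, which is usually worth investigating.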

  • Identify performance bottlenecks.

  • Identify whether there are enough available pods compared to the desired pods.

  • Use usage percentages over time to better estimate expansion capacity.

  • Locate logical entities that are consuming too many cluster resources, or that are rapidly trending upwards towards unsustainable levels.

  • Dive deeper into specific entities to identify the root cause of problems.

For example:

  • A deployment with no available pods indicates that the corresponding app is not serving requests. Visualizing this condition on a dashboard lets you find and resolve the issue quickly.

  • A number of available pods that drops and remains below the desired number indicates that your application's performance is degraded or that it is not running at the required redundancy. With these metrics on the dashboard, you can see at a glance how severely your app's user experience is affected.

  • Fewer replicas running than desired over an extended period is a symptom of entities not working properly, such as unavailable nodes or resources, a Kubernetes or Docker Engine failure, or broken Docker images. A deployment with no replicas could mean that the app is down.

  • A continuous loop of pod restarts (CrashLoopBackOff) might be caused by missing dependencies, unmet requirements, or insufficient resources. In CrashLoopBackOff, pods never reach the ready status and are therefore counted as unavailable and down.

  • Use these three dashboards to provide a high-level overview of all aspects of the Kubernetes environment's performance and resource saturation status.

  • Set high-level alerts to narrow down areas of concern, before moving to the more in-depth dashboards.

  • Quickly identify major performance issues within each type of entity.
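
The availability conditions in the examples above can be sketched as a small classifier that compares available pods with desired pods. This is only an illustration of the reasoning: the function name and the status labels are hypothetical and are not values reported by any dashboard.

```python
def deployment_status(desired, available):
    """Classify a deployment from its desired and available pod counts.

    The labels returned here are illustrative, not Sysdig terminology.
    """
    if desired < 0 or available < 0:
        raise ValueError("pod counts cannot be negative")
    if desired == 0:
        return "scaled to zero"   # nothing is expected to run
    if available == 0:
        return "down"             # the app is not serving requests
    if available < desired:
        return "degraded"         # running below required redundancy
    return "healthy"


for desired, available in [(3, 3), (3, 1), (3, 0)]:
    print(desired, available, deployment_status(desired, available))
```

A "degraded" result that persists over time corresponds to the extended replica shortfall described above, and is a reasonable condition to alert on.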

Kubernetes Resource Quota

Provides an overview of resource limits and requests, and the number of replication controllers, services, service ports, service load balancers, ConfigMaps, and secrets.

Kubernetes Memory Allocation Optimization

Highlights opportunities to optimize memory allocation.

Kubernetes CPU Allocation Optimization

Displays CPU utilization of your Kubernetes environment.

Kubernetes Cluster Overview

Provides an overview of your Kubernetes cluster.

Kubernetes DaemonSet Overview

Overview of DaemonSet objects.

Kubernetes Deployment Overview

Highlights whether each deployment has a sufficient number of available pods and resources, and indicates the number of pods that are running, desired, or updated.

Kubernetes Job Overview

Overview of all jobs and their performance information.

Kubernetes Namespace Overview

Displays metrics such as resource requests and resource limits at the namespace level; identifies the performance of Kubernetes entities such as pods, deployments, DaemonSets, StatefulSets, and jobs, and their compliance with replicaSet specs. Highlights the number of services, deployments, replicaSets, and jobs per namespace.

Kubernetes Node Overview

Highlights the number of nodes that are ready, unavailable, or out of disk; the number of nodes that are under memory, disk, or network pressure; compares allocatable capacity with requested capacity on the node; provides the number of pod resources of a node that are available for scheduling and the available capacity to serve the pods running on the nodes.

Kubernetes Pod Overview

Helps identify potential bottlenecks by graphing the number of container restarts, the number of pods waiting to be scheduled, resource utilization of containers within each pod and available capacity to serve pod requests, the number of available pods compared to the desired pods, and the number of pods in available state and ready to serve requests.

Kubernetes ReplicaSet Overview

Provides details such as the number of pods per replicaSet, the desired number of pods per replicaSet, and pods per replicaSet that are in a ready state.

Kubernetes StatefulSet Overview

Overview of the StatefulSet objects in your environment.

Kubernetes Cluster and Node Capacity

Provides a comprehensive overview of the performance of the hosts or nodes that form the Kubernetes cluster, including CPU, memory, and file system usage, and network traffic.

Before analyzing the dashboard, consider the following guidelines related to resource usage:

  • If Resource Limits are undefined for a container, Kubernetes does not default to a value.

  • If Resource Requests are unspecified for a container, Kubernetes defaults to the Limits value if one is explicitly specified, and otherwise to an implementation-defined value. Limits do not default to any value.

  • If neither Resource Limits nor Resource Requests are specified, kube-state-metrics (and hence Sysdig Monitor) reports zero, regardless of the value Kubernetes defaulted to. Therefore, only user-defined requests are reported by the kubernetes.pod.resourceRequests.memByte metric.

  • The memory used by a container (the value returned by memory.used.bytes) can be greater than the memory requested by a pod (the value returned by kubernetes.pod.resourceRequests.memByte). This is permissible in Kubernetes because the Requests value determines the minimum amount of resources required.

For these reasons, it can be deduced that:

  • In some cases, the value of Used Resources will be more than that of Resource Requests and Resource Limits, and the value of Resource Requests could be more than that of Resource Limits.

  • In the typical case, kubernetes.pod.resourceRequests.memByte <= memory.used.bytes <= kubernetes.pod.resourceLimits.memByte.
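
The effect of the zero-reporting rule on what the dashboard displays can be illustrated with a short sketch. It assumes, as stated in the guidelines above, that an undefined request or limit is reported as zero. The function names and the sample byte values are hypothetical.

```python
from typing import Optional

MIB = 1024 * 1024


def reported_value(user_defined: Optional[int]) -> int:
    """Return the value a dashboard would show: user-defined, or zero."""
    return user_defined if user_defined is not None else 0


def compare(used: int, request: Optional[int], limit: Optional[int]) -> dict:
    """Compare used memory with the reported request and limit values."""
    req = reported_value(request)
    lim = reported_value(limit)
    return {
        "used_over_request": used > req,
        "used_over_limit": used > lim,
        "request_over_limit": req > lim,
    }


# A container using 300 MiB with a 256 MiB request and no limit defined:
# the limit is reported as zero, so both usage and request appear above it.
print(compare(used=300 * MIB, request=256 * MIB, limit=None))
```

This is why a panel can show usage or requests above the limit even though nothing is misbehaving: the limit shown is a reporting placeholder, not a value enforced by Kubernetes.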

Kubernetes Health Overview

Provides a comprehensive overview of the performance of the entire Kubernetes environment, broken down by various logical entities and underlying resource availability and usage. This dashboard breaks down resource utilization metrics and kube-state-metrics by logical Kubernetes entities, such as pods, namespaces, deployments, replicaSets, containers, and so on.

Kubernetes Service Health

Displays the count, resource usage, performance, and limitations of services running in the Kubernetes environment. The dashboard provides an overview of the resources each service is using, its response times, the container and request counts, and how the response times measure up against resource utilization.

Kubernetes Workloads CPU Usage and Allocation

Displays resource utilization of your workloads. This dashboard helps you review the CPU usage of your workloads, making sure that the CPU is properly allocated in the Kubernetes environment. All the numbers in this dashboard are expressed in CPU cores.
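
Because the dashboard reports cores while Kubernetes manifests often express CPU in millicores (for example, 500m), a small conversion can help when comparing the two. The sketch below handles only plain core values and the "m" suffix. The function name is illustrative.

```python
def cpu_to_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to cores.

    Only plain numbers and the millicore ("m") suffix are handled here.
    """
    q = quantity.strip()
    if q.endswith("m"):
        # 1000 millicores equal one core.
        return int(q[:-1]) / 1000
    return float(q)


for value in ["500m", "2", "0.25"]:
    print(value, "=", cpu_to_cores(value), "cores")
```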

Kubernetes Workloads Memory Usage and Allocation

Helps you review the memory usage of your workloads, making sure that the memory is properly allocated in the Kubernetes environment.

Kubernetes Service Golden Signals

Highlights the latency, traffic, errors, and saturation in your Kubernetes environment.

Last modified July 17, 2021: Aliases to old site urls (#98) (917a9be2)