Workloads Data

This topic discusses the Workloads Overview page and helps you understand its gauge charts and the data displayed on them.

About Workloads Overview

Workloads, in Kubernetes terminology, refers to your containerized applications. Workloads comprise Deployments, StatefulSets, and DaemonSets within a Namespace.

In a Cluster, worker nodes run your application workloads, whereas the master node provides the core Kubernetes services and orchestration for application workloads. The Workloads Overview page provides the key metrics indicating health, capacity, and compliance.

workloads_overview.png

Interpret the Workloads Data

This topic gives insight into the metrics displayed on the Workloads Overview page.

Pod Restarts

The chart displays the latest value returned by sum(timeAvg(kubernetes.pod.restart.rate)).

What Is It?

The sparkline shows the trend of the pod restart rate across all the pods in the selected Workload. The number shows the most recent rate, expressed in restarts per second.

For instance, the image below shows the trend for the last hour. The number indicates that the pod restart rate over the last 10 minutes is less than 0.01 restarts per second.

pod_restart_workload.png

For reference, the sparklines show the following number of steps (sampling):

  • Last hour: 6 steps, each for a 10-minute time slice.

  • Last 6 hours: 12 steps, each for a 30-minute time slice.

  • Last day: 12 steps, each for a 2-hour time slice.
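Each time slice is the window length divided by the number of steps. The short Python sketch below checks that arithmetic for the three presets. The `SPARKLINE_STEPS` table and the `slice_length` helper are illustrative names only and are not part of any product API.

```python
from datetime import timedelta

# Hypothetical table of sparkline presets: window length and number of steps.
SPARKLINE_STEPS = {
    "Last hour": (timedelta(hours=1), 6),
    "Last 6 hours": (timedelta(hours=6), 12),
    "Last day": (timedelta(days=1), 12),
}


def slice_length(window: timedelta, steps: int) -> timedelta:
    """Return the duration covered by one sparkline step."""
    return window / steps


for name, (window, steps) in SPARKLINE_STEPS.items():
    # Prints e.g. "Last hour: 6 steps of 0:10:00"
    print(f"{name}: {steps} steps of {slice_length(window, steps)}")
```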

What to Expect?

A healthy pod will have 0 restarts at any given time.

What to Do Otherwise?

In most cases, a few restarts in the last hour (or over larger time windows) do not indicate a serious problem. To investigate, drill down in Explore to the Kubernetes Overview Dashboard related to the Workload. For example, the Kubernetes StatefulSet Overview Dashboard provides a detailed trend broken down by pod.

pods_health_workload.png

In this example, restarts occur at a steady pace (roughly every 5 minutes) and no pods are ready. This might indicate a crash loop back-off (CrashLoopBackOff).
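The chart reports restarts per second, so even a steady crash loop yields a small number. The following Python sketch shows the arithmetic for pods that restart roughly every 5 minutes, as in the example above. It uses a hypothetical `restart_rate` helper and assumes each pod restarts at a fixed interval.

```python
def restart_rate(restarts: int, window_seconds: float) -> float:
    """Average number of restarts per second over a time window."""
    return restarts / window_seconds


SLICE_SECONDS = 10 * 60  # one 10-minute sparkline step

# A single pod restarting every 5 minutes restarts twice per 10-minute slice.
pod_rate = restart_rate(restarts=2, window_seconds=SLICE_SECONDS)
print(f"per pod: {pod_rate:.4f} restarts/s")  # per pod: 0.0033 restarts/s

# The query sums the rate across pods, so three such pods add up to about 0.01.
workload_rate = sum(restart_rate(2, SLICE_SECONDS) for _ in range(3))
print(f"workload: {workload_rate:.4f} restarts/s")  # workload: 0.0100 restarts/s
```

A value below 0.01 restarts per second can therefore still hide a crash loop. Whenever the number is not 0, the per-pod breakdown is worth checking.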

Pods Available vs Desired

The chart shows the latest value returned by sum(avg(kubernetes.deployment.replicas.available)) / sum(avg(kubernetes.deployment.replicas.desired)).

What Is It?

The chart displays the ratio between available and desired pods, averaged across the selected time window, for all the pods in a given Workload.

The upper bound shows the number of desired pods in the Workload.

For instance, the image below shows that all 42 desired pods are available.

pod_health_workload.png

What to Expect?

You should typically expect 100%.

If certain pods take a significant amount of time to become available (because of image pull time, pod initialization, or readiness probes), you may temporarily see a ratio lower than 100%.

What to Do Otherwise?

Identify the Workloads that have low availability by drilling down to the related Dashboard in Explore. For example, the Kubernetes Deployment Overview Dashboard helps you understand the trend and the state of the pods.

workload_pods_health.png

For instance, the image above shows a ratio of 98% (3.93 / 4 x 100). The slight decline is due to an update that caused pods to be terminated and then started again with a newer version.
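The gauge divides one sum of averages by another. The minimal Python sketch below makes two assumptions about the query:

  • The inner `avg` averages each Deployment's samples over the time window.
  • The outer `sum` adds those averages across Deployments.

The sample values are hypothetical and were chosen to reproduce the 3.93 / 4 example.

```python
from statistics import mean


def availability_ratio(available: dict[str, list[float]],
                       desired: dict[str, list[float]]) -> float:
    """Return sum(avg(available)) / sum(avg(desired)) as a percentage."""
    total_available = sum(mean(samples) for samples in available.values())
    total_desired = sum(mean(samples) for samples in desired.values())
    return 100 * total_available / total_desired


# Hypothetical samples for one Deployment during a rolling update.
available = {"web": [4.0, 3.86]}  # averages to 3.93 available replicas
desired = {"web": [4.0, 4.0]}     # 4 desired replicas throughout

ratio = availability_ratio(available, desired)
print(round(ratio))  # 98
```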

CPU Used vs Requests

The chart shows the latest value returned by sum(avg(cpu.cores.used)) / sum(avg(kubernetes.pod.resourceRequests.cpuCores)).

What Is It?

The chart shows the ratio between the total CPU usage across all pods of a selected Workload and the total CPU requested by all the pods.

The upper bound shows the total CPU requested by all the pods. The value denotes the number of CPU cores.

workload_cpu.png

In this image, the pods in the Workload request 40 CPU cores, of which 43% (about 17 cores) is actually used.

What to Expect?

It depends on the type of workload.

For applications with constant resource usage, such as background processes, expect the ratio to be around 100%.

For "bursty" applications, such as an API server, expect the ratio to be lower than 100%. Because the value is averaged over the selected time window, a usage spike can be offset by an idle period.

Generally, values between 80% and 120% are considered normal. Values higher than 100% are deemed normal only if they are observed for a relatively short time.
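These guidelines can be expressed as a simple rule of thumb. The Python sketch below is a hypothetical classifier, not a product feature. Its `sustained` flag stands in for whatever you consider "a relatively long time".

```python
def classify_usage(ratio_pct: float, sustained: bool) -> str:
    """Classify a Used vs Requests ratio using the 80% to 120% guideline.

    sustained: True if the value has persisted for a relatively long time.
    """
    if ratio_pct < 80:
        return "low"
    if ratio_pct > 120 or (ratio_pct > 100 and sustained):
        return "high"
    return "normal"


# Example from the CPU chart: 17.2 cores used out of 40 requested (43%).
print(classify_usage(100 * 17.2 / 40, sustained=True))  # low
```

A bursty application can legitimately sit below 80%. Treat a "low" result as a prompt to check the Dashboard rather than as an error.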

What to Do Otherwise?

  • Low usage indicates that the application is not running properly (it is not executing the expected functions) or that the Workload configuration is not accurate (the requests are too high compared to what the pods actually need).

  • High usage indicates that the application load is high or that the Workload configuration is not accurate (the requests are too low compared to what the pods actually need).

In either case, drill down to the Kubernetes Overview Dashboard corresponding to the Workload in Explore. For example, the Kubernetes Deployment Overview Dashboard provides insight into resource usage and configuration.

Can the Value Be Higher than 100%?

Yes, it can.

  • Configuring CPU requests without limits or requests lower than limits is permissible. In these cases, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

  • For example, consider a Workload with two containers. One container requests 1 CPU core and uses 1 CPU core (a Used vs Requests ratio of 100%). The other is configured without any request and also uses 1 CPU core. At the Workload level, 2 CPU cores used against 1 CPU core requested gives a ratio of 200%.
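The two-container example can be reproduced with a small Python sketch. The `workload_ratio` helper is hypothetical. It assumes `None` means a container has no request, so that container adds to usage but not to requests. When no container has a request, the helper returns `None`, which matches the "no data" case described below.

```python
from typing import Optional

# (used cores, requested cores); requested is None when no request is configured.
Container = tuple[float, Optional[float]]


def workload_ratio(containers: list[Container]) -> Optional[float]:
    """Return the Workload-level Used vs Requests ratio as a percentage.

    Returns None when no container has a request, because the ratio
    cannot be computed without a denominator.
    """
    used = sum(u for u, _ in containers)
    requested = sum(r for _, r in containers if r is not None)
    if requested == 0:
        return None
    return 100 * used / requested


print(workload_ratio([(1.0, 1.0), (1.0, None)]))  # 200.0
print(workload_ratio([(1.0, None), (0.5, None)]))  # None
```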

What Does “No Data” Mean?

If the Workload is configured with neither requests nor limits, the Used vs Requests ratio cannot be computed, and the chart shows “no data”. Drill down to the Dashboard in Explore to evaluate the actual usage.

Always configure requests. Setting requests makes it possible to detect Workloads that require reconfiguration.

Note

Kubernetes itself might expose Workloads with no requests or limits configured. For example, the kube-system Namespace can have Workloads without requests configured.

Memory Used vs Requests

The chart shows the latest value returned by sum(avg(memory.bytes.used)) / sum(avg(kubernetes.pod.resourceRequests.memBytes)).

What Is It?

The chart shows the ratio between the total memory usage across all the pods in a Workload and the total memory requested by the Workload.

The upper bound shows the total memory requested by all the pods, expressed in a suitable byte unit (for example, GiB).

memory_usage_namespace.png

For instance, the image shows that the pods in the selected Workload request 120 GiB, of which 24% (about 29 GiB) is actually used.
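The memory metrics are reported in bytes, and the gauge scales them to a readable unit. This Python sketch recovers the used amount in the example from the requested total and the ratio. It assumes binary units (1 GiB = 1024^3 bytes), and `used_bytes` is a hypothetical helper.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def used_bytes(requested_bytes: float, ratio_pct: float) -> float:
    """Memory in use, given the total requested bytes and the Used vs Requests ratio."""
    return requested_bytes * ratio_pct / 100


requested = 120 * GIB
used = used_bytes(requested, ratio_pct=24)
print(f"{used / GIB:.1f} GiB of {requested / GIB:.0f} GiB")  # 28.8 GiB of 120 GiB
```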

What to Expect?

The expected ratio depends on the type of Workload. Values between 80% and 120% are considered normal. Values higher than 100% are deemed normal only if they are observed for a relatively short time.

What to Do Otherwise?

Low memory usage indicates that the application is not running properly (it is not executing the expected functions) or that the Workload configuration is not accurate (the requests are too high compared to what the pods actually need).

High memory usage indicates that the application load is high or that the Workload configuration is not accurate (the requests are too low compared to what the pods actually need).

Note

Depending on the configured limits and the memory pressure on the nodes, Workloads that use more memory than they request are at risk of eviction. For more information, see Container's Memory Limit.

In either case, drill down to the Workloads page to determine the Workload that requires a deeper analysis.
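The eviction risk mentioned in the note depends on usage exceeding requests. As a rough screening aid, the Python sketch below lists containers whose memory usage exceeds their request, treating a missing request as zero. The `ContainerMemory` type and the `over_request` function are hypothetical. Actual eviction decisions are made by the kubelet and also depend on pod priority and on memory pressure on the node.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3


@dataclass
class ContainerMemory:
    name: str
    used_bytes: int
    request_bytes: Optional[int]  # None when no memory request is configured


def over_request(containers: list[ContainerMemory]) -> list[str]:
    """Return the names of containers that use more memory than they request."""
    return [c.name for c in containers if c.used_bytes > (c.request_bytes or 0)]


containers = [
    ContainerMemory("api", used_bytes=3 * GIB // 2, request_bytes=1 * GIB),
    ContainerMemory("worker", used_bytes=GIB // 2, request_bytes=1 * GIB),
    ContainerMemory("sidecar", used_bytes=GIB // 10, request_bytes=None),
]
print(over_request(containers))  # ['api', 'sidecar']
```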

Can the Value Be Higher than 100%?

Yes, it can.

  • Configuring memory requests without limits or requests lower than limits is permissible. In these cases, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

  • For example, consider a Workload with two containers. One container requests 1 GiB of memory and uses 1 GiB (a Used vs Requests ratio of 100%). The other is configured without any request and also uses 1 GiB. At the Workload level, 2 GiB used against 1 GiB requested gives a ratio of 200%.

What Does “No Data” Mean?

If the Workload is configured with neither memory requests nor limits, the Used vs Requests ratio cannot be computed, and the chart shows “no data”. Drill down to the Dashboard in Explore to evaluate the actual usage.

Always configure requests. Setting requests makes it possible to detect Workloads that require reconfiguration.

Note

Kubernetes itself might expose Workloads with no requests or limits configured. For example, the kube-system Namespace can have Workloads without requests configured.

Network I/O

The chart shows the latest value returned by avg(avg(net.bytes.total)).

What Is It?

The sparkline shows the trend of network traffic (inbound and outbound) for the Workload. The number shows the most recent rate in bytes per second, scaled to an appropriate unit.

network_usage_namespace.png
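The metric is a rate in bytes per second that the chart scales to a readable unit. The exact scaling the chart uses is not specified here. This Python sketch assumes binary (1024-based) units, and `format_rate` is a hypothetical helper.

```python
def format_rate(bytes_per_second: float) -> str:
    """Scale a bytes-per-second value to a readable binary unit (assumed 1024-based)."""
    units = ["B/s", "KiB/s", "MiB/s", "GiB/s", "TiB/s"]
    value = float(bytes_per_second)
    for unit in units[:-1]:
        if abs(value) < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024  # move to the next larger unit
    return f"{value:.1f} {units[-1]}"


print(format_rate(512))            # 512.0 B/s
print(format_rate(3.5 * 1024**2))  # 3.5 MiB/s
```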

For reference, the sparklines show the following number of steps (sampling):

  • Last hour: 6 steps, each for a 10-minute time slice.

  • Last 6 hours: 12 steps, each for a 30-minute time slice.

  • Last day: 12 steps, each for a 2-hour time slice.

What to Expect?

The expected values depend on the type of application running in the Workload. To investigate, drill down in Explore to the Kubernetes Overview Dashboard corresponding to the Workload. For example, the Kubernetes Deployment Overview Dashboard provides additional details, such as network activity across pods.