Nodes Data

This topic discusses the Nodes Overview page and helps you understand its gauge charts and the data displayed on them.

About Nodes Overview

A node is a worker machine in Kubernetes, and it can be either a physical machine or a virtual machine (VM). The Nodes Overview page provides key metrics indicating the health, capacity, and compliance of each node in your cluster.

nodes_overview.png

Interpret the Nodes Data

This topic gives insight into the metrics displayed on the Nodes Overview page.

Node Ready Status

The chart shows the latest value returned by avg(min(kubernetes.node.ready)).

What Is It?

The number expresses the Node's readiness to accept pods across the Cluster. The numeric availability indicates the percentage of time that Kubernetes reports the Node as ready. For example:

  • 100% is displayed when a Node is ready for the entire time window, say, for the last one hour.

  • 95% when the Node is ready for 95% of the time window, say, 57 out of 60 minutes.

The bar chart displays the trend across the selected time window, and each bar represents a time slice. For example, selecting “last 1 hour” displays 6 bars, each indicating a 10-minute time slice. Each bar shows the availability across the time slice (green) and the unavailability (red).
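The availability percentage and the per-slice bars can be illustrated with a short Python sketch. It assumes readiness is sampled once per minute as 1 (ready) or 0 (not ready); the sampling rate and the helper names `availability` and `slice_availability` are illustrative assumptions, not a description of how the chart is computed internally.

```python
def availability(samples):
    """Percentage of samples in which the node was ready (1 = ready, 0 = not ready)."""
    return 100.0 * sum(samples) / len(samples)


def slice_availability(samples, num_slices):
    """Split the samples into equal time slices and return the availability of each slice."""
    size = len(samples) // num_slices
    return [availability(samples[i * size:(i + 1) * size]) for i in range(num_slices)]


# Hypothetical data: one sample per minute for the last hour,
# with the node not ready for 3 minutes inside the second 10-minute slice.
samples = [1] * 60
for minute in (12, 13, 14):
    samples[minute] = 0

print(round(availability(samples)))    # 95 (57 out of 60 minutes)
print(slice_availability(samples, 6))  # [100.0, 70.0, 100.0, 100.0, 100.0, 100.0]
```

In this model, each bar's green portion is the slice availability and the red portion is the remainder up to 100%.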

For instance, the image below indicates the Node has not been ready for the entire last 1-hour time window:

nodes_not_ready.png

What to Expect?

The chart should show a constant 100% at all times.

What to Do Otherwise?

If the number is less than 100%, review the status reported by Kubernetes. Drill down to the Kubernetes Node Overview Dashboard in Explore to see details about the Node readiness:

nodes_availability_state.png

If the Node Ready Status alternates between ready and not ready, as shown in the image, the node is flapping. Flapping indicates that the kubelet is not healthy. Review the specific conditions reported by Kubernetes, such as network issues and memory pressure, to determine why the Node is not ready.
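One way to reason about flapping is to count how often the readiness state changes within the time window. The following Python sketch does that on a readiness series like the one above; the `max_transitions` threshold and the helper names are hypothetical and are not a documented alerting rule.

```python
def count_transitions(samples):
    """Count how often the readiness state changes between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def is_flapping(samples, max_transitions=4):
    """Treat the node as flapping when readiness changes more often than
    max_transitions within the window (hypothetical threshold)."""
    return count_transitions(samples) > max_transitions


steady = [1] * 12
alternating = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]

print(count_transitions(steady), is_flapping(steady))            # 0 False
print(count_transitions(alternating), is_flapping(alternating))  # 10 True
```

A node that is down once and stays down produces few transitions, while a flapping node produces many, which is why the transition count rather than the availability alone separates the two cases.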

Pods Ready vs Allocatable

The chart reports the latest value of sum(avg(kubernetes.pod.status.ready)) / avg(avg(kubernetes.node.allocatable.pods)).

What Is It?

It is the ratio between the ready pods and the allocatable pods configured on the node, averaged across the selected time window.

Note

The Clusters page includes a similar chart named Pods Available vs Desired. However, the meaning is different:

  • The Pods Available vs Desired chart for Clusters highlights how many pods you expect and how many are actually available. See IsPodAvailable for a detailed definition.

  • The Pods Ready vs Allocatable chart for Nodes indicates how many pods can be scheduled on each Node and how many are actually ready.

The upper bound shows the number of pods you can allocate on the node. See node configuration.

For instance, the image below indicates that you can allocate 110 pods on the Node (default configuration), but only 11 pods are ready:

pods_ready_allocatable.png

What to Expect?

The ratio does not relate to resource utilization; it measures the pod density on each node. The more pods you have on a single node, the more effort the kubelet must put into managing the pods, the routing mechanism, and Kubernetes overall.

Provided the allocatable value is set properly, values lower than 80% indicate a healthy status.
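The ratio and the 80% guideline can be checked with a few lines of Python, using the numbers from the example image (11 ready pods, 110 allocatable pods). The helper name `pods_ready_ratio` is illustrative only.

```python
def pods_ready_ratio(ready_pods, allocatable_pods):
    """Ready pods as a percentage of the pods the node can allocate."""
    return 100.0 * ready_pods / allocatable_pods


ratio = pods_ready_ratio(11, 110)
print(f"{ratio:.0f}%")  # 10%
print("healthy" if ratio < 80 else "review pod density")  # healthy
```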

What to Do Otherwise?

Refer to the Requests vs Allocatable ratio for CPU and memory to get insight into the resource utilization of your Workloads. The ratio indicates whether your cluster is healthy or not. If the value is high (close to 100%), you might want to consider:

  • Reviewing the default maximum pods configuration of the kubelet to allow more pods, especially if the CPU and memory utilization is healthy.

  • Adding more nodes to allow for more pods to be scheduled.

  • Reviewing kubelet process performance and Node resource utilization in general. A higher ratio indicates high pressure on the operating system and on Kubernetes itself.

CPU Requests vs Allocatable

The chart shows the latest value returned by sum(avg(kubernetes.pod.resourceRequests.cpuCores)) / sum(avg(kubernetes.node.allocatable.cpuCores)).

What Is It?

The chart shows the ratio between the number of CPU cores requested by the pods scheduled on the Node and the number of cores available to pods. The upper bound shows the CPU cores available to pods, which corresponds to the user-defined configuration for allocatable CPU.

For instance, the image below shows that the Node has 16 CPU cores available, of which 84% are requested by the pods scheduled on the Node:

cpu_req_al.png

What to Expect?

Expect a value up to 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is the value of (node_count - 1) / node_count x 100. For example, the bound is 90% if you have 10 nodes. Keeping the ratio at or below this bound protects your system against a Node becoming unavailable, because the remaining Nodes can still fit the requests of the pods that need to be rescheduled.
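The ratio and the (node_count - 1) / node_count bound are simple arithmetic, sketched below in Python. The 13.44 requested cores are derived from the example image (84% of 16 cores); the 10-node cluster is a hypothetical value, and the bound assumes every node has the same allocatable resources.

```python
def requests_ratio(requested, allocatable):
    """Requested resources as a percentage of allocatable resources."""
    return 100.0 * requested / allocatable


def n_minus_one_bound(node_count):
    """Upper bound (in %) that leaves room to absorb one node's worth of requests,
    assuming all nodes have the same allocatable resources."""
    return 100.0 * (node_count - 1) / node_count


print(round(n_minus_one_bound(10), 1))  # 90.0
print(round(n_minus_one_bound(9), 1))   # 88.9

ratio = requests_ratio(13.44, 16)  # 13.44 of 16 cores requested
print(round(ratio))                      # 84
print(ratio <= n_minus_one_bound(10))    # True
```

The bound tightens as the cluster shrinks: with 2 nodes it is 50%, because either node must be able to take over all requests of the other.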

What to Do Otherwise?

  • A low ratio indicates the Node is underutilized. Drill up to the corresponding cluster in the Clusters page to determine whether the number of pods currently running is low, or whether pods cannot run for other reasons.

  • A high ratio indicates a potential risk of being unable to schedule additional pods on the Node.

    Drill down to the Kubernetes Node Overview Dashboard to evaluate what Namespaces, Workloads, and pods are running. Additionally, drill up to the Clusters page to evaluate whether you are over-committing the CPU resource. You might not have enough resources to fulfill requests, and consequently, pods might not be able to run on the Node. Consider adding Nodes or replacing Nodes with ones that have more CPU cores.

Can the Value Be Higher than 100%?

Kubernetes schedules pods on Nodes where sufficient allocatable resources are available to fulfill the pod request. This means Kubernetes does not allow the total requests on a Node to exceed its allocatable resources. Consequently, the ratio cannot be higher than 100%.

Over-committing (pods requesting more resources than the capacity) results in a high Requests vs Allocatable ratio at the Nodes level and a low Pods Available vs Desired ratio at the Cluster level. It indicates that most of the available resources are already requested, so what is left is not sufficient to schedule additional pods. As a result, the Pods Available vs Desired ratio decreases.
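The reason the ratio is capped at 100% can be shown with a simplified fit check in Python: a pod is placed on a node only if the node's existing requests plus the new request stay within allocatable. This is an illustrative model only; the real kube-scheduler evaluates many more constraints, and the request values below are hypothetical.

```python
def fits(pod_request, scheduled_requests, allocatable):
    """Simplified fit check: place the pod only if the node's total requests,
    including the new pod, stay within allocatable."""
    return sum(scheduled_requests) + pod_request <= allocatable


allocatable_cores = 16.0
scheduled = [4.0, 4.0, 6.0]  # hypothetical requests of pods already on the node (cores)

print(fits(2.0, scheduled, allocatable_cores))  # True: 16.0 <= 16.0, ratio reaches 100%
print(fits(2.5, scheduled, allocatable_cores))  # False: the pod is not placed on this node
```

Because every placement passes this check, the sum of requests never exceeds allocatable, so the Requests vs Allocatable ratio tops out at 100%, and pods that do not fit anywhere remain unscheduled.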

Memory Requests vs Allocatable

The chart highlights the latest value returned by sum(avg(kubernetes.pod.resourceRequests.memBytes)) / sum(avg(kubernetes.node.allocatable.memBytes)).

What Is It?

The ratio between the number of bytes of memory requested by the pods scheduled on the Node and the number of bytes of memory available to pods. The upper bound shows the memory available to pods, which corresponds to the user-defined allocatable memory configuration.

For instance, the image below indicates the node has 62.8 GiB of memory available, out of which, 37% is requested by the pods scheduled on the Node:

mem_req_allocatable.png
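Because both metrics are expressed in bytes, the ratio is the sum of the per-pod requests divided by the allocatable bytes. The Python sketch below reproduces the example's 62.8 GiB and 37%; the individual pod requests are hypothetical values chosen to add up to 23.2 GiB.

```python
GIB = 1024 ** 3  # bytes per GiB


def memory_ratio(pod_requests_bytes, allocatable_bytes):
    """Sum of per-pod memory requests as a percentage of allocatable memory."""
    return 100.0 * sum(pod_requests_bytes) / allocatable_bytes


allocatable = 62.8 * GIB
requests = [8 * GIB, 8 * GIB, 7.2 * GIB]  # hypothetical per-pod requests

print(f"{allocatable / GIB:.1f} GiB allocatable")               # 62.8 GiB allocatable
print(f"{memory_ratio(requests, allocatable):.0f}% requested")  # 37% requested
```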

What to Expect?

A healthy ratio falls under 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is the value of (node_count - 1) / node_count x 100. For example, the bound is 90% if you have 10 nodes. Keeping the ratio at or below this bound protects your system against a node becoming unavailable, because the remaining nodes can still fit the requests of the pods that need to be rescheduled.

What to Do Otherwise?

  • A low ratio indicates that the Node is underutilized. Drill up to the corresponding cluster in the Clusters page to determine whether the number of pods running is low, or if pods cannot run for other reasons.

  • A high ratio indicates a potential risk of being unable to schedule additional pods on the node.

    • Drill down to the Kubernetes Node Overview dashboard to evaluate what Namespaces, Workloads, and pods are running.

    • Additionally, drill up to the Clusters page to evaluate whether you are over-committing the memory resource. If so, you might not have enough resources to fulfill requests, and pods might not be able to run. Consider adding nodes or replacing nodes with ones that have more memory.

Can the Value Be Higher than 100%?

Kubernetes schedules pods on nodes where sufficient allocatable resources are available to fulfill the pod request. This means Kubernetes does not allow the total requests on a node to exceed its allocatable resources. Consequently, the ratio cannot be higher than 100%.

Over-committing (pods requesting more resources than are available) results in a high Requests vs Allocatable ratio at the Nodes level and a low Pods Available vs Desired ratio at the Cluster level. It indicates that most of the resources are already requested, so what is left is not sufficient to schedule additional pods. As a result, the Pods Available vs Desired ratio decreases.

Network I/O

The chart shows the latest value returned by avg(avg(net.bytes.total)).

What Is It?

The sparkline shows the trend of network traffic (inbound and outbound) for a Node. The number indicates the most recent network throughput, in bytes per second.

network_io.png

For reference, the sparklines show the following number of steps (sampling):

  • Last hour: 6 steps, each for a 10-minute time slice

  • Last 6 hours: 12 steps, each for a 30-minute time slice

  • Last day: 12 steps, each for a 2-hour time slice
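The length of each time slice is the selected time window divided by the number of steps. The Python sketch below derives it from the window and step counts listed above; the dictionary name is illustrative only.

```python
from datetime import timedelta

# Time window and number of sparkline steps for each time selection.
SPARKLINE_STEPS = {
    "Last hour": (timedelta(hours=1), 6),
    "Last 6 hours": (timedelta(hours=6), 12),
    "Last day": (timedelta(days=1), 12),
}

for label, (window, steps) in SPARKLINE_STEPS.items():
    # Dividing a timedelta by an int yields the duration of one slice.
    print(f"{label}: {steps} steps of {window / steps}")
```

Running it prints a 10-minute, a 30-minute, and a 2-hour slice, respectively.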

What to Expect?

The metric highly depends on what type of applications run on the Node. You should expect some network activity for Kubernetes-related operations.

Drilling down to the Kubernetes Node Overview Dashboard in Explore will provide additional details, such as network activity across pods.