Sysdig Documentation

Clusters Overview

In Kubernetes, a pool of nodes combines its resources to form a more powerful machine, called a cluster. The Cluster Overview page provides key metrics indicating the health, risk, capacity, and compliance of each cluster. Your cluster can reside in any cloud or multi-cloud environment of your choice.

Each row in the Clusters page represents a cluster. Clusters are sorted by the severity of their events so that the areas needing attention stand out. For example, a cluster with high severity events is moved to the top of the page to highlight the issue. You can drill down further to the Nodes or Namespaces Overview page to investigate at each level.

cluster_overview.png

Scope

The scope of Cluster Overview is determined by the cluster. You can select only a single cluster at a time, which narrows the scope to that cluster in your Kubernetes environment. Selecting a cluster also scopes the events list to that cluster.

clusters_dropdown.png

Understanding Cluster Overview Metrics

Each entry below gives a description, the color scheme, an example, and the underlying metrics.

Ready Status

Represented as Node Ready. Shows the latest value of kubernetes.node.ready, averaged across nodes and expressed as a percentage.

Red: Nodes are unhealthy. If the status is red (a node is flapping or has not been ready for some time), drill down to Nodes Overview to isolate the problem.

Green: Nodes are healthy.

Hover over the micro-charts in Overview to see corresponding metrics.

The number indicates the latest value returned by kubernetes.node.ready. For more information, see Clusters and Nodes Overview at Troubleshoot Infrastructure Issues With Overview.

The 12 blocks in the Node Ready Status represent 12 samples showing the past status in the selected time window. For a selected period of, say, one hour, the period is split into 12 samples, each indicating the status for 5 minutes.
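The split of a time window into 12 samples can be sketched as follows. This is a minimal illustration, not the product's implementation; the function name split_window and the dates are hypothetical.

```python
from datetime import datetime, timedelta

SAMPLE_COUNT = 12  # number of blocks in the Node Ready Status micro-chart


def split_window(start, end, samples=SAMPLE_COUNT):
    """Split [start, end) into equally sized (sample_start, sample_end) pairs."""
    step = (end - start) / samples
    return [(start + i * step, start + (i + 1) * step) for i in range(samples)]


# Hypothetical one-hour window: each of the 12 samples covers 5 minutes.
window_end = datetime(2024, 1, 1, 13, 0)
window_start = window_end - timedelta(hours=1)
blocks = split_window(window_start, window_end)
sample_length = blocks[0][1] - blocks[0][0]
print(len(blocks), sample_length)
```

Selecting a longer window, for example 6 hours, would give 30-minute samples under the same scheme.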

80% indicates that the latest value of the metric across the cluster is 0.80.

The latest value returned by kubernetes.node.ready (min, avg)

Time aggregation is Minimum.

Group aggregation is Average.
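As a rough sketch of how the two aggregations combine, assume each node reports a series of ready samples (1 for ready, 0 for not ready). The minimum is taken over time for each node, and the results are averaged across nodes. The function name and the sample data are hypothetical.

```python
from statistics import mean


def node_ready_percentage(samples_by_node):
    """Minimum over time per node, then average across nodes, as a percentage."""
    per_node_min = [min(samples) for samples in samples_by_node.values()]
    return 100.0 * mean(per_node_min)


# Hypothetical data: one of five nodes was not ready in one sample.
samples_by_node = {
    "node-1": [1, 1, 1],
    "node-2": [1, 1, 1],
    "node-3": [1, 1, 1],
    "node-4": [1, 1, 1],
    "node-5": [1, 0, 1],
}
ready_pct = node_ready_percentage(samples_by_node)
print(f"{ready_pct:.0f}%")
```

Because the time aggregation is a minimum, a single not-ready sample is enough to count a node as not ready for that period.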

Pod Gauge

Represented as Pods Available vs Desired. This is the ratio of the total number of pods available to the total number of pods desired for the cluster across Deployments, StatefulSets, and DaemonSets.

Red: The number of available pods is less than the desired number. The ratio of available to desired pods is 0-80%.

Yellow: The ratio of available to desired pods is 80-95%.

Green: The ratio of available to desired pods is 95-100%.

Pods Available vs Desired should be 95-100%.

94% indicates that out of 66 desired pods, approximately 62 pods are available. Because this falls in the 80-95% range, the gauge is shown in yellow.

The ratio of kubernetes.namespace.pod.available.count and kubernetes.namespace.pod.desired.count.

Time aggregation is Average.

Group aggregation is Sum.
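The gauge value and its color can be approximated as below. This is a sketch under stated assumptions: the per-namespace counts are already averaged over time, the function name and namespace data are hypothetical, and boundary values (exactly 80% or 95%) are assigned to the upper band, which the color scheme above does not specify.

```python
def pod_gauge(available_by_namespace, desired_by_namespace):
    """Return (percentage, color) for Pods Available vs Desired."""
    available = sum(available_by_namespace.values())  # group aggregation: sum
    desired = sum(desired_by_namespace.values())
    if desired == 0:
        return None, "n/a"  # assumption: nothing to show without desired pods
    pct = 100.0 * available / desired
    if pct >= 95:
        color = "green"
    elif pct >= 80:
        color = "yellow"
    else:
        color = "red"
    return pct, color


# Hypothetical counts that add up to 62 available out of 66 desired pods.
available = {"default": 40, "kube-system": 22}
desired = {"default": 42, "kube-system": 24}
pct, color = pod_gauge(available, desired)
print(f"{pct:.0f}% {color}")
```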

CPU Gauge

Represented as CPU Requested vs Allocatable. This is the ratio of the number of CPU cores requested by pods to the number of CPU cores that can be allocated on the nodes.

The CPU limit dictates the maximum amount of CPU that your container can use, independent of contention on the node. The CPU request represents the minimum amount of CPU reserved for a container. If a container attempts to use more than the specified limit, the system throttles the container.

Red: The ratio of requested to allocatable CPU cores is greater than 95%.

Yellow: The ratio of requested to allocatable CPU cores is between 80% and 95%.

Green: The ratio of requested to allocatable CPU cores is less than 80%.

12% indicates that out of 50 CPU cores that can be allocated (see image), only 6 CPU cores are requested.

The ratio of kubernetes.pod.resourceRequests.cpuCores and kubernetes.node.allocatable.cpuCores.

Time aggregation is Average.

Group aggregation is Sum.
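The following sketch reproduces the 12% example. Unlike the pod gauge, a higher value is worse here. The function name, the per-pod requests, and the per-node allocatable values are hypothetical, and the handling of exactly 80% and 95% is an assumption.

```python
def cpu_gauge(requested_cores_by_pod, allocatable_cores_by_node):
    """Return (percentage, color) for CPU Requested vs Allocatable."""
    requested = sum(requested_cores_by_pod.values())      # sum across pods
    allocatable = sum(allocatable_cores_by_node.values())  # sum across nodes
    pct = 100.0 * requested / allocatable
    if pct > 95:
        color = "red"
    elif pct >= 80:
        color = "yellow"
    else:
        color = "green"
    return pct, color


# Hypothetical cluster: 6 cores requested, 50 cores allocatable.
requests = {"api": 2.0, "worker": 1.5, "cache": 1.5, "cron": 1.0}
allocatable = {"node-1": 16, "node-2": 16, "node-3": 18}
cpu_pct, cpu_color = cpu_gauge(requests, allocatable)
print(f"{cpu_pct:.0f}% {cpu_color}")
```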

Memory Gauge

Represented as Memory Requested vs Allocatable. This is the ratio of the total memory requested by pods to the total memory that can be allocated on the nodes, both in bytes.

Red: The ratio of requested and allocatable memory in bytes is greater than 95%.

Yellow: The ratio of requested and allocatable memory in bytes is between 80%-95%.

Green: The ratio of requested and allocatable memory in bytes is less than 80%.

2% indicates that out of 190 GiB of memory that can be allocated (see image), only 3.8 GiB is requested.

The ratio of kubernetes.pod.resourceRequests.memBytes and kubernetes.node.allocatable.memBytes.

Time aggregation is Average.

Group aggregation is Sum.
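To illustrate the order of the two aggregations, the sketch below averages each series over time first and then sums across pods and nodes before taking the ratio. All names and sample values are hypothetical, and the values are in bytes to match the metric names.

```python
from statistics import mean

GIB = 1024 ** 3  # bytes in one GiB


def memory_requested_vs_allocatable(request_samples_by_pod, allocatable_samples_by_node):
    """Average each series over time, sum across the group, return a percentage."""
    requested = sum(mean(s) for s in request_samples_by_pod.values())
    allocatable = sum(mean(s) for s in allocatable_samples_by_node.values())
    return 100.0 * requested / allocatable


# Hypothetical samples: 3.8 GiB requested against 190 GiB allocatable.
request_samples = {"api": [2 * GIB, 2 * GIB], "worker": [1.8 * GIB, 1.8 * GIB]}
allocatable_samples = {"node-1": [95 * GIB, 95 * GIB], "node-2": [95 * GIB, 95 * GIB]}
mem_pct = memory_requested_vs_allocatable(request_samples, allocatable_samples)
print(f"{mem_pct:.0f}%")
```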

Events

Shows the severity level and the number of events for each severity: High, Medium, Low, and Info, in that order.

Red: The number of events that are in High severity state.

Orange: The number of events that are in Medium severity state.

Green: The number of events that are in Low severity state.

Blue: The number of events that are in the Info state.
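Sorting clusters by event severity, as the Clusters page does, could look like the sketch below. The exact ordering rule is not documented here, so this assumes clusters are compared by their High count first, then Medium, Low, and Info. The function name and the counts are hypothetical.

```python
SEVERITY_ORDER = ("high", "medium", "low", "info")


def sort_clusters_by_severity(event_counts):
    """Return cluster names with the most severe event counts first."""
    def key(name):
        counts = event_counts[name]
        # Negate the counts so that larger counts sort earlier.
        return tuple(-counts.get(sev, 0) for sev in SEVERITY_ORDER)
    return sorted(event_counts, key=key)


# Hypothetical event counts per cluster.
event_counts = {
    "staging": {"low": 5},
    "prod": {"high": 2, "medium": 1},
    "dev": {"medium": 3},
}
ordered = sort_clusters_by_severity(event_counts)
print(ordered)
```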

Drill-Down Features

Kubernetes Cluster Overview

Takes you to the Explore page for Cluster Overview. It shows node availability, pod health, a workloads overview, and an HTOP-like view of metrics such as CPU, disk, memory, and network. It allows you to drill down into the environment. Color coding enables you to spot potential issues quickly.

Kubernetes Compliance Report

Record of the CIS Kubernetes benchmark test.

For more information, see Compliance Dashboards and Metrics.

Red: The value is between 0-50%.

Yellow: The value is between 50%-80%.

Green: The value is between 80-100%.

The value returned by compliance.k8s-bench.pass_pct.

Time aggregation is Average.

Group aggregation is Average.

Docker Compliance Report

Record of the CIS Docker benchmark test.

For more information, see Compliance Dashboards and Metrics.

Red: The value is between 0-50%.

Yellow: The value is between 50%-80%.

Green: The value is between 80-100%.

The value returned by compliance.docker-bench.pass_pct.

Time aggregation is Average.

Group aggregation is Average.
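Both compliance reports use the same color bands and the same aggregations, so one sketch covers them. It averages the pass percentage over time for each source and then across sources. The names and sample data are hypothetical, and because the ranges above overlap at 50% and 80%, the sketch assumes those boundary values belong to the higher band.

```python
from bisect import bisect_right
from statistics import mean

BOUNDS = (50, 80)                    # upper edges of the red and yellow bands
COLORS = ("red", "yellow", "green")


def compliance_status(pass_pct_samples_by_source):
    """Return (value, color) from per-source series of pass percentages."""
    per_source = [mean(s) for s in pass_pct_samples_by_source.values()]  # time average
    value = mean(per_source)                                             # group average
    return value, COLORS[bisect_right(BOUNDS, value)]


# Hypothetical benchmark results from two sources.
samples = {"node-1": [70, 74], "node-2": [60, 64]}
value, status = compliance_status(samples)
print(value, status)
```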