Advisor

Advisor brings your metrics, alerts, and events into a focused and curated view to help you operate and troubleshoot Kubernetes infrastructure.

Advisor is available only to SaaS users. The feature is not currently available for on-prem environments.

Advisor presents your infrastructure grouped by cluster, namespace, workload, and pod. You cannot currently configure a custom grouping. Depending on the selection, you will see different curated views and you can switch between the following:

  • Advisories
  • Triggered alerts
  • Events from Kubernetes, container engines, and custom user events
  • Cluster usage and capacity
  • Key golden signals (requests, latency, errors) derived from system calls
  • Kubernetes metrics about the health and status of Kubernetes objects
  • Container live logs
  • Process and network telemetry (CPU, memory, network connections, etc.)
  • Monitoring Integrations

The time window of metrics displayed on Advisor is the last 1 hour of collected data. To see historical values for a metric, drill down to a related dashboard or explore a metric using the Explore UI.

Advisories

Advisories evaluate the thousands of data points being sent by the Sysdig agent, and display a prioritized view of key problems in your infrastructure that affect the health and availability of your clusters and the workloads running on them.

When you select an advisory, relevant information related to the issue is surfaced, such as metrics, events, live logs, and remediation guidance. This enables you to pinpoint and resolve problems faster. Following SRE best practices, advisories are not necessarily symptoms of a problem, but rather underlying causes that you may not necessarily want to alert on.

Example Issues Detected

  • CrashLoopBackOff: A pod is starting, crashing, starting again, and crashing again. This can leave applications degraded or unavailable.
  • Container Error: A persistent application error is causing containers to be terminated. An application error, or exit code 1, means the container was terminated due to an application problem.
  • CPU Throttling: Containers are hitting their CPU limit and being throttled. CPU throttling does not kill the container, but the container is starved of CPU, resulting in application slowdown.
  • OOM Kill: When a container reaches its memory limit, it is terminated with an OOMKilled status, or exit code 137. This can lead to application instability or unavailability.
  • Image Pull Error: A container is failing to start because it cannot pull its image.
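
Outside Advisor, you can cross-check these states at the command line with kubectl. This is a minimal sketch; the pod and namespace names are placeholders.

# List pods that are not running cleanly, across all namespaces
kubectl get pods --all-namespaces | grep -Ev 'Running|Completed'

# Inspect events and the last container state (CrashLoopBackOff, ImagePullBackOff, OOMKilled, and so on)
kubectl describe pod <pod-name> -n <namespace>

# View the logs of the previous (crashed) container instance
kubectl logs <pod-name> -n <namespace> --previous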

Advisories are automatically resolved when the problem is no longer detected. You cannot customize the Advisories evaluated. These are fully managed by Sysdig.

Live Logs

Advisor can display live logs for a container, which is the equivalent of running kubectl logs. This is useful for troubleshooting application errors or problems such as pods in a CrashLoopBackOff state.
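
For reference, the command-line equivalent looks like this; the pod, container, and namespace names are placeholders:

kubectl logs <pod-name> -c <container-name> -n <namespace> --follow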

When you select a pod, a Logs tab appears. If there are multiple containers within the pod, you can select the container whose logs you wish to view. Once requested, logs are streamed for 3 minutes before the session is automatically closed (you can restart streaming if necessary).

Live logs are tailed on-demand and thus not persisted. After a session is closed they are no longer accessible.

Manage User Access to Live Logs

By default, live logs are available to users within the scope of their Sysdig Team. Use Custom Roles to manage live logs permissions.

Configure Agent for Live Logs

Live logs are enabled by default in agent 12.7.0 or newer versions. Older versions of the Sysdig agent do not support live logs.

Live logs can be enabled or disabled within the agent configuration.

To turn live logs off globally for a cluster, add the following in the dragent.yaml file:

live_logs:
  enabled: false

If using Helm, this is configured via sysdig.settings. For example:

sysdig:
  # Advanced settings. Any option in here will be directly translated into dragent.yaml in the Configmap
  settings:
    live_logs:
      enabled: false
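
As a usage sketch, assuming the settings above are saved in a values file and you deploy the agent with Helm; the release name, chart reference, and namespace below are placeholders, so use the ones from your own deployment:

helm upgrade --install sysdig-agent <chart-reference> \
  --namespace sysdig-agent \
  -f values.yaml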

Troubleshoot Live Logs

If there is a problem with live logs, the following errors will be returned. Contact Sysdig Support for additional help and troubleshooting.

  • 401: The kubelet does not have bearer token authorization enabled.
  • 403: The sysdig-agent ClusterRole does not have the node/proxy permission.

YAML Configuration

Advisor can display the YAML configuration for pods, which is the equivalent of running kubectl get pod <pod> -o yaml. This is useful to see the applied configuration of a pod in a raw format, as well as metadata and status. To view the YAML, select a pod in Advisor and open the YAML tab.
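
For reference, the command-line equivalent looks like this; the pod and namespace names are placeholders:

kubectl get pod <pod-name> -n <namespace> -o yaml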

Support for viewing YAML config is for pods only. Other object types are not yet supported.

Manage Access to YAML Configuration

By default, displaying YAML configuration is available to users within the scope of their Sysdig Team. Use Custom Roles to manage permissions. The permission for displaying YAML configuration is Advisor - Kubernetes API.

Configure Agent for YAML Configuration

YAML configuration can be enabled in agent 12.9.0 or newer versions. Older versions of the Sysdig agent do not support YAML configuration.

You can use the agent configuration to enable the YAML configuration.

To turn support for YAML configuration on globally for a cluster, add the following in the dragent.yaml file:

k8s_command:
  enabled: true

If you are using Helm, edit sysdig.settings. For example:

sysdig:
  # Advanced settings. Any option in here will be directly translated into dragent.yaml in the Configmap
  settings:
    k8s_command:
      enabled: true

1 - Cost Advisor (Preview)

Cost Advisor provides predictable cost analysis and savings estimates for Kubernetes environments.

Cost Advisor is available only to SaaS users. The feature is not currently available for on-prem environments.

Use Cases

Cost Advisor helps you get insights into the following use cases:

  • What is the cost of running compute (e.g., EC2 instances) within a Kubernetes cluster?
  • What is the cost of the compute required for an application, workload, or namespace?
  • How can I reduce the cost of running workloads by rightsizing?

Supported Environments

Currently only AWS is supported. We are actively working on adding support for GCP and Azure.

  • The Sysdig Agent is required for Cost Advisor. The agent collects resource usage information that is augmented with billing data. There is no explicit configuration required for Cost Advisor.
  • Kubernetes clusters must be running in AWS, GCP, or Azure. Both managed clusters (e.g., EKS) and vanilla Kubernetes (e.g., KOPS) are supported.

Concepts

Cost Allocation

Cost Allocation applies to workloads and their associated namespaces, and displays the currently allocated costs based on resource requirements. Note that it differs from infrastructure costs: workload cost allocation is calculated independently and can be considered a “logical cost”.

Because workloads can exceed their configured requests (i.e., a workload can be overcommitted, using more than its requests but less than its limits), Cost Allocation is calculated daily by evaluating requests and usage and taking whichever is greater for the given time period.

Cost Allocation considers compute (memory and CPU). In the future, other costs will be factored in, including storage, network/load balancer costs, and other associated infrastructure costs.

Example cost allocation for a workload that has requests set to 5 CPU cores and 16 GB of memory, running on a t3.medium with a CPU cost of $0.02/hour and a memory cost of $0.003/hour (on-demand pricing):

Day 1
  • Requested CPU: 5 CPUs ($0.10/hr); actual CPU usage: 2 CPUs ($0.04/hr)
  • Requested memory: 16 GB ($0.048/hr); actual memory usage: 6 GB ($0.018/hr)
  • Requests are greater than usage, so actual usage is ignored and requests are used to calculate the cost.
  • CPU cost: $2.40. Memory cost: $1.15. Daily cost: $3.55.

Day 2
  • Requested CPU: 5 CPUs ($0.10/hr); actual CPU usage: 12 CPUs ($0.24/hr)
  • Requested memory: 16 GB ($0.048/hr); actual memory usage: 6 GB ($0.018/hr)
  • Memory requests are greater than usage, but actual CPU usage is higher than requests, so actual CPU usage and memory requests are used.
  • CPU cost: $5.76. Memory cost: $1.15. Daily cost: $6.91.

Day 3
  • Requested CPU: 5 CPUs ($0.10/hr); actual CPU usage: 12 CPUs ($0.24/hr)
  • Requested memory: 16 GB ($0.048/hr); actual memory usage: 25 GB ($0.075/hr)
  • Both actual CPU usage and memory usage are higher than requests (i.e., the workload is overcommitted), so actual CPU and memory usage are used.
  • CPU cost: $5.76. Memory cost: $1.80. Daily cost: $7.56.

Efficiency Metrics

Resource Efficiency

Resource Efficiency is a calculation of both CPU and memory requests against usage, producing a single score that indicates how well a workload is using its requested resources. Resource efficiency falls into the following brackets:

  • 0 (no data): No CPU or memory requests are configured.
  • 0-20: A low value indicates the workload is oversized and may be a good candidate for rightsizing.
  • 20-70: Workload resource efficiency could be improved.
  • 70-120: Good resource efficiency; improvements could be made, but this is a good score.
  • 120 or higher: The workload may suffer resource starvation or pod eviction, as it is consuming far more resources than requested.

CPU Requests

Average usage of CPU against requests over the last 10 minutes. If no requests are configured, the value shows zero. For example:

CPU Requests = sum workload CPU usage over the last 10 minutes / sum workload CPU requests

Memory Requests

Average usage of memory against requests over the last 10 minutes. If no requests are configured, the value shows zero. For example:

Memory Requests = sum workload memory usage over the last 10 minutes / sum workload memory requests

Note that for CPU requests, memory requests, and resource efficiency, the calculation is made at the individual workload level. This means that when looking at a namespace, these values are an aggregate of all workloads within that namespace.

Cost Savings

Cost Advisor helps teams optimize costs by recommending changes to their infrastructure.

Workload Rightsizing

Cost Advisor will surface savings to help you prioritize rightsizing workloads with the highest saving potential.

For all workloads running on your clusters, Cost Advisor evaluates usage against requests. For oversized workloads (where usage is less than requests), you can use Cost Advisor to 1) quantify the cost saving if you were to rightsize requests, and 2) see a recommendation for the values to rightsize workloads to.

Cost Advisor helps to baseline workload costs by recommending CPU and memory requests. The recommendation is calculated from the P95 usage of all unique containers running within a workload over the past day. The recommendation is provided per container (in the case of pods running multiple containers).

Currently, the recommendation to achieve savings is based on P95 usage over the past day. Support for customizing the methodology that produces this recommendation is coming soon.
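
As a sketch of acting on a recommendation, you could apply new per-container requests with kubectl; the workload name, container name, and request values below are hypothetical:

kubectl set resources deployment/<workload-name> \
  --containers=<container-name> \
  --requests=cpu=250m,memory=256Mi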

2 - Overview

Overview leverages Sysdig’s unified data platform to monitor, secure, and troubleshoot your hosts and Kubernetes clusters and workloads.

The module provides a unified view of the health, risk, and capacity of your Kubernetes infrastructure: a single pane of glass for host machines as well as Kubernetes Clusters, Nodes, Namespaces, and Workloads across multi- and hybrid-cloud environments. You can easily filter by any of these entities and view associated events and health data.

Overview shows metrics prioritized by event count and severity, allowing you to get to the root cause of the problem faster. Sysdig Monitor polls the infrastructure data every 10 minutes and refreshes the metrics and events on the Overview page with the system health.

Key Benefits

Overview provides the following benefits:

  • Show a unified view of the health, risk, resource use, and capacity of your infrastructure environment at scale

    • Render metrics, security events, compliance CIS benchmark results, and contextual events in a single location

    • Eliminate the need for stand-alone security, monitoring, and forensics tools

    • View data on-the-fly by workload or by infrastructure

  • Display contextual live event stream from alerts, Kubernetes, containers, policies, and image scanning results

  • Surface entities intelligently based on event count and severity

  • Drill down from Clusters to Nodes and Namespaces

  • Support Infrastructure monitoring of multi- and hybrid- cloud environments

  • Expose relevant information tailored to core operational users:

    • DevOps / Platform Ops

    • Security Analyst

    • Service Owner

Accessing the Overview User Interface

You can access and set the scope of Overview in the Sysdig Monitor UI or with the URL:

  • On-Prem: https://[Sysdig URL]/#/overview

  • SaaS: See SaaS Regions and IP Ranges and identify the correct domain URL associated with your Sysdig application and region. For example, the US East URL is: https://app.sysdigcloud.com/#/overview

    For other regions, the format is https://<region>.app.sysdig.com/#/overview. Replace <region> with the region where your Sysdig application is hosted. For example, for Sysdig Monitor in the EU, use https://eu1.app.sysdig.com/#/overview.

Click Overview in the left navigation, then select one of the Kubernetes entities:

About the Overview User Interface

The Overview interface opens to the Clusters Overview page. This section describes the major components of the interface and the navigation options.

Though the default landing page is Clusters Overview, when you have no Kubernetes clusters configured, the Overview tab opens to the Hosts view. In addition, when you reopen the Overview menu, the default view will be your last visited Overview page as it retains the visit history.

Overview Rows

Each row represents a Kubernetes entity: a cluster, node, namespace, or workload. In the screenshot above, each row shows a Kubernetes cluster.

  • Navigating rows is easy

    Click on the Overview icon in the left navigation and choose an Overview page, or drill down into the next Overview page to explore the next granular level of data. Each Overview page shows 10 rows by default and a maximum of 100 rows. Click Load More to display additional rows if there are more than 10 rows per page.

  • Ability to select a specific row in an Overview page

    Each row carries the scope of the entity it is showing data for. Clicking a specific row deselects the rest of the rows (for instance, selecting staging deselects all other rows in the screenshot above), letting you focus on the scope of the selected entity, including the events scoped by that row. Pausing to focus on a single row provides a snapshot of what is going on at that moment with the entity under purview.

  • Entities are ranked according to the severity and the number of events detected in them

    Rows are sorted by the count and severity level of the events associated with the entity and are displayed in descending order. The items with the highest number of high-severity events are shown first, followed by medium, low, and info. This organization helps highlight events demanding immediate attention and streamlines troubleshooting in environments that may include thousands of entities.

Scope Editor

Scope Editor allows you to target a specific entity, such as a particular workload or namespace, in environments that may include thousands of entities. The levels of scope, determined by the Kubernetes hierarchy, progress from Workload up to Cluster, with Cluster at the top level. In smaller environments, using the Scope Editor is equivalent to clicking a single row in an Overview page where no scope has been applied.

Cluster: The highest level in the hierarchy. The only scope applied to the page is Cluster. It allows you to select a specific cluster from a list of available ones.

Node: The second level in the hierarchy. The scope is determined by Cluster and Node. Selection is narrowed down to a specific node in a selected cluster.

Namespace: The third level in the hierarchy. The scope is determined by Cluster and Namespace. Selection is narrowed down to a specific namespace in a selected cluster.

Workloads: The last entity in the hierarchy. The scope is initially determined by Cluster and Namespace, then the selection is narrowed to a specific Deployment, DaemonSet, or StatefulSet. You cannot select all three types at once.

Time Navigation

The Overview feature is built around time. Sysdig Monitor polls the infrastructure data every 10 minutes and refreshes the metrics and events on the Overview page with the system health. The time range is fixed at 12 hours. However, the gauge and compliance score widgets display the latest data sample, not an aggregation over the entire 12-hour time range.

The Overview feed is always live and cannot be paused.

Unified Stream of Events

The right panel of Overview provides a context-sensitive events feed.

Click an overview row to see relevant Events on the right. Each event is intelligently populated with end-to-end metadata to give context and enable troubleshooting.

Event Types

Overview renders the following event types:

  • Alert: See Alerts.

  • Custom: Ensure that Custom labels are enabled to view this type of event.

  • Containers: Events associated with containers.

  • Kubernetes: Events associated with Kubernetes infrastructure.

  • Scanning: See Image Scanning.

  • Policy: See Policies.

Event Statuses

Overview renders the following alert-generated event statuses:

  • Triggered: The alert condition has been met and still persists.

  • Resolved: A previously existing alert condition no longer persists.

  • Acknowledged: The event has been acknowledged by the intended recipient.

  • Un-acknowledged: The event has not been acknowledged by an intended recipient. All events are by default marked as Un-acknowledged.

  • Silenced: The alert event has been silenced for a specified scope. No alert notification will be sent out to the channels during the silenced window.

General Guidelines

First-Time Usage

  • If the environment is created for the first time, Sysdig Monitor fetches data and generates the associated pages. The Overview feature is enabled immediately; however, it can take up to 1 hour for the Overview pages to show the necessary data.

  • Overview uses time windows of 1H, 6H, and 1D; therefore, wait 1H, 6H, or 1D respectively to see data for those windows on the Overview pages.

  • If enough data is not available, the “No Data Available” page is displayed until the first hour passes.

Tuning Overview Data

Sysdig Monitor leverages a caching mechanism to fetch pre-computed data for the Overview screens.

If pre-computed data is unavailable, non-computed data is fetched and must be calculated before being displayed, which adds delay. Caching is enabled for Overview, but for optimum performance you must wait for the 1H, 6H, and 1D windows the first time you use Overview. After the specified time has passed, the data is automatically cached with every passing minute.

Enabling Overview for On-Prem Deployments

The Overview feature is not available by default on On-Prem deployments. Use the following API to enable it:

  1. Get the Beta settings as follows:

    curl -X GET 'https://<Sysdig URL>/api/on-prem/settings/overviews' \
    -H 'Authorization: Bearer <GLOBAL_SUPER_ADMIN_SDC_TOKEN>' \
    -H 'X-Sysdig-Product: SDC' -k
    

    Replace <Sysdig URL> with the Sysdig URL associated with your deployment and <GLOBAL_SUPER_ADMIN_SDC_TOKEN> with the SDC token associated with your deployment.

  2. Copy the payload and change the desired values in the settings.

  3. Update the settings as follows:

    curl -X PUT 'https://<Sysdig URL>/api/on-prem/settings/overview' \
    -H 'Authorization: Bearer <GLOBAL_SUPER_ADMIN_SDC_TOKEN>' \
    -H 'X-Sysdig-Product: SDC' \
    -d '{  "overviews": true,  "eventScopeExpansion": true}'
    

Feature Flags

  • overviews: Set overviews to true to enable the backend components and the UI.

  • eventScopeExpansion: Set eventScopeExpansion to true to enable scope expansion for all the Event types.

2.1 - Clusters Data

This topic discusses the Clusters Overview page and helps you understand its gauge charts and the data displayed on them.

About Clusters Overview

In Kubernetes, a pool of nodes combines its resources to form a more powerful machine: a cluster. The Clusters Overview page provides key metrics indicating the health, risk, capacity, and compliance of each cluster. Your clusters can reside in any cloud or multi-cloud environment of your choice.

Each row in the Clusters page represents a cluster. Clusters are sorted by the severity of corresponding events to highlight the areas that need attention. For example, a cluster with high-severity events bubbles up to the top of the page to highlight the issue. You can drill down further to the Nodes or Namespaces Overview pages to investigate at each level.

In environments where Sysdig Secure is not enabled, Network I/O is shown instead of the Compliance score.

Interpret the Cluster Data

This topic gives insight into the metrics displayed on the Clusters Overview screen.

Node Ready Status

The chart shows the latest value returned by avg(min(kubernetes.node.ready)).

What Is It?

The number shows the readiness for nodes to accept pods across the entire cluster. The numeric availability indicates the percentage of time the nodes are reported as ready by Kubernetes. For example:

  • 100% is displayed when 10 out of 10 nodes are ready for the entire time window, say, for the last one hour.

  • 95% is displayed when 9 out of 10 nodes are ready for the entire time window and one node is ready only for 50% of the time.

The bar chart displays the trend across the selected time window, and each bar represents a time slice. For example, selecting the last 1-hour window displays 6 bars, each indicating a 10-minute time slice. Each bar represents the availability across the time slice (green) or the unavailability (red).

For instance, the following image shows an average availability of 80% across the last 1-hour, and each 10-minute time slice shows a constant availability for the same time window:

What to Expect?

Expect a constant 100% at all times.

What to Do Otherwise?

If the value is less than 100%, determine whether a node is not available at all, or one or more nodes are partially available.

  • Drill down either to the Nodes screen in Overview or to the “Kubernetes Cluster Overview” in Explore to see the list of nodes and their availability.

  • Check the Kubernetes Node Overview dashboard in Explore to identify the problem that Kubernetes reports.
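
Outside Sysdig, a quick cross-check with kubectl can look like the following sketch; the node name is a placeholder:

# List nodes and their Ready status
kubectl get nodes

# Review the Conditions section (Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable)
kubectl describe node <node-name>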

Pods Available vs Desired

The chart shows the latest value returned by sum(avg(kubernetes.namespace.pod.available.count)) / sum(avg(kubernetes.namespace.pod.desired.count)).

What Is It?

The chart displays the ratio between available and desired pods, averaged across the selected time window, for all the pods in a given Cluster. The upper bound shows the number of desired pods in the Cluster.

For instance, the following image shows 42 desired pods are available to use:

What to Expect?

You should typically expect 100%.

If certain pods take a long time to become available, you might temporarily see a value that is less than 100%. Image pulls, pod initialization, readiness probes, and so on cause such delays.

What to Do Otherwise?

Identify one or more Namespaces that have lower availability. To do so, drill down to the Namespaces screen, then drill down to the Workloads screen to identify the unavailable pods.

If the number of unavailable pods is considerably high (the ratio is significantly low), check the status of the Nodes. A Node failure causes several pods to become unavailable across most of the Namespaces.

Several factors could cause pods to get stuck in the Pending state (see the kubectl sketch after this list):

  • Pods make requests for resources that exceed what’s available across the nodes (the remaining allocatable pods).

  • Pods make requests higher than the availability of every single node. For example, you have 8-core Nodes and you create a pod with a 16-core request. These pods might require reconfiguration and specific setup related to Node affinity and anti-affinity constraints.

  • Namespace quota is reached before making a high resource request.

    If a quota is enforced at the Namespace level, you may hit the limit independent of the resource availability across the Nodes.
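
To find pods stuck in Pending and the scheduler's reason, a kubectl sketch like the following can help; the pod and namespace names are placeholders:

# List pending pods across all namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Look for FailedScheduling events explaining why the pod cannot be placed
kubectl describe pod <pending-pod-name> -n <namespace>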

CPU Requests vs Allocatable

The chart shows the latest value returned by sum(avg(kubernetes.pod.resourceRequests.cpuCores)) / sum(avg(kubernetes.node.allocatable.cpuCores)).

What Is It?

The chart displays the ratio between CPU requests configured for all the pods in a selected Cluster and allocatable CPUs across all the nodes.

The upper bound shows the number of allocatable CPU cores across all the nodes in the Cluster.

For instance, the image below shows that out of 620 available CPU cores across all the nodes (allocatable CPUs), 71% is requested by the pods:

What to Expect?

Your resource utilization strategy determines what ratio you can expect. A healthy ratio falls between 50% and 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is (node_count - 1) / node_count x 100; for example, 90% if you have 10 nodes. Staying at or below this bound protects you against a node becoming unavailable.

What to Do Otherwise?

A lower ratio indicates under-utilized resources (and corresponding cost) in your infrastructure. A higher ratio indicates insufficient resources. As a result

  • Applications cannot be scheduled to be run.

  • Pods might not start and remain in a Pending/Unscheduled state.

To triage, do the following:

  • Drill down to the Nodes screen to get insights into how resources are utilized across all nodes.

  • Drill down to the Namespaces screen to understand how resources are requested across Namespaces.

  • Drill down to Explore and refer to the following dashboards:

    • Kubernetes CPU Allocation Optimization: Evaluate whether a significant amount of resources are under-utilized in the infrastructure.

    • Kubernetes Workloads CPU Usage and Allocation: Determine whether pods are properly configured and are using resources as expected.

Can the Value Be Higher than 100%?

Currently, the ratio accounts only for scheduled pods, while pending pods are excluded from the calculation. This means that only pods that have been scheduled to run on Nodes count against the allocatable resources. Consequently, the ratio cannot be higher than 100%.

In the case of over-commitment (pods requesting for more resources than what’s available), you can expect a higher Requests vs Allocatable ratio and a lower Pods Available vs Desired ratio. What it indicates is that most of the available resources are being used, and what’s left is not enough to schedule additional pods. Therefore, the Available vs Desired ratio for pods will decrease.

When your environment has pods that are updated often or that are deleted and created often (for example, testing Clusters), the total requests might appear higher than what it is at any given time. Consequently, the ratio becomes higher across the selected time window, and you might see a value that is higher than 100%. This error is rendered due to how the data engine calculates the aggregated ratio.

Drill down to Kubernetes Cluster Overview to see the CPU Cores Usage vs Requests vs Allocatable time series to correctly evaluate the trend of the request commitments.

Listed below are some of the factors that could cause pods to get stuck in a Pending state:

  • Pods make requests that exceed what’s available across the nodes (the remaining allocatable pods). The Requests vs Allocatable ratio is an indicator of this issue.

  • Pods make requests that are higher than the availability of every single Node. For example, you have 8-core Nodes and you create a pod with a 16-core request. These pods might require reconfiguration and specific setup related to Node affinity and anti-affinity constraints.

  • The Quota set at the Namespace level is reached before a request is configured. The Requests vs Allocatable ratio may not suggest the problem, but the Pods Available vs Desired ratio would decrease, especially for the specific Namespaces. See the Namespaces screen in Overview.

Memory Requests vs Allocatable

The chart shows the latest value returned by sum(avg(kubernetes.pod.resourceRequests.memBytes)) / sum(avg(kubernetes.node.allocatable.memBytes)).

What Is It?

The chart displays the ratio between memory requests configured for all the pods in the Cluster and allocatable memory available across all the Nodes.

The upper bound shows the allocatable memory available across all Nodes. The value is expressed in bytes, displayed in a specified unit.

For instance, the image below shows that out of 29.7 GiB available across all Nodes (allocatable memory), 35% is requested by the pods:

What to Expect?

Your resource utilization strategy determines what ratio you can expect. A healthy ratio falls between 50% and 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is (node_count - 1) / node_count x 100; for example, 90% if you have 10 nodes. Staying at or below this bound protects your system against a node becoming unavailable.

What to do Otherwise

A lower ratio indicates under-utilized resources (and corresponding cost) in your infrastructure. A higher ratio indicates insufficient resources. As a result

  • Applications cannot be scheduled to be run.

  • Pods might not start and remain in a Pending/Unscheduled state.

To troubleshoot, do the following:

  • Drill down to the Nodes screen to get insights into how resources are utilized across all the Nodes.

  • Drill down to the Namespaces screen to understand how resources are requested across Namespaces.

  • Drill down to Explore and refer to the following dashboards:

    • Kubernetes Memory Allocation Optimization: Evaluate whether a significant amount of resources are under-utilized in the infrastructure.

    • Kubernetes Workloads Memory Usage and Allocation: Determine whether pods are properly configured and are using resources as expected.

Can the Value be Higher than 100%?

The ratio currently accounts only for scheduled pods, while pending pods are excluded from the calculation. What this implies is that pods have been scheduled to run on Nodes out of the allocatable resources available. Consequently, the ratio cannot be higher than 100%.

In the case of over-commitment (pods requesting for more resources than what’s available), expect a higher Requests vs Allocatable ratio and a lower Pods Available vs Desired ratio. What it indicates is that most of the available resources have been used and what’s left is not enough to schedule additional pods. Therefore, the Pods Available vs Desired ratio will decrease.

When your environment has pods that are updated often or that are deleted and created often (for example, testing Clusters), the total requests might appear higher than what it is at any given time. Consequently, the ratio becomes higher across the selected time window, and you might see a value that is higher than 100%. This error is rendered due to how the data engine calculates the aggregated ratio.

Drill down to Kubernetes Cluster Overview to see the Memory Requests vs Allocatable time series to correctly evaluate the trend for the request commitments.

Listed below are some of the factors that could cause your pods to get stuck in a Pending state:

  • Pods make requests that exceed what’s available across the nodes (the remaining allocatable pods). The Requests vs Allocatable ratio is an indicator of this issue.

  • Pods make requests that are higher than the availability of every single Node. For example, you have 8-core nodes and you create a pod with a 16-core request. These pods might require configuration changes and specific setup related to node affinity and anti-affinity factors.

  • The Quota set at the Namespace-level is reached before a high request is configured. The Requests vs Allocatable ratio might not suggest the problem, but the Pods Available vs Desired ratio would decrease, especially for the specific Namespaces. See the Namespaces screen in Overview.

Compliance Score

Docker: The latest value returned by avg(avg(compliance.docker-bench.pass_pct)).

Kubernetes: The latest value returned by avg(avg(compliance.k8s-bench.pass_pct)).

What Is it?

The numbers show the percentage of benchmarks that succeeded in the selected time window, respectively for Docker and Kubernetes entities.

What to Expect

If you do not have Sysdig Secure enabled, or you do not have benchmarks scheduled, then you should expect no data available.

Otherwise, the higher the score, the more compliant your infrastructure is.

What to Do Otherwise?

If the score is lower than expected, drill down to Docker Compliance Report or Kubernetes Compliance Report to see further details about benchmark checks and their results.

You may also want to use the Benchmarks / Results page in Sysdig Secure to see the history of checks.

2.2 - Nodes Data

This topic discusses the Nodes Overview page and helps you understand its gauge charts and the data displayed on them.

About Nodes Overview

A node refers to a worker machine in Kubernetes. A physical machine or VM can represent a node. The Nodes Overview page provides key metrics indicating the health, capacity, and compliance of each node in your cluster.

In environments where Sysdig Secure is not enabled, Network I/O is shown instead of the Compliance score.

Interpret the Nodes Data

This topic gives insight into the metrics displayed on the Nodes Overview page.

Node Ready Status

The chart shows the latest value returned by avg(min(kubernetes.node.ready)).

What Is It?

The number expresses the readiness of the Node to accept pods. The numeric availability indicates the percentage of time the Node is reported as ready by Kubernetes. For example:

  • 100% is displayed when a Node is ready for the entire time window, say, for the last one hour.

  • 95% when the Node is ready for 95% of the time window, say, 57 out of 60 minutes.

The bar chart displays the trend across the selected time window, and each bar represents a time slice. For example, selecting “last 1 hour” displays 6 bars, each indicating a 10-minute time slice. Each bar shows the availability across the time slice (green) and the unavailability (red).

For instance, the image below indicates the Node has not been ready for the entire last 1-hour time window:

What to Expect?

The chart should show a constant 100% at all times.

What to Do Otherwise?

If the number is less than 100%, review the status reported by Kubernetes. Drill-down to the Kubernetes Node Overview Dashboard in Explore to see details about the Node readiness:

If the Node Ready Status alternates, as shown in the image, the node is flapping. Flapping indicates that the kubelet is not healthy. Review the specific conditions reported by Kubernetes to determine why the Node is not ready; such conditions include network issues and memory pressure.

Pods Ready vs Allocatable

The chart reports the latest value of sum(avg(kubernetes.pod.status.ready)) / avg(avg(kubernetes.node.allocatable.pods)).

What Is It?

It is the ratio between ready and allocatable pods on the Node, averaged across the selected time window.

The Clusters page includes a similar chart named Pods Available vs Desired. However, the meaning is different:

  • The Pods Available vs Desired chart for Clusters highlights how many pods you expect and how many are actually available. See IsPodAvailable for a detailed definition.

  • The Pods Ready vs Allocatable chart for Nodes indicates how many pods can be scheduled on each Node and how many are actually ready.

The upper bound shows the number of pods you can allocate in the node. See node configuration.
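
To check a Node's allocatable pod count from the command line, a sketch like the following can be used; the node name is a placeholder:

kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'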

For instance, the image below indicates that you can allocate 110 pods in the Node (default configuration), but only 11 pods are ready:

What to Expect?

The ratio does not relate to resource utilization; rather, it measures the pod density on each node. The more pods you have on a single node, the more effort the kubelet must put into managing the pods, the routing mechanism, and Kubernetes overall.

Given the allocatable is properly set, values lower than 80% indicate a healthy status.

What to Do Otherwise?

  • Review the default maximum pods configuration of the kubelet to allow more pods, especially if CPU and memory utilization is healthy.

  • Add more nodes to allow more pods to be scheduled.

  • Review kubelet process performance and Node resource utilization in general. A higher ratio indicates high pressure on the operating system and on Kubernetes itself.

CPU Requests vs Allocatable

The chart shows the latest value returned by sum(avg(kubernetes.pod.resourceRequests.cpuCores)) / sum(avg(kubernetes.node.allocatable.cpuCores)).

What Is It?

The chart shows the ratio between the number of CPU cores requested by the pods scheduled on the Node and the number of cores available to pods. The upper bound shows the CPU cores available to pods, which corresponds to the user-defined configuration for allocatable CPU.

For instance, the image below shows that the Node has 16 CPU cores available, out of which, 84% are requested by the pods scheduled on the Node:

What to Expect?

Expect a value up to 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is (node_count - 1) / node_count x 100; for example, 90% if you have 10 nodes. Staying at or below this bound protects your system against a Node becoming unavailable.

What to Do Otherwise?

  • A low ratio indicates the Node is underutilized. Drill up to the corresponding cluster in the Clusters page to determine whether the number of pods currently running is lower, or if the pods cannot run for other reasons.

  • A high ratio indicates a potential risk of being unable to schedule additional pods on the Node.

    Drill down to the  Kubernetes Node Overview Dashboard to evaluate what Namespaces, Workloads, and pods are running. Additionally, drill up in the Clusters page to evaluate whether you are over-committing the CPU resource. You might not have enough resources to fulfill requests, and consequently, pods might not be able to run on the Node. Consider adding Nodes or replacing Nodes with additional CPU cores.

Can the Value Be Higher than 100%?

Kubernetes schedules pods on Nodes where sufficient allocatable resources are available to fulfill the pod request. This means Kubernetes does not allow having a total request higher than the allocatable. Consequently, the ratio cannot be higher than 100%.

Over-committing (pods requesting resources higher than the capacity) results in a high Requests vs Allocatable ratio and a low Pods Available vs Desired ratio at the Cluster level. What it indicates is that most of the available resources are being used, consequently, what’s available is not sufficient to schedule additional pods. Therefore, Pods Available vs Desired ratio will also decrease.

Memory Requests vs Allocatable

The chart highlights the latest value returned by sum(avg(kubernetes.pod.resourceRequests.memBytes)) / sum(avg(kubernetes.node.allocatable.memBytes)).

What Is It?

The ratio between the bytes of memory requested by the pods scheduled on the Node and the bytes of memory available. The upper bound shows the memory available to pods, which corresponds to the user-defined allocatable memory configuration.

For instance, the image below indicates the node has 62.8 GiB of memory available, out of which, 37% is requested by the pods scheduled on the Node:

What to Expect?

A healthy ratio falls under 80%.

Assuming all the nodes have the same amount of allocatable resources, a reasonable upper bound is (node_count - 1) / node_count x 100; for example, 90% if you have 10 nodes. Staying at or below this bound protects your system against a node becoming unavailable.

What to Do Otherwise?

  • A low ratio indicates that the Node is underutilized. Drill up to the corresponding cluster in the Clusters page to determine whether the number of pods running is low, or if pods cannot run for other reasons.

  • A high ratio indicates a potential risk of being unable to schedule additional pods on the node.

    • Drill down to the  Kubernetes Node Overview dashboard to evaluate what Namespaces, Workloads, and pods are running.

    • Additionally, drill up in the Clusters page to evaluate whether you are over-committing the memory resource. Consequently, you don’t have enough resources to fulfill requests, and pods might not be able to run. Consider adding nodes or replacing nodes with more memory.

Can the Value be Higher than 100%?

Kubernetes schedules pods on nodes where sufficient allocatable resources are available to fulfill the pod request. This means Kubernetes does not allow having a total request higher than the allocatable. Consequently, the ratio cannot be higher than 100%.

Over-committing (pods requesting more resources than are available) results in a high Requests vs Allocatable ratio at the Nodes level and a low Pods Available vs Desired ratio at the Cluster level. What it indicates is that most of the resources are being used; consequently, what’s available is not sufficient to schedule additional pods. Therefore, the Pods Available vs Desired ratio will also decrease.

Network I/O

The chart shows the latest value returned by avg(avg(net.bytes.total)).

What Is It?

The sparkline shows the trend of network traffic (inbound and outbound) for a Node. The number indicates the most recent traffic rate, expressed in bytes per second.

For reference, the sparklines show the following number of steps (sampling):

  • Last hour: 6 steps, each for a 10-minute time slice

  • Last 6 hours: 12 steps, each for a 30-minute time slice

  • Last day: 12 steps, each for a 2-hour time slice

What to Expect?

The metric highly depends on what type of applications run on the Node. You should expect some network activity for Kubernetes related operations.

Drilling down to the Kubernetes Node Overview Dashboard in Explore will provide additional details, such as network activity across pods.

2.3 - Namespaces Data

This topic discusses the Namespaces Overview page and helps you understand its gauge charts and the data displayed on them.

About Namespaces Overview

Namespaces are virtual clusters on a physical cluster. They provide logical separation between the teams and their environments. The Namespaces Overview page provides key metrics indicating the health, capacity, and performance of each Namespace in your cluster.

Interpret the Namespaces Data

This topic gives insight into the metrics displayed on the Namespaces Overview screen.

Pod Restarts

The chart highlights the latest value returned by avg(timeAvg(kubernetes.pod.restart.rate)).

What Is It?

The sparkline shows the trend of pod restarts rate across all the pods in a selected Namespace. The number shows the most recent rate of restarts per second.

For instance, the image shows a rate of 0.04 restarts per second for the last 2-hour time slice, given that the selected time window is one day. The trend also suggests a non-flat pattern (periodic crashes).

For reference, the sparklines show the following number of steps (sampling):

  • Last hour: 6 steps, each for a 10-minute time slice

  • Last 6 hours: 12 steps, each for a 30-minute time slice

  • Last day: 12 steps, each for a 2-hour time slice

What to Expect?

Expect 0 restarts for any pod.

What to Do Otherwise?

A few restarts across the last hour or larger time windows might not indicate a serious problem. In the event of a restart loop, identify the root cause as follows:

  • Drill down to the Workloads page in Overview to identify the Workloads that have been stuck at a restart loop.

  • Drill down to the Kubernetes Namespace Overview to see a detailed trend broken down by pods:
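
You can also check restart counts directly with kubectl; this is a minimal sketch, and the namespace is a placeholder:

# Show pods sorted by container restart count (the RESTARTS column)
kubectl get pods -n <namespace> --sort-by='.status.containerStatuses[0].restartCount'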

Pods Available vs Desired

The chart shows the latest value returned by sum(avg(kubernetes.namespace.pod.available.count)) / sum(avg(kubernetes.namespace.pod.desired.count)).

What Is It?

The chart displays the ratio between available and desired pods, averaged across the selected time window, in a given Namespace.

The upper bound shows the number of desired pods in the namespace.

For instance, the image below shows 42 desired pods that are available:

What to Expect?

Expect 100% on the chart.

If certain pods take a significant amount of time to become available due to delays (image pull time, pod initialization, readiness probe) you might temporarily see a ratio lower than 100%.

What to Do Otherwise?

  • Identify one or more Workloads that have low availability by drilling down to the Workloads page.

  • Once you identify the Workload, drill down to the related dashboard in Explore. For example, Kubernetes Deployment Overview to determine the trend and the state of the pods.

    For instance, in the following image, the ratio is 98% (3.93 / 4 x 100). The decline is due to an update that caused pods to be terminated and consequently to be started with a newer version.

CPU Used vs Requests

The chart shows the latest value returned by sum(avg(cpu.cores.used)) / sum(avg(kubernetes.pod.resourceRequests.cpuCores)).

What Is It?

The chart shows the ratio between the total CPU usage across all the pods in the Namespace and the total CPU requested by all the pods.

The upper bound shows the total CPU requested by all the pods. The value is expressed as the number of CPU cores.

For instance, the image below shows that the pods in a Namespace request 40 CPU cores, of which only 43% is being used (about 17 cores):

What to Expect?

The value you see depends on the type of Workloads running in the Namespace.

Typically, values that fall between 80% and 120% are considered healthy. Values higher than 100% are considered healthy for a relatively short amount of time.

For applications whose resource usage is constant (such as background processes), expect the ratio to be close to 100%.

For “bursty” applications, such as an API server, expect the ratio to be less than 100%. Note that this value is averaged for the selected time window, therefore, a usage spike would be compensated by an idle period.

What to Do Otherwise?

A low usage indicates that the application is not properly running (not executing the expected functions) or the Workload configuration is not accurate (requests are too high compared to what the pods actually need).

A high usage indicates that the application is operating with a heavy load or the workload configuration is not accurate (requests are too low compared to what pods actually need).

In either case, drill down to the Workloads page to determine the workload that requires a deeper analysis.

Can the Value Be Higher than 100%?

Yes, it can.

  • You can configure requests without limits, or requests lower than the limits. In either case, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

  • Consider a Namespace with two Workloads with one pod each. Say, one Workload is configured to request for 1 CPU core and uses 1 CPU core (ratio of Used vs Request is 100%). The other Workload is configured without any request and uses 1 CPU core. In this example, 2 CPU cores used to 1 CPU core requested ratio at the Namespace level is 200%.

Memory Used vs Requests

The chart shows the latest value returned by sum(avg(memory.bytes.used)) / sum(avg(kubernetes.pod.resourceRequests.memBytes)).

What Is It?

The chart shows the ratio between the total memory usage across all pods of the Namespace and the total memory requested by all pods.

The upper bound shows the total memory requested by all the pods, expressed in a specified unit for bytes.

For instance, the image below shows that all the pods in the Namespace request 120 GiB, of which only 24% is being used (about 29 GiB):

What to Expect?

It depends on the type of Workloads you run in the Namespace. Typically, values that fall between 80% and 120% are considered healthy.

Values that are higher than 100% are considered normal for a relatively short amount of time.

What to Do Otherwise?

A low usage indicates the application is not properly running (not executing the expected functions) or the workload configuration is not accurate (high requests compared to what the pods actually need).

A high usage indicates the application is operating with a high load or the Workload configuration is not accurate (Fewer requests compared to what the pods actually need).

Given the configured limits for the Workloads and the memory pressure on the nodes, if the Workloads use more memory than what’s requested they are at risk of eviction. See Exceed a Container’s Limit for more information.

In both cases, you may want to drill down to the Workloads page to determine which Workload requires a deeper analysis.

Can the Value Be Higher than 100%?

Yes, it can.

  • You can configure requests without limits, or requests lower than the limits. In either case, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

  • Consider a Namespace with two Workloads with one pod each. Say, one Workload is configured to request 1 GiB of memory and uses 1 GiB (the Used vs Request ratio is 100%). The other Workload is configured without any request and uses 1 GiB. In this example, the ratio of 2 GiB of memory used to 1 GiB requested at the Namespace level is 200%.

Network I/O

The chart shows the latest value returned by avg(avg(net.bytes.total)).

What Is It?

The sparkline shows the trend of network traffic (inbound and outbound) for all the pods in the Namespace. The number shows the most recent traffic rate, expressed in bytes per second.

For reference, the sparklines show the following number of steps (sampling):

  • Last hour: 6 steps, each for a 10-minute time slice

  • Last 6 hours: 12 steps, each for a 30-minute time slice

  • Last day: 12 steps, each for a 2-hour time slice

What to Expect?

The type of applications running in the Namespace determines the metric values. Drilling down to the Kubernetes Namespace Overview Dashboard in Explore provides additional details, such as network activity across pods.

2.4 - Workloads Data

This topic discusses the Workloads Overview page and helps you understand its gauge charts and the data displayed on them.

About Workloads Overview

Workloads, in Kubernetes terminology, refer to your containerized applications. Workloads comprise Deployments, StatefulSets, and DaemonSets within a Namespace.

In a Cluster, worker nodes run your application workloads, whereas the master node provides the core Kubernetes services and orchestration for application workloads. The Workloads Overview page provides the key metrics indicating health, capacity, and compliance.

Interpret the Workloads Data

This topic gives insight into the metrics displayed on the Workloads Overview page.

Pod Restarts

The chart displays the latest value returned by sum(timeAvg(kubernetes.pod.restart.rate)).

What Is It?

The sparkline shows the trend of Pod Restarts rate across all the pods in a selected Workload. The number shows the most recent rate, expressed in Restarts per Second.

For instance, the image below shows the trend for the last hour. The number indicates that the rate of pod restarts is less than 0.01 for the last 10 minutes.

For reference, the sparklines show the following number of steps (sampling):

  • Last hour: 6 steps, each for a 10-minute time slice.

  • Last 6 hours: 12 steps, each for a 30-minute time slice.

  • Last day: 12 steps, each for a 2-hour time slice.

What to Expect?

A healthy pod will have 0 restarts at any given time.

What to Do Otherwise?

In most cases, a few restarts in the last hour (or larger time windows) do not indicate a serious problem. Drill down to the Kubernetes Overview Dashboard related to the Workload in Explore. For example, Kubernetes StatefulSet Overview provides a detailed trend broken down by pods.

In this example, the number of restarts is constant (roughly every 5 minutes) and no pods are ready. This might indicate a crash loop back-off.

Pods Available vs Desired

The chart shows the latest value returned by sum(avg(kubernetes.deployment.replicas.available)) / sum(avg(kubernetes.deployment.replicas.desired)).

What Is It?

The chart displays the ratio between available and desired pods, averaged across the selected time window, for all the pods in a given Workload.

The upper bound shows the number of desired pods in the Workload.

For instance, the image below shows all the 42 desired pods are available.

What to Expect?

You should typically expect 100%.

If certain pods take a significant amount of time to become available (image pull time, pod initialization, readiness probe), then you may temporarily see a ratio lower than 100%.

What to Do Otherwise?

Determine the Workloads that have low availability by drilling down to the related Dashboard in Explore. For example, the Kubernetes Deployment Overview helps understand the trend and the state of the pods.

For instance, the image above shows that the ratio is 98% (3.93 / 4 x 100). The slight decline is due to an update that caused pods to be terminated and consequently to be started with a newer version.

CPU Used vs Requests

The chart shows the latest value returned by sum(avg(cpu.cores.used)) / sum(avg(kubernetes.pod.resourceRequests.cpuCores)).

What Is It?

The chart shows the ratio between the total CPU usage across all pods of a selected Workload and the total CPU requested by all the pods.

The upper bound shows the total CPU requested by all the pods. The value denotes the number of CPU cores.

In this image, the pods in the Workload request 40 CPU cores, of which 43% is actually used (about 17 cores).

What to Expect?

It depends on the type of workload.

For applications (background processes) whose resource usage is constant, expect the ratio to be around 100%.

For “bursty” applications, such as an API server, expect the ratio to be lower than 100%. Note that the value is averaged for the selected time window, therefore, a usage spike would be compensated by an idle period.

Generally, values between 80% and 120% are considered normal. Values higher than 100% are deemed normal if observed only for a relatively short time.

What to Do Otherwise?

  • A low usage indicates that the application is not properly running (not executing the expected functions) or the Workload configuration is not accurate (requests are too high compared to what the pods actually need).

  • A high usage indicates that the load is high for applications or the Workload configuration is not accurate (low requests compared to what the pods actually need).

In either case, drill down to the Kubernetes Overview Dashboard corresponding to the Workload in Explore. For example, the Kubernetes Deployment Overview Dashboard provides insight into resource usage and configuration.

Can the Value Be Higher than 100%?

Yes, it can.

  • Configuring CPU requests without limits or requests lower than limits is permissible. In these cases, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

  • Consider a Workload with two containers. Say, one container is configured to request for 1 CPU core and uses 1 CPU core (Used vs Request ratio is 100%). The other is configured without any request and uses 1 CPU core. In this example, the 2 CPU core Used to 1 CPU core Requested ratio is 200% at the Workload level.

What Does “No Data” Mean?

If the Workload is configured with no requests and limits, then the Usage vs Requests ratio cannot be computed. In this case, the chart will show “no data”. Drill down to the Dashboard in Explore to evaluate the actual usage.

You must always configure requests. Setting requests helps to detect Workloads that require reconfiguration.

Kubernetes itself might expose Workloads with no requests or limits configured. For example, the kube-system Namespace can have Workloads without requests configured.
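
To spot pods whose containers have no resource requests configured, a sketch like the following can help; it assumes jq is installed, and the namespace is a placeholder:

kubectl get pods -n <namespace> -o json \
  | jq -r '.items[] | select(any(.spec.containers[]; .resources.requests == null)) | .metadata.name'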

Memory Used vs Requests

The chart shows the latest value returned by sum(avg(memory.bytes.used)) / sum(avg(kubernetes.pod.resourceRequests.memBytes)).

What Is It?

The chart shows the ratio between the total memory usage across all the pods in a Workload and the total memory requested by the Workload.

The upper bound shows the total memory requested by all the pods, expressed in the specified unit of bytes.

For instance, the image shows that the pods in the selected Workload requested 120 GiB, of which 24% is actually used (about 29 GiB).

What to Expect?

The type of Workload determines the ratio. Values between 80% and 120% are considered normal. Values higher than 100% are deemed normal if observed only for a relatively short time.

What to Do Otherwise?

A low memory usage indicates that the application is not properly running (not executing the expected functions) or the Workload configuration is not accurate (requests are too high compared to what the pods actually need).

A high memory usage indicates that the load is higher for applications or the Workload configuration is not accurate (low requests compared to what the pods actually need).

Given the configured limits for the Workloads and the memory pressure on the nodes, if the Workloads use more memory than what’s requested they are at risk of eviction. For more information, see Container’s Memory Limit.

In either case, drill down to the Workloads page to determine the Workload that requires a deeper analysis.

Can the Value Be Higher than 100%?

Yes, it can.

  • Configuring memory requests without limits or requests lower than limits is permissible. In these cases, you are allowing the containers to use more resources than requested, typically to handle temporary overloads.

  • Consider a Workload with two containers. Say, one container is configured to request for 1 GiB of memory and uses 1 GiB (Used vs Request ratio is 100%), while the other is configured without any request and uses 1 GiB of memory. In this example, the 2 GiB of memory used to 1 GiB requested ratio is 200% at the Workload level.

What Does “No Data” Mean?

If the Workload is configured with no memory requests and limits, then the Usage vs Requests ratio cannot be computed. In this case, the chart will show “no data”. Drill down to the Dashboard in Explore to evaluate the actual usage.

You must configure requests. It helps to detect Workloads that require reconfiguration.

Kubernetes itself might expose Workloads with no requests or limits configured. For example, the kube-system Namespace can have Workloads without requests configured.

Network I/O

The chart shows the latest value returned by avg(avg(net.bytes.total)).

What Is It?

The sparkline shows the trend of network traffic (inbound and outbound) for the Workload. The number shows the most recent rate, expressed in bytes per second in a specific unit.

For reference, the sparklines show the following number of steps (sampling):

  • Last hour: 6 steps, each for a 10-minute time slice

  • Last 6 hours: 12 steps, each for a 30-minute time slice

  • Last day: 12 steps, each for a 2-hour time slice

What to Expect?

The type of application running in the Workload determines the metric values. Drill down to the Kubernetes Overview Dashboard corresponding to the Workload in Explore. For example, the Kubernetes Deployment Overview Dashboard provides additional details, such as network activity across pods.