Advisor

Advisor brings your metrics, alerts, and events into a focused and curated view to help you operate and troubleshoot Kubernetes infrastructure.

Advisor is available only to SaaS users. It is not currently available for on-prem environments.

Advisor presents your infrastructure grouped by cluster, namespace, workload, and pod. You cannot currently configure a custom grouping. Depending on the selection, you will see different curated views and you can switch between the following:

  • Advisories
  • Triggered alerts
  • Events from Kubernetes, container engines, and custom user events
  • Cluster usage and capacity
  • Key golden signals (requests, latency, errors) derived from system calls
  • Kubernetes metrics about the health and status of Kubernetes objects
  • Container live logs
  • Process and network telemetry (CPU, memory, network connections, etc.)
  • Monitoring Integrations

Advisor displays metrics for the last hour of collected data. To see historical values for a metric, drill down to a related dashboard or explore the metric using the Explore UI.

Advisories

Advisories evaluate the thousands of data points being sent by the Sysdig agent, and display a prioritized view of key problems in your infrastructure that affect the health and availability of your clusters and the workloads running on them.

When you select an advisory, Advisor surfaces information related to the issue, such as metrics, events, live logs, and remediation guidance, so that you can pinpoint and resolve problems faster. Following SRE best practices, advisories are not necessarily symptoms of a problem; they are often causes, which you may not want to be alerted on.

Example Issues Detected

Problem: Description

CrashLoopBackOff: A pod repeatedly starts, crashes, and restarts. This can cause applications to be degraded or unavailable.

Container Error: A persistent application error is causing containers to be terminated. An application error, or exit code 1, means the container was terminated due to an application problem.

CPU Throttling: Containers are hitting their CPU limit and being throttled. CPU throttling does not kill the container, but the container is starved of CPU, which slows the application down.

OOM Kill: When a container reaches its memory limit, it is terminated with an OOMKilled status, or exit code 137. This can lead to application instability or unavailability.

Image Pull Error: A container is failing to start because it cannot pull the image.
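
The exit codes in the descriptions above follow a common convention: a code greater than 128 usually means the process was terminated by signal number (code - 128), so 137 corresponds to signal 9 (SIGKILL). The following is a minimal sketch that decodes an exit code under that assumption. The function name describe_exit_code is hypothetical and is not part of Sysdig or Kubernetes. Keep in mind that any SIGKILL produces 137, so the exit code alone does not prove an OOM kill.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a rough, human-readable reading of a container exit code."""
    if code == 0:
        return "exited normally"
    if code > 128:
        # Convention: 128 + N means the process was killed by signal N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number is not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    # Codes 1..128 are set by the application (or its shell) itself.
    return "application error"


print(describe_exit_code(137))
print(describe_exit_code(1))
```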

Advisories are automatically resolved when the problem is no longer detected. You cannot customize which advisories are evaluated; they are fully managed by Sysdig.

Live Logs

Advisor can display live logs for a container, which is the equivalent of running kubectl logs. This is useful for troubleshooting application errors or problems such as pods in a CrashLoopBackOff state.

When you select a pod, a Logs tab appears. If the pod has multiple containers, you can select the container whose logs you want to view. Once requested, logs are streamed for 3 minutes before the session is automatically closed (you can restart streaming if necessary).

Live logs are tailed on-demand and thus not persisted. After a session is closed they are no longer accessible.
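
For comparison, the kubectl equivalent of a live logs session can be sketched as follows. This only illustrates the behavior described above; it is not how Advisor or the Sysdig agent is implemented. The helper names and the 180-second timeout (mirroring the 3-minute session) are assumptions, and the sketch presumes that kubectl is installed and configured for the cluster.

```python
import subprocess
from typing import List, Optional

# Assumption: mirrors the 3-minute Advisor session; this is not a kubectl setting.
SESSION_SECONDS = 180


def build_logs_command(pod: str, namespace: str,
                       container: Optional[str] = None) -> List[str]:
    """Build a `kubectl logs --follow` command for one pod (and container)."""
    cmd = ["kubectl", "logs", "--follow", "--namespace", namespace, pod]
    if container:
        # Needed when the pod runs more than one container.
        cmd += ["--container", container]
    return cmd


def stream_logs(pod: str, namespace: str, container: Optional[str] = None,
                seconds: int = SESSION_SECONDS) -> None:
    """Stream logs to the terminal and stop after `seconds`, keeping nothing."""
    try:
        subprocess.run(build_logs_command(pod, namespace, container),
                       timeout=seconds, check=False)
    except subprocess.TimeoutExpired:
        # The session window elapsed; call stream_logs again to restart it.
        pass


if __name__ == "__main__":
    # Hypothetical pod, namespace, and container names.
    print(" ".join(build_logs_command("web-0", "prod", "app")))
```

Calling stream_logs("web-0", "prod", "app") would tail the container's output for up to three minutes and then return without saving anything, much like a live logs session.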

Manage User Access to Live Logs

By default, live logs are available to users within the scope of their Sysdig Team. Use Custom Roles to manage live logs permissions.

Configure Agent for Live Logs

Live logs are enabled by default in agent 12.7.0 or newer versions. Older versions of the Sysdig agent do not support live logs.

Live logs can be enabled or disabled within the agent configuration.

To turn live logs off globally for a cluster, add the following in the dragent.yaml file:

live_logs:
  enabled: false

If you are using Helm, configure this through sysdig.settings. For example:

sysdig:
 # Advanced settings. Any option in here will be directly translated into dragent.yaml in the Configmap
 settings:
   live_logs:
     enabled: false

Troubleshoot Live Logs

If live logs fail, one of the following errors is returned. Contact Sysdig Support for additional help and troubleshooting.

Error Code: Cause

401: kubelet doesn’t have the bearer token authorization enabled.

403: The sysdig-agent ClusterRole doesn’t have the node/proxy permission.
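
To investigate a 403, it can help to check whether the agent's ClusterRole contains a matching rule. The sketch below assumes that the permission named above corresponds to the Kubernetes RBAC resource nodes/proxy in the core API group and that the get verb is what is required; both are assumptions, not statements from Sysdig. The function names are hypothetical, and the input is a ClusterRole already loaded as a dictionary (for example, from kubectl get clusterrole sysdig-agent -o json).

```python
def rule_grants(rule: dict, resource: str, verb: str, api_group: str = "") -> bool:
    """Return True if a single RBAC rule covers the resource and verb.

    Only exact names and the plain "*" wildcard are handled; other
    wildcard forms (such as "nodes/*") are ignored in this sketch.
    """
    groups = rule.get("apiGroups", [])
    resources = rule.get("resources", [])
    verbs = rule.get("verbs", [])
    return (
        (api_group in groups or "*" in groups)
        and (resource in resources or "*" in resources)
        and (verb in verbs or "*" in verbs)
    )


def cluster_role_allows(cluster_role: dict, resource: str = "nodes/proxy",
                        verb: str = "get") -> bool:
    """Check every rule of a ClusterRole (assumed defaults: nodes/proxy, get)."""
    return any(rule_grants(rule, resource, verb)
               for rule in cluster_role.get("rules", []))


# Hypothetical ClusterRole contents used only for illustration.
example_role = {
    "kind": "ClusterRole",
    "metadata": {"name": "sysdig-agent"},
    "rules": [
        {"apiGroups": [""], "resources": ["nodes", "pods"], "verbs": ["get", "list"]},
    ],
}
print(cluster_role_allows(example_role))
```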

YAML Configuration

Advisor can display the YAML configuration for pods, which is the equivalent of running kubectl get pod <pod> -o yaml. This is useful to see the applied configuration of a pod in a raw format, as well as metadata and status. To view the YAML, select a pod in Advisor and open the YAML tab.

Support for viewing YAML config is for pods only. Other object types are not yet supported.
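
The status portion of a pod's configuration often shows why a container is unhealthy. As a minimal sketch, the function below scans that status for reasons that match the example advisories. It assumes the pod was fetched with -o json instead of -o yaml so that only the Python standard library is needed; the field names follow the Kubernetes Pod API (status.containerStatuses), while the function name, the reason set, and the sample data are hypothetical.

```python
import json
from typing import List, Tuple

# Waiting/terminated reasons that correspond to the example advisories.
PROBLEM_REASONS = {
    "CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull", "OOMKilled", "Error",
}


def find_container_problems(pod: dict) -> List[Tuple[str, str]]:
    """Return (container name, reason) pairs found in a pod's status."""
    problems = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        name = status.get("name", "<unknown>")
        # "state" is the current state; "lastState" is the previous one.
        for field in ("state", "lastState"):
            for phase in ("waiting", "terminated"):
                reason = status.get(field, {}).get(phase, {}).get("reason")
                if reason in PROBLEM_REASONS:
                    problems.append((name, reason))
    return problems


# Hypothetical, heavily trimmed output of `kubectl get pod <pod> -o json`.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "app",
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      }
    ]
  }
}
""")
print(find_container_problems(sample_pod))
```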

Manage Access to YAML Configuration

By default, displaying YAML configuration is available to users within the scope of their Sysdig Team. Use Custom Roles to manage permissions. The permission for displaying YAML configuration is Advisor - Kubernetes API.

Configure Agent for YAML Configuration

YAML configuration can be enabled in agent 12.9.0 or newer versions. Older versions of the Sysdig agent do not support YAML configuration.

You can use the agent configuration to enable the YAML configuration.

To turn support for YAML configuration on globally for a cluster, add the following in the dragent.yaml file:

k8s_command:
  enabled: true

If you are using Helm, edit sysdig.settings. For example:

sysdig:
 # Advanced settings. Any option in here will be directly translated into dragent.yaml in the Configmap
 settings:
   k8s_command:
     enabled: true
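
Because live logs and YAML configuration depend on different minimum agent versions (12.7.0 and 12.9.0), a quick version check can help before changing the configuration. The following is a minimal sketch; the function names are hypothetical, and it assumes agent versions are plain dotted numbers such as 12.9.0. Versions are compared as integer tuples because a plain string comparison would wrongly rank 12.10.0 below 12.9.0.

```python
from typing import Tuple

# Minimum agent versions stated in this document, keyed by dragent.yaml section.
MIN_AGENT_VERSION = {
    "live_logs": (12, 7, 0),
    "k8s_command": (12, 9, 0),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "12.9.0" into (12, 9, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def agent_supports(feature: str, agent_version: str) -> bool:
    """Return True if the agent version meets the feature's minimum version."""
    return parse_version(agent_version) >= MIN_AGENT_VERSION[feature]


print(agent_supports("live_logs", "12.8.1"))
print(agent_supports("k8s_command", "12.8.1"))
```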
