Azure Kubernetes Service

Metrics, Dashboards, Alerts and more for Azure Kubernetes Service Integration in Sysdig Monitor.

This integration can be enabled via Azure Cloud Metrics.

This integration has 16 metrics.

List of Alerts

| Alert | Description | Format |
| --- | --- | --- |
| [Azure Kubernetes Service] Cluster Node not ready | Kubernetes Cluster Node not ready. | Prometheus |
| [Azure Kubernetes Service] Cluster node memory rss usage | Kubernetes Cluster node high memory rss usage. | Prometheus |
| [Azure Kubernetes Service] Cluster node memory working set usage | Kubernetes Cluster node high memory working set usage. | Prometheus |
| [Azure Kubernetes Service] Cluster node disk usage | Kubernetes Cluster node high disk usage. | Prometheus |
| [Azure Kubernetes Service] Pod in failed phase | Kubernetes Cluster pod is in failed phase. | Prometheus |

List of Dashboards

Azure Kubernetes Service

The dashboard provides information on Azure Kubernetes Service (AKS).

List of Metrics

Metric name
azure_containerservice_managedclusters_apiserver_current_inflight_requests_avg
azure_containerservice_managedclusters_kube_node_status_allocatable_cpu_cores_avg
azure_containerservice_managedclusters_kube_node_status_allocatable_memory_bytes_avg
azure_containerservice_managedclusters_kube_node_status_condition_avg
azure_containerservice_managedclusters_kube_pod_status_phase_avg
azure_containerservice_managedclusters_kube_pod_status_ready_avg
azure_containerservice_managedclusters_node_cpu_usage_millicores_avg
azure_containerservice_managedclusters_node_cpu_usage_percentage_avg
azure_containerservice_managedclusters_node_disk_usage_bytes_avg
azure_containerservice_managedclusters_node_disk_usage_percentage_avg
azure_containerservice_managedclusters_node_memory_rss_bytes_avg
azure_containerservice_managedclusters_node_memory_rss_percentage_avg
azure_containerservice_managedclusters_node_memory_working_set_bytes_avg
azure_containerservice_managedclusters_node_memory_working_set_percentage_avg
azure_containerservice_managedclusters_node_network_in_bytes_avg
azure_containerservice_managedclusters_node_network_out_bytes_avg

Monitoring and Troubleshooting Azure Kubernetes Service

This document describes important metrics and queries that you can use to monitor and troubleshoot Azure Kubernetes Service.

Cluster State

Nodes not ready

Use the following query to check if there are nodes with NotReady state:

azure_containerservice_managedclusters_kube_node_status_condition_avg{condition="Ready", status2="NotReady"}

A return value other than 0 indicates a problem in the cluster.
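
To reduce the check to a single cluster-wide signal, the matching series can be summed and compared against zero. This is a minimal sketch that reuses only the labels from the query above; it assumes that each node in the NotReady state contributes a positive value to the metric.

```promql
# Sum the NotReady condition across all matching series.
# The expression returns a result only when the total is above 0.
sum(
  azure_containerservice_managedclusters_kube_node_status_condition_avg{condition="Ready", status2="NotReady"}
) > 0
```

An expression of this shape could serve as the condition for a "node not ready" alert.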

CPU cores

Use the following query to get the number of allocatable CPU cores in your cluster:

azure_containerservice_managedclusters_kube_node_status_allocatable_cpu_cores_avg
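
The core count becomes more useful when compared with actual consumption. The sketch below estimates cluster-wide CPU utilization as a percentage by dividing the millicores in use by the allocatable cores. It assumes that both metrics cover the same set of nodes and that summing across series is meaningful in your setup.

```promql
# Millicores in use, divided by allocatable cores converted to millicores
# (1 core = 1000 millicores), expressed as a percentage.
sum(azure_containerservice_managedclusters_node_cpu_usage_millicores_avg)
/
(sum(azure_containerservice_managedclusters_kube_node_status_allocatable_cpu_cores_avg) * 1000)
* 100
```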

Memory Available

Use the following query to get the allocatable memory, in bytes:

azure_containerservice_managedclusters_kube_node_status_allocatable_memory_bytes_avg

A low value can indicate memory pressure. In that case, add a new node manually or let the autoscaler create one.
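
Because the metric is reported in bytes, the raw value can be hard to read. As a small convenience sketch, the query can be divided by 1024 three times to show the result in GiB:

```promql
# Allocatable memory converted from bytes to GiB.
azure_containerservice_managedclusters_kube_node_status_allocatable_memory_bytes_avg
/ (1024 * 1024 * 1024)
```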

Cluster nodes

CPU usage

Use the following query to get the CPU percentage usage:

azure_containerservice_managedclusters_node_cpu_usage_percentage_avg

A return value greater than 95 indicates CPU pressure in a node.
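
Short spikes can cross the threshold without indicating sustained pressure. One possible approach is to average the metric over a time window before comparing it with 95. The 10-minute window below is an illustrative assumption, not a recommended value.

```promql
# Average CPU usage percentage over the last 10 minutes,
# returned only for series whose average exceeds 95.
avg_over_time(
  azure_containerservice_managedclusters_node_cpu_usage_percentage_avg[10m]
) > 95
```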

Memory RSS

Memory RSS is the amount of anonymous and swap cache memory.

Use the following query to get the memory RSS usage as a percentage:

azure_containerservice_managedclusters_node_memory_rss_percentage_avg
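
When the query returns many series, it can help to look only at the highest values. The sketch below uses the PromQL `topk` aggregation; the limit of 5 is an arbitrary choice for illustration.

```promql
# The five series with the highest memory RSS percentage.
topk(5, azure_containerservice_managedclusters_node_memory_rss_percentage_avg)
```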

Memory working set

The memory working set includes the amount of kernel memory, dirty memory, and recently accessed memory.

Use the following query to get the memory working set usage as a percentage:

azure_containerservice_managedclusters_node_memory_working_set_percentage_avg

A return value greater than 95 indicates memory pressure in a node.
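
The threshold can be applied directly in the query so that only the series under pressure are returned. A minimal sketch:

```promql
# Returns only the series whose working set usage exceeds 95 percent.
azure_containerservice_managedclusters_node_memory_working_set_percentage_avg > 95
```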

Disk usage

Use the following query to get the disk usage:

azure_containerservice_managedclusters_node_disk_usage_percentage_avg

A high return value indicates that a disk is running low on space.
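
In addition to checking the current value, the trend can be extrapolated to anticipate when a disk will fill up. The sketch below uses the PromQL `predict_linear` function. The 6-hour lookback and 4-hour horizon are illustrative assumptions, and a linear projection is only a rough estimate.

```promql
# Fit a line to the last 6 hours of disk usage and project it
# 4 hours (4 * 3600 seconds) ahead. A result is returned for any
# series predicted to reach 100 percent within that horizon.
predict_linear(
  azure_containerservice_managedclusters_node_disk_usage_percentage_avg[6h],
  4 * 3600
) >= 100
```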

Network In

Use the following query to get the inbound network traffic, in bytes:

azure_containerservice_managedclusters_node_network_in_bytes_avg

Network Out

Use the following query to get the outbound network traffic, in bytes:

azure_containerservice_managedclusters_node_network_out_bytes_avg
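
The two network metrics can be combined into a single view of total traffic. This sketch assumes that both metrics carry identical label sets, which PromQL requires in order to match the series on each side of the `+` operator. The result is converted from bytes to MiB for readability.

```promql
# Inbound plus outbound traffic, converted from bytes to MiB.
(
  azure_containerservice_managedclusters_node_network_in_bytes_avg
  + azure_containerservice_managedclusters_node_network_out_bytes_avg
) / (1024 * 1024)
```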

API Inflight Requests

Use the following query to get the number of in-flight API server requests:

azure_containerservice_managedclusters_apiserver_current_inflight_requests_avg
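
Since the in-flight count fluctuates from moment to moment, the peak over a time window can be more informative than the current value. A sketch using `max_over_time`, with an assumed 1-hour window:

```promql
# Highest number of in-flight API server requests seen in the last hour.
max_over_time(
  azure_containerservice_managedclusters_apiserver_current_inflight_requests_avg[1h]
)
```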