Kubernetes

Metrics, Dashboards, Alerts and more for Kubernetes Integration in Sysdig Monitor.

This integration is disabled by default. See Enable and Disable Integrations to enable it in your account.

This integration has 70 metrics.

List of Alerts

Alert	Description	Format
[Kubernetes] Container Waiting	Container in waiting status for a long time (CrashLoopBackOff, ImagePullErr…)	Prometheus
[Kubernetes] Container Restarting	Container restarting	Prometheus
[Kubernetes] Pod Not Ready	Pod in not ready status	Prometheus
[Kubernetes] Init Container Waiting For a Long Time	Init container in waiting state (CrashLoopBackOff, ImagePullErr…)	Prometheus
[Kubernetes] Pod Container Creating For a Long Time	Pod is stuck in ContainerCreating state	Prometheus
[Kubernetes] Pod Container Terminated With Error	Pod Container Terminated With Error (OOMKilled, Error…)	Prometheus
[Kubernetes] Init Container Terminated With Error	Init Container Terminated With Error (OOMKilled, Error…)	Prometheus
[Kubernetes] Workload with Pods not Ready	Workload with Pods not Ready (Evicted, NodeLost, UnexpectedAdmissionError)	Prometheus
[Kubernetes] Workload Replicas Mismatch	There are pods in the workload that could not start	Prometheus
[Kubernetes] Pod Not Scheduled For DaemonSet	Pods cannot be scheduled for DaemonSet	Prometheus
[Kubernetes] Pods In DaemonSet Incorrectly Scheduled	There are pods from a DaemonSet that should not be running	Prometheus
[Kubernetes] CPU Overcommit	CPU resources in the cluster are overcommitted. If a node fails, the cluster may be unable to reschedule the affected pods due to insufficient CPU capacity.	Prometheus
[Kubernetes] Memory Overcommit	Memory resources in the cluster are overcommitted. If a node fails, the cluster may be unable to reschedule the affected pods due to insufficient memory capacity.	Prometheus
[Kubernetes] CPU OverUsage	CPU OverUsage in cluster. If one node fails, the cluster will not have enough CPU to run all the current pods.	Prometheus
[Kubernetes] Memory OverUsage	Memory OverUsage in cluster. If one node fails, the cluster will not have enough memory to run all the current pods.	Prometheus
[Kubernetes] Container CPU Throttling	Container CPU usage is close to its limit. Possible CPU throttling.	Prometheus
[Kubernetes] Container Memory Next To Limit	Container memory usage is close to its limit. Risk of Out Of Memory Kill.	Prometheus
[Kubernetes] Container CPU Unused	Container unused CPU higher than 85% of request for 8 hours.	Prometheus
[Kubernetes] Container Memory Unused	Container unused Memory higher than 85% of request for 8 hours.	Prometheus
[Kubernetes] Node Not Ready	Node in Not-Ready condition	Prometheus
[Kubernetes] Not All Nodes Are Ready	Not all nodes are in Ready condition.	Prometheus
[Kubernetes] Too Many Pods In Node	Node is close to its pod limit.	Prometheus
[Kubernetes] Node Readiness Flapping	Node availability is unstable.	Prometheus
[Kubernetes] Nodes Disappeared	Fewer nodes in the cluster than 30 minutes earlier.	Prometheus
[Kubernetes] All Nodes Gone In Cluster	All Nodes Gone In Cluster.	Prometheus
[Kubernetes] Node CPU High Usage	High usage of CPU in node.	Prometheus
[Kubernetes] Node Memory High Usage	High usage of memory in node. Risk of pod eviction.	Prometheus
[Kubernetes] Node Root File System Almost Full	Root file system in node almost full. To include other file systems, change the value of the device label from ‘.root.’ to your device name.	Prometheus
[Kubernetes] Max Schedulable Pod Less Than 1 CPU Core	The maximum schedulable CPU request in a pod is less than 1 core.	Prometheus
[Kubernetes] Max Schedulable Pod Less Than 512Mb Memory	The maximum schedulable memory request in a pod is less than 512Mb.	Prometheus
[Kubernetes] HPA Desired Scale Up Replicas Unreached	HPA could not reach the desired scaled-up replicas for a long time.	Prometheus
[Kubernetes] HPA Desired Scale Down Replicas Unreached	HPA could not reach the desired scaled-down replicas for a long time.	Prometheus
[Kubernetes] Job failed to complete	Job failed to complete	Prometheus
[Kubernetes] Cluster is reaching maximum pod capacity (95%)	Review cluster pod capacity to ensure pods can be scheduled.	Prometheus
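
The overcommit alerts above describe a single-node-failure condition: the cluster may be unable to reschedule pods if one node is lost. The exact alert expressions are not shown on this page, so the following is only a minimal sketch of one plausible reading, assuming that "overcommitted" means total resource requests exceed the allocatable capacity that remains after the largest node is removed. The function name `is_overcommitted` and its inputs are hypothetical; in practice the values might be derived from metrics such as `kube_pod_container_resource_requests` and `kube_node_status_allocatable_cpu_cores`.

```python
from typing import Mapping


def is_overcommitted(requested: float, allocatable_by_node: Mapping[str, float]) -> bool:
    """Return True if total requests would not fit after losing the largest node.

    Assumption (not taken from the Sysdig alert definition): the cluster is
    considered overcommitted when the sum of resource requests exceeds the
    allocatable capacity left once the single largest node is gone.
    Both arguments use the same unit (for example CPU cores or bytes).
    """
    if not allocatable_by_node:
        # With no nodes, any non-zero request cannot be scheduled.
        return requested > 0
    total = sum(allocatable_by_node.values())
    largest = max(allocatable_by_node.values())
    return requested > total - largest


# Hypothetical three-node cluster with 4 allocatable CPU cores per node.
nodes = {"node-a": 4.0, "node-b": 4.0, "node-c": 4.0}
print(is_overcommitted(9.0, nodes))  # 9 cores requested, 8 remain after one node fails
print(is_overcommitted(8.0, nodes))  # 8 cores requested, 8 remain after one node fails
```

The same check applies to memory by passing byte values instead of cores. Removing the largest node is a deliberately conservative choice; a different reading of the alert (for example, removing an average-sized node) would change the threshold.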

List of Dashboards

Workload Status & Performance

The dashboard provides information on the Workload Status and Performance.

Pod Status & Performance

The dashboard provides information on the Pod Status and Performance.

Cluster / Namespace Available Resources

The dashboard provides information on the Cluster and Namespace Available Resources.

Cluster Capacity Planning

Dashboard used for Cluster Capacity Planning.

Container Resource Usage & Troubleshooting

The dashboard provides information on the Container Resource Usage and Troubleshooting.

Node Status & Performance

The dashboard provides information on the Node Status and Performance.

Pod Rightsizing & Workload Capacity Optimization

Dashboard used for Pod Rightsizing and Workload Capacity Optimization.

Pod Scheduling Troubleshooting

Dashboard used for Pod Scheduling Troubleshooting.

Horizontal Pod Autoscaler

The dashboard provides information on the Horizontal Pod Autoscalers.

Kubernetes Jobs

The dashboard provides information on the Kubernetes Jobs.

List of Metrics

Metric name
container.image
container.image.tag
kube_cronjob_next_schedule_time
kube_cronjob_status_active
kube_cronjob_status_last_schedule_time
kube_daemonset_status_current_number_scheduled
kube_daemonset_status_desired_number_scheduled
kube_daemonset_status_number_misscheduled
kube_daemonset_status_number_ready
kube_hpa_status_current_replicas
kube_hpa_status_desired_replicas
kube_job_complete
kube_job_failed
kube_job_spec_completions
kube_job_status_active
kube_namespace_labels
kube_node_info
kube_node_status_allocatable
kube_node_status_allocatable_cpu_cores
kube_node_status_allocatable_memory_bytes
kube_node_status_capacity
kube_node_status_capacity_cpu_cores
kube_node_status_capacity_memory_bytes
kube_node_status_capacity_pods
kube_node_status_condition
kube_node_sysdig_host
kube_pod_container_info
kube_pod_container_resource_limits
kube_pod_container_resource_requests
kube_pod_container_status_restarts_total
kube_pod_container_status_terminated_reason
kube_pod_container_status_waiting_reason
kube_pod_info
kube_pod_init_container_status_terminated_reason
kube_pod_init_container_status_waiting_reason
kube_pod_status_ready
kube_resourcequota
kube_workload_pods_status_reason
kube_workload_status_desired
kube_workload_status_ready
kubernetes.hpa.replicas.current
kubernetes.hpa.replicas.desired
kubernetes.hpa.replicas.max
kubernetes.hpa.replicas.min
sysdig_container_cpu_cores_used
sysdig_container_cpu_quota_used_percent
sysdig_container_info
sysdig_container_memory_limit_used_percent
sysdig_container_memory_used_bytes
sysdig_container_net_connection_in_count
sysdig_container_net_connection_out_count
sysdig_container_net_connection_total_count
sysdig_container_net_error_count
sysdig_container_net_http_error_count
sysdig_container_net_http_request_time
sysdig_container_net_http_statuscode_request_count
sysdig_container_net_in_bytes
sysdig_container_net_out_bytes
sysdig_container_net_request_count
sysdig_container_net_request_time
sysdig_fs_free_bytes
sysdig_fs_inodes_used_percent
sysdig_fs_total_bytes
sysdig_fs_used_bytes
sysdig_fs_used_percent
sysdig_program_cpu_cores_used
sysdig_program_cpu_used_percent
sysdig_program_memory_used_bytes
sysdig_program_net_connection_total_count
sysdig_program_net_total_bytes

Prerequisites

None.

Installation

Installing an exporter is not required for this integration.

Agent Configuration

This integration has no default agent job.