Kubernetes

Metrics, Dashboards, Alerts and more for Kubernetes Integration in Sysdig Monitor.
This integration is disabled by default. See Enable and Disable Integrations to enable it in your account.

This integration has 70 metrics.

List of Alerts

| Alert | Description | Format |
| --- | --- | --- |
| [Kubernetes] Container Waiting | Container in waiting status for a long time (CrashLoopBackOff, ImagePullErr…) | Prometheus |
| [Kubernetes] Container Restarting | Container restarting | Prometheus |
| [Kubernetes] Pod Not Ready | Pod in not ready status | Prometheus |
| [Kubernetes] Init Container Waiting For a Long Time | Init container in waiting state (CrashLoopBackOff, ImagePullErr…) | Prometheus |
| [Kubernetes] Pod Container Creating For a Long Time | Pod is stuck in ContainerCreating state | Prometheus |
| [Kubernetes] Pod Container Terminated With Error | Pod container terminated with error (OOMKilled, Error…) | Prometheus |
| [Kubernetes] Init Container Terminated With Error | Init container terminated with error (OOMKilled, Error…) | Prometheus |
| [Kubernetes] Workload with Pods not Ready | Workload with pods not ready (Evicted, NodeLost, UnexpectedAdmissionError) | Prometheus |
| [Kubernetes] Workload Replicas Mismatch | There are pods in the workload that could not start | Prometheus |
| [Kubernetes] Pod Not Scheduled For DaemonSet | Pods cannot be scheduled for DaemonSet | Prometheus |
| [Kubernetes] Pods In DaemonSet Incorrectly Scheduled | There are pods from a DaemonSet that should not be running | Prometheus |
| [Kubernetes] CPU Overcommit | CPU resources in the cluster are overcommitted. If a node fails, the cluster may be unable to reschedule the affected pods due to insufficient CPU capacity. | Prometheus |
| [Kubernetes] Memory Overcommit | Memory resources in the cluster are overcommitted. If a node fails, the cluster may be unable to reschedule the affected pods due to insufficient memory capacity. | Prometheus |
| [Kubernetes] CPU OverUsage | CPU overusage in the cluster. If one node fails, the cluster will not have enough CPU to run all the current pods. | Prometheus |
| [Kubernetes] Memory OverUsage | Memory overusage in the cluster. If one node fails, the cluster will not have enough memory to run all the current pods. | Prometheus |
| [Kubernetes] Container CPU Throttling | Container CPU usage is close to its limit. Possible CPU throttling. | Prometheus |
| [Kubernetes] Container Memory Next To Limit | Container memory usage is close to its limit. Risk of Out Of Memory kill. | Prometheus |
| [Kubernetes] Container CPU Unused | Container unused CPU higher than 85% of request for 8 hours. | Prometheus |
| [Kubernetes] Container Memory Unused | Container unused memory higher than 85% of request for 8 hours. | Prometheus |
| [Kubernetes] Node Not Ready | Node in Not-Ready condition | Prometheus |
| [Kubernetes] Not All Nodes Are Ready | Not all nodes are in Ready condition. | Prometheus |
| [Kubernetes] Too Many Pods In Node | Node is close to its pod limit. | Prometheus |
| [Kubernetes] Node Readiness Flapping | Node availability is unstable. | Prometheus |
| [Kubernetes] Nodes Disappeared | Fewer nodes in the cluster than 30 minutes earlier. | Prometheus |
| [Kubernetes] All Nodes Gone In Cluster | All nodes are gone from the cluster. | Prometheus |
| [Kubernetes] Node CPU High Usage | High CPU usage in node. | Prometheus |
| [Kubernetes] Node Memory High Usage | High memory usage in node. Risk of pod eviction. | Prometheus |
| [Kubernetes] Node Root File System Almost Full | Root file system in node is almost full. To include other file systems, change the value of the device label from ‘.root.’ to your device name. | Prometheus |
| [Kubernetes] Max Schedulable Pod Less Than 1 CPU Core | The maximum schedulable CPU request in a pod is less than 1 core. | Prometheus |
| [Kubernetes] Max Schedulable Pod Less Than 512Mb Memory | The maximum schedulable memory request in a pod is less than 512Mb. | Prometheus |
| [Kubernetes] HPA Desired Scale Up Replicas Unreached | HPA could not reach the desired scaled-up replicas for a long time. | Prometheus |
| [Kubernetes] HPA Desired Scale Down Replicas Unreached | HPA could not reach the desired scaled-down replicas for a long time. | Prometheus |
| [Kubernetes] Job failed to complete | Job failed to complete | Prometheus |
| [Kubernetes] Cluster is reaching maximum pod capacity (95%) | Review cluster pod capacity to ensure pods can be scheduled. | Prometheus |
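
The CPU Overcommit and Memory Overcommit alerts describe a cluster whose pods could not all be rescheduled after losing a node. The sketch below illustrates that reasoning only; it is not the expression Sysdig uses for these alerts. The function name `is_overcommitted` and the "largest node fails" rule are assumptions made for illustration.

```python
from typing import Sequence


def is_overcommitted(requested: float, node_allocatable: Sequence[float]) -> bool:
    """Hypothetical check: do total requests still fit if the largest node is lost?

    requested        -- sum of resource requests across all pods (cores or bytes)
    node_allocatable -- allocatable amount of the same resource on each node
    """
    if not node_allocatable:
        # With no nodes at all, any non-zero request cannot be scheduled.
        return requested > 0
    # Assumption: the worst case is losing the node with the most allocatable capacity.
    surviving_capacity = sum(node_allocatable) - max(node_allocatable)
    return requested > surviving_capacity


# Three nodes with 4 allocatable cores each: 8 cores survive a single node failure.
print(is_overcommitted(10.0, [4.0, 4.0, 4.0]))
print(is_overcommitted(7.5, [4.0, 4.0, 4.0]))
```

The same comparison applies to memory by passing byte values instead of cores.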

List of Dashboards

Workload Status & Performance

The dashboard provides information on the Workload Status and Performance.

Pod Status & Performance

The dashboard provides information on the Pod Status and Performance.

Cluster / Namespace Available Resources

The dashboard provides information on the Cluster and Namespace Available Resources.

Cluster Capacity Planning

Dashboard used for Cluster Capacity Planning.

Container Resource Usage & Troubleshooting

The dashboard provides information on the Container Resource Usage and Troubleshooting.

Node Status & Performance

The dashboard provides information on the Node Status and Performance.

Pod Rightsizing & Workload Capacity Optimization

Dashboard used for Pod Rightsizing and Workload Capacity Optimization.

Pod Scheduling Troubleshooting

Dashboard used for Pod Scheduling Troubleshooting.

Horizontal Pod Autoscaler

The dashboard provides information on the Horizontal Pod Autoscalers.

Kubernetes Jobs

The dashboard provides information on Kubernetes Jobs.

List of Metrics

Metric name
container.image
container.image.tag
kube_cronjob_next_schedule_time
kube_cronjob_status_active
kube_cronjob_status_last_schedule_time
kube_daemonset_status_current_number_scheduled
kube_daemonset_status_desired_number_scheduled
kube_daemonset_status_number_misscheduled
kube_daemonset_status_number_ready
kube_hpa_status_current_replicas
kube_hpa_status_desired_replicas
kube_job_complete
kube_job_failed
kube_job_spec_completions
kube_job_status_active
kube_namespace_labels
kube_node_info
kube_node_status_allocatable
kube_node_status_allocatable_cpu_cores
kube_node_status_allocatable_memory_bytes
kube_node_status_capacity
kube_node_status_capacity_cpu_cores
kube_node_status_capacity_memory_bytes
kube_node_status_capacity_pods
kube_node_status_condition
kube_node_sysdig_host
kube_pod_container_info
kube_pod_container_resource_limits
kube_pod_container_resource_requests
kube_pod_container_status_restarts_total
kube_pod_container_status_terminated_reason
kube_pod_container_status_waiting_reason
kube_pod_info
kube_pod_init_container_status_terminated_reason
kube_pod_init_container_status_waiting_reason
kube_pod_status_ready
kube_resourcequota
kube_workload_pods_status_reason
kube_workload_status_desired
kube_workload_status_ready
kubernetes.hpa.replicas.current
kubernetes.hpa.replicas.desired
kubernetes.hpa.replicas.max
kubernetes.hpa.replicas.min
sysdig_container_cpu_cores_used
sysdig_container_cpu_quota_used_percent
sysdig_container_info
sysdig_container_memory_limit_used_percent
sysdig_container_memory_used_bytes
sysdig_container_net_connection_in_count
sysdig_container_net_connection_out_count
sysdig_container_net_connection_total_count
sysdig_container_net_error_count
sysdig_container_net_http_error_count
sysdig_container_net_http_request_time
sysdig_container_net_http_statuscode_request_count
sysdig_container_net_in_bytes
sysdig_container_net_out_bytes
sysdig_container_net_request_count
sysdig_container_net_request_time
sysdig_fs_free_bytes
sysdig_fs_inodes_used_percent
sysdig_fs_total_bytes
sysdig_fs_used_bytes
sysdig_fs_used_percent
sysdig_program_cpu_cores_used
sysdig_program_cpu_used_percent
sysdig_program_memory_used_bytes
sysdig_program_net_connection_total_count
sysdig_program_net_total_bytes

Prerequisites

None.

Installation

Installing an exporter is not required for this integration.

Agent Configuration

This integration has no default agent job.