Kubernetes

This integration is disabled by default. Please contact Sysdig Support to enable it in your account.

List of Alerts:

Alert | Description | Format
--- | --- | ---
[Kubernetes] Container Waiting | Container in waiting status for a long time (CrashLoopBackOff, ImagePullErr…) | Prometheus
[Kubernetes] Container Restarting | Container restarting | Prometheus
[Kubernetes] Pod Not Ready | Pod in not ready status | Prometheus
[Kubernetes] Init Container Waiting For a Long Time | Init container in waiting state (CrashLoopBackOff, ImagePullErr…) | Prometheus
[Kubernetes] Pod Container Creating For a Long Time | Pod is stuck in ContainerCreating state | Prometheus
[Kubernetes] Pod Container Terminated With Error | Pod container terminated with error (OOMKilled, Error…) | Prometheus
[Kubernetes] Init Container Terminated With Error | Init container terminated with error (OOMKilled, Error…) | Prometheus
[Kubernetes] Workload with Pods not Ready | Workload with pods not ready (Evicted, NodeLost, UnexpectedAdmissionError) | Prometheus
[Kubernetes] Workload Replicas Mismatch | There are pods in the workload that could not start | Prometheus
[Kubernetes] Pod Not Scheduled For DaemonSet | Pods cannot be scheduled for DaemonSet | Prometheus
[Kubernetes] Pods In DaemonSet Incorrectly Scheduled | There are pods from a DaemonSet that should not be running | Prometheus
[Kubernetes] CPU Overcommit | CPU overcommit in cluster. If one node fails, the cluster will not be able to schedule all the current pods. | Prometheus
[Kubernetes] Memory Overcommit | Memory overcommit in cluster. If one node fails, the cluster will not be able to schedule all the current pods. | Prometheus
[Kubernetes] CPU OverUsage | CPU overusage in cluster. If one node fails, the cluster will not have enough CPU to run all the current pods. | Prometheus
[Kubernetes] Memory OverUsage | Memory overusage in cluster. If one node fails, the cluster will not have enough memory to run all the current pods. | Prometheus
[Kubernetes] Container CPU Throttling | Container CPU usage close to limit. Possible CPU throttling. | Prometheus
[Kubernetes] Container Memory Next To Limit | Container memory usage close to limit. Risk of out-of-memory kill. | Prometheus
[Kubernetes] Container CPU Unused | Container unused CPU higher than 85% of request for 8 hours. | Prometheus
[Kubernetes] Container Memory Unused | Container unused memory higher than 85% of request for 8 hours. | Prometheus
[Kubernetes] Node Not Ready | Node in Not-Ready condition | Prometheus
[Kubernetes] Too Many Pods In Node | Node close to its pod limit. | Prometheus
[Kubernetes] Node Readiness Flapping | Node availability is unstable. | Prometheus
[Kubernetes] Nodes Disappeared | Fewer nodes in cluster than 30 minutes before. | Prometheus
[Kubernetes] All Nodes Gone In Cluster | All nodes gone in cluster. | Prometheus
[Kubernetes] Node CPU High Usage | High CPU usage in node. | Prometheus
[Kubernetes] Node Memory High Usage | High memory usage in node. Risk of pod eviction. | Prometheus
[Kubernetes] Node Root File System Almost Full | Root file system in node almost full. To include other file systems, change the value of the device label from ‘.root.’ to your device name. | Prometheus
[Kubernetes] Max Schedulable Pod Less Than 1 CPU Core | The maximum schedulable CPU request in a pod is less than 1 core. | Prometheus
[Kubernetes] Max Schedulable Pod Less Than 512Mb Memory | The maximum schedulable memory request in a pod is less than 512Mb. | Prometheus
[Kubernetes] HPA Desired Scale Up Replicas Unreached | HPA could not reach the desired scaled up replicas for a long time. | Prometheus
[Kubernetes] HPA Desired Scale Down Replicas Unreached | HPA could not reach the desired scaled down replicas for a long time. | Prometheus
[Kubernetes] Job failed to complete | Job failed to complete | Prometheus
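
The overcommit alerts describe a condition of the form "if one node fails, the cluster will not be able to schedule all the current pods". As a rough illustration of that reasoning, the sketch below checks whether the total CPU requested by pods would still fit if the largest node disappeared. This is not the alert's actual Prometheus expression, which this page does not show. The `Node` class, the function name, and the worst-case "largest node fails" assumption are all hypothetical and used for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical node record; the CPU value would come from a metric
    such as kube_node_status_allocatable_cpu_cores."""
    name: str
    allocatable_cpu_cores: float


def is_cpu_overcommitted(nodes: List[Node], requested_cpu_cores: float) -> bool:
    """Return True if the requested CPU would not fit after losing one node.

    Assumption: the worst case is the failure of the node with the most
    allocatable CPU, so that node's capacity is subtracted from the total.
    """
    if not nodes:
        # No nodes at all: nothing can be scheduled.
        return True
    total = sum(n.allocatable_cpu_cores for n in nodes)
    largest = max(n.allocatable_cpu_cores for n in nodes)
    return requested_cpu_cores > total - largest


# Example: three 4-core nodes leave 8 cores after one node fails.
cluster = [Node("a", 4.0), Node("b", 4.0), Node("c", 4.0)]
fits = is_cpu_overcommitted(cluster, 8.0)       # requests fit in 8 cores
too_much = is_cpu_overcommitted(cluster, 9.0)   # requests exceed 8 cores
```

The same check could be applied to memory by substituting allocatable bytes and requested bytes. A real alert would also typically hold the condition for some duration before firing.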

List of Dashboards:

  • Workload Status & Performance
  • Pod Status & Performance
  • Cluster / Namespace Available Resources
  • Cluster Capacity Planning
  • Container Resource Usage & Troubleshooting
  • Node Status & Performance
  • Pod Rightsizing & Workload Capacity Optimization
  • Pod Scheduling Troubleshooting
  • Horizontal Pod Autoscaler
  • Kubernetes Jobs

List of Metrics:

  • container.image
  • container.image.tag
  • cpu.cores.used
  • kube_cronjob_next_schedule_time
  • kube_cronjob_status_active
  • kube_cronjob_status_last_schedule_time
  • kube_daemonset_status_current_number_scheduled
  • kube_daemonset_status_desired_number_scheduled
  • kube_daemonset_status_number_misscheduled
  • kube_daemonset_status_number_ready
  • kube_hpa_status_current_replicas
  • kube_hpa_status_desired_replicas
  • kube_job_complete
  • kube_job_failed
  • kube_job_spec_completions
  • kube_job_status_active
  • kube_namespace_labels
  • kube_node_info
  • kube_node_status_allocatable
  • kube_node_status_allocatable_cpu_cores
  • kube_node_status_allocatable_memory_bytes
  • kube_node_status_capacity
  • kube_node_status_capacity_cpu_cores
  • kube_node_status_capacity_memory_bytes
  • kube_node_status_capacity_pods
  • kube_node_status_condition
  • kube_node_sysdig_host
  • kube_pod_container_info
  • kube_pod_container_resource_limits
  • kube_pod_container_resource_requests
  • kube_pod_container_status_restarts_total
  • kube_pod_container_status_terminated_reason
  • kube_pod_container_status_waiting_reason
  • kube_pod_info
  • kube_pod_init_container_status_terminated_reason
  • kube_pod_init_container_status_waiting_reason
  • kube_pod_status_ready
  • kube_resourcequota
  • kube_workload_pods_status_reason
  • kube_workload_status_desired
  • kube_workload_status_ready
  • kubernetes.hpa.replicas.current
  • kubernetes.hpa.replicas.desired
  • kubernetes.hpa.replicas.max
  • kubernetes.hpa.replicas.min
  • memory.bytes.used
  • net.bytes.in
  • net.bytes.out
  • net.bytes.total
  • net.connection.count.total
  • net.error.count
  • net.http.error.count
  • net.http.request.time
  • net.request.count
  • net.request.time
  • sysdig_container_cpu_cores_used
  • sysdig_container_cpu_quota_used_percent
  • sysdig_container_info
  • sysdig_container_memory_limit_used_percent
  • sysdig_container_memory_used_bytes
  • sysdig_container_net_connection_in_count
  • sysdig_container_net_connection_out_count
  • sysdig_container_net_error_count
  • sysdig_container_net_http_error_count
  • sysdig_container_net_http_request_time
  • sysdig_container_net_http_statuscode_request_count
  • sysdig_container_net_in_bytes
  • sysdig_container_net_out_bytes
  • sysdig_container_net_request_count
  • sysdig_container_net_request_time
  • sysdig_fs_free_bytes
  • sysdig_fs_inodes_used_percent
  • sysdig_fs_total_bytes
  • sysdig_fs_used_bytes
  • sysdig_fs_used_percent
  • sysdig_program_cpu_used_percent
  • sysdig_program_memory_used_bytes