Kubernetes etcd

Metrics, Dashboards, Alerts and more for Kubernetes etcd Integration in Sysdig Monitor.

This integration is enabled by default.

This integration works out of the box and does not require an exporter.

This integration has 52 metrics.

List of Alerts

| Alert | Description | Format |
|---|---|---|
| [Etcd] Etcd Members Down | There are members down. | Prometheus |
| [Etcd] Etcd Insufficient Members | Etcd cluster has insufficient members. | Prometheus |
| [Etcd] Etcd No Leader | Member has no leader. | Prometheus |
| [Etcd] Etcd High Number Of Leader Changes | Leader changes within the last 15 minutes. | Prometheus |
| [Etcd] Etcd High Number Of Failed GRPC Requests | High number of failed gRPC requests. | Prometheus |
| [Etcd] Etcd GRPC Requests Slow | gRPC requests are taking too much time. | Prometheus |
| [Etcd] Etcd High Number Of Failed Proposals | High number of proposal failures within the last 30 minutes on etcd instance. | Prometheus |
| [Etcd] Etcd High Fsync Durations | 99th percentile fsync durations are too high. | Prometheus |
| [Etcd] Etcd High Commit Durations | 99th percentile commit durations are too high. | Prometheus |
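
To give a feel for what alerts of this kind evaluate, here is a minimal sketch of two of them written as a plain Prometheus rule file. It is not the definition shipped with the integration: the group name, alert names, thresholds, `for` durations, and the `instance` label are all assumptions chosen for illustration. Only the metric names come from this page.

```yaml
# Illustrative only: thresholds, durations, and names are assumptions,
# not the values used by the bundled alerts.
groups:
  - name: etcd-example-alerts        # hypothetical group name
    rules:
      - alert: EtcdNoLeaderExample
        # etcd_server_has_leader is 1 when the member sees a leader, 0 otherwise.
        expr: etcd_server_has_leader == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "etcd member {{ $labels.instance }} has no leader"
      - alert: EtcdHighFsyncDurationsExample
        # 99th percentile WAL fsync latency over the last 5 minutes, per member.
        # Assumes an `instance` label identifies each scraped member.
        expr: |
          histogram_quantile(0.99,
            sum by (instance, le) (rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
          ) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "etcd WAL fsync p99 is above 0.5s on {{ $labels.instance }}"
```

The same pattern, a quantile over a `_bucket` histogram compared against a threshold, would apply to the commit-duration alert by swapping in `etcd_disk_backend_commit_duration_seconds_bucket`.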

List of Dashboards

Kubernetes Etcd

The dashboard provides information on Kubernetes etcd.

List of Metrics

Metric name
etcd_debugging_mvcc_db_total_size_in_bytes
etcd_disk_backend_commit_duration_seconds_bucket
etcd_disk_wal_fsync_duration_seconds_bucket
etcd_grpc_proxy_cache_hits_total
etcd_grpc_proxy_cache_misses_total
etcd_mvcc_db_total_size_in_bytes
etcd_network_client_grpc_received_bytes_total
etcd_network_client_grpc_sent_bytes_total
etcd_network_peer_received_bytes_total
etcd_network_peer_received_failures_total
etcd_network_peer_round_trip_time_seconds_bucket
etcd_network_peer_sent_bytes_total
etcd_network_peer_sent_failures_total
etcd_server_has_leader
etcd_server_id
etcd_server_leader_changes_seen_total
etcd_server_proposals_applied_total
etcd_server_proposals_committed_total
etcd_server_proposals_failed_total
etcd_server_proposals_pending
go_build_info
go_gc_duration_seconds
go_gc_duration_seconds_count
go_gc_duration_seconds_sum
go_goroutines
go_info
go_memstats_buck_hash_sys_bytes
go_memstats_gc_sys_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_idle_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_released_bytes
go_memstats_heap_sys_bytes
go_memstats_lookups_total
go_memstats_mallocs_total
go_memstats_mcache_inuse_bytes
go_memstats_mcache_sys_bytes
go_memstats_mspan_inuse_bytes
go_memstats_mspan_sys_bytes
go_memstats_next_gc_bytes
go_memstats_stack_inuse_bytes
go_memstats_stack_sys_bytes
go_memstats_sys_bytes
go_threads
grpc_server_handled_total
grpc_server_handling_seconds_bucket
grpc_server_started_total
process_cpu_seconds_total
process_max_fds
process_open_fds
sysdig_container_cpu_cores_used
sysdig_container_memory_used_bytes
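
As a rough illustration of how some of these metrics can be combined, the following sketch defines a few Prometheus recording rules. The rule names, the group name, the 5-minute and 15-minute windows, and the `instance` label are assumptions. The rule-file syntax is generic Prometheus, so the expressions may need adapting before they are used as queries in Sysdig Monitor.

```yaml
# Illustrative only: rule names and time windows are assumptions.
groups:
  - name: etcd-example-recordings    # hypothetical group name
    rules:
      # Leader changes seen by each member over the last 15 minutes.
      - record: etcd:leader_changes:increase15m
        expr: increase(etcd_server_leader_changes_seen_total[15m])
      # 99th percentile backend commit latency, per member.
      - record: etcd:backend_commit_duration_seconds:p99
        expr: |
          histogram_quantile(0.99,
            sum by (instance, le) (rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))
          )
      # Fraction of the file descriptor limit currently in use.
      - record: etcd:open_fds:ratio
        expr: process_open_fds / process_max_fds
```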

Preparing the Integration

No preparations are required for this integration.

Installing

Installing an exporter is not required for this integration.

Agents Configuration

These are the default agent jobs for this integration:

- job_name: etcd-default
  scheme: https
  tls_config:
    insecure_skip_verify: true
    cert_file: /host/etc/kubernetes/pki/etcd/ca.crt
    key_file: /host/etc/kubernetes/pki/etcd/ca.key
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - source_labels: [__meta_kubernetes_pod_phase]
    action: keep
    regex: Running
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_name
    separator: '/'
    regex: 'kube-system/etcd-.+'
  - source_labels:
    - __address__
    action: replace
    target_label: __address__
    regex: (.+?)(\\:\\d)?
    replacement: $1:2379
    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (go_build_info|etcd_server_has_leader|etcd_server_leader_changes_seen_total|etcd_server_proposals_failed_total|go_info|go_gc_duration_seconds|go_gc_duration_seconds_count|go_gc_duration_seconds_sum|go_memstats_buck_hash_sys_bytes|go_memstats_gc_sys_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_idle_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_released_bytes|go_memstats_heap_sys_bytes|go_memstats_lookups_total|go_memstats_mallocs_total|go_memstats_mcache_inuse_bytes|go_memstats_mcache_sys_bytes|go_memstats_mspan_inuse_bytes|go_memstats_mspan_sys_bytes|go_memstats_next_gc_bytes|go_memstats_stack_inuse_bytes|go_memstats_stack_sys_bytes|go_memstats_sys_bytes|go_threads|process_cpu_seconds_total|grpc_server_started_total|grpc_server_started_total|grpc_server_started_total|grpc_server_handled_total|etcd_debugging_mvcc_db_total_size_in_bytes|etcd_disk_wal_fsync_duration_seconds_bucket|etcd_disk_backend_commit_duration_seconds_bucket|sysdig_container_memory_used_bytes|etcd_server_proposals_committed_total|etcd_server_proposals_applied_total|sysdig_container_cpu_cores_used|go_goroutines|grpc_server_handled_total|grpc_server_handled_total|etcd_server_id|etcd_disk_backend_commit_duration_seconds_bucket|etcd_grpc_proxy_cache_hits_total|etcd_grpc_proxy_cache_misses_total|etcd_network_client_grpc_received_bytes_total|etcd_network_client_grpc_sent_bytes_total|process_max_fds|process_open_fds|etcd_server_proposals_pending|etcd_network_peer_sent_failures_total|etcd_network_peer_received_failures_total|etcd_network_peer_round_trip_time_seconds_bucket|etcd_network_client_grpc_sent_bytes_total|etcd_network_client_grpc_received_bytes_total|etcd_network_peer_sent_bytes_total|etcd_network_peer_received_bytes_total|grpc_server_handling_seconds_bucket|apiserver_client_certificate_expiration_seconds_bucket|apiserver_client_certificate_expiration_seconds_sum|apiserver_client_certificate_expiration_seconds_count|etcd_mvcc_db_total_size_in_bytes)
    action: keep
- job_name: etcd-legacy-default
  scheme: https
  tls_config:
    insecure_skip_verify: true
    cert_file: /host/etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.crt
    key_file: /host/etc/kubernetes/pki/etcd-manager-main/etcd-clients-ca.key
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - source_labels: [__meta_kubernetes_pod_phase]
    action: keep
    regex: Running
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_name
    separator: '/'
    regex: 'kube-system/etcd-manager-main.+'
  - source_labels:
    - __address__
    action: replace
    target_label: __address__
    regex: (.+?)(\\:\\d)?
    replacement: $1:4001
    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (etcd_server_has_leader|etcd_server_leader_changes_seen_total|etcd_server_proposals_failed_total|go_info|go_gc_duration_seconds|go_gc_duration_seconds_count|go_gc_duration_seconds_sum|go_memstats_buck_hash_sys_bytes|go_memstats_gc_sys_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_idle_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_released_bytes|go_memstats_heap_sys_bytes|go_memstats_lookups_total|go_memstats_mallocs_total|go_memstats_mcache_inuse_bytes|go_memstats_mcache_sys_bytes|go_memstats_mspan_inuse_bytes|go_memstats_mspan_sys_bytes|go_memstats_next_gc_bytes|go_memstats_stack_inuse_bytes|go_memstats_stack_sys_bytes|go_memstats_sys_bytes|go_threads|process_cpu_seconds_total|grpc_server_started_total|grpc_server_started_total|grpc_server_started_total|grpc_server_handled_total|etcd_debugging_mvcc_db_total_size_in_bytes|etcd_disk_wal_fsync_duration_seconds_bucket|etcd_disk_backend_commit_duration_seconds_bucket|sysdig_container_memory_used_bytes|etcd_server_proposals_committed_total|etcd_server_proposals_applied_total|sysdig_container_cpu_cores_used|go_goroutines|grpc_server_handled_total|grpc_server_handled_total|etcd_server_id|etcd_disk_backend_commit_duration_seconds_bucket|etcd_grpc_proxy_cache_hits_total|etcd_grpc_proxy_cache_misses_total|etcd_network_client_grpc_received_bytes_total|etcd_network_client_grpc_sent_bytes_total|process_max_fds|process_open_fds|etcd_server_proposals_pending|etcd_network_peer_sent_failures_total|etcd_network_peer_received_failures_total|etcd_network_peer_round_trip_time_seconds_bucket|etcd_network_client_grpc_sent_bytes_total|etcd_network_client_grpc_received_bytes_total|etcd_network_peer_sent_bytes_total|etcd_network_peer_received_bytes_total|grpc_server_handling_seconds_bucket|apiserver_client_certificate_expiration_seconds_bucket|apiserver_client_certificate_expiration_seconds_sum|apiserver_client_certificate_expiration_seconds_count|etcd_mvcc_db_total_size_in_bytes)
    action: keep