Kubernetes etcd

Metrics, Dashboards, Alerts and more for Kubernetes etcd Integration in Sysdig Monitor.
This integration is enabled by default.

This integration is out-of-the-box, so it doesn’t require any exporter.

This integration has 54 metrics.

List of Alerts

| Alert | Description | Format |
| --- | --- | --- |
| [Etcd] Etcd Members Down | There are members down. | Prometheus |
| [Etcd] Etcd Insufficient Members | The etcd cluster has insufficient members. | Prometheus |
| [Etcd] Etcd No Leader | A member has no leader. | Prometheus |
| [Etcd] Etcd High Number Of Leader Changes | Leader changes within the last 15 minutes. | Prometheus |
| [Etcd] Etcd High Number Of Failed GRPC Requests | High number of failed gRPC requests. | Prometheus |
| [Etcd] Etcd GRPC Requests Slow | gRPC requests are taking too much time. | Prometheus |
| [Etcd] Etcd High Number Of Failed Proposals | High number of proposal failures within the last 30 minutes on an etcd instance. | Prometheus |
| [Etcd] Etcd High Fsync Durations | 99th percentile fsync durations are too high. | Prometheus |
| [Etcd] Etcd High Commit Durations | 99th percentile commit durations are too high. | Prometheus |
| [Etcd] Etcd HighNumber Of Failed HTTP Requests | High number of failed HTTP requests. | Prometheus |
| [Etcd] Etcd HTTP Requests Slow | HTTP requests are slow. | Prometheus |

List of Dashboards

Kubernetes Etcd

The dashboard provides information on Kubernetes etcd.

List of Metrics

Metric name
etcd_debugging_mvcc_db_total_size_in_bytes
etcd_disk_backend_commit_duration_seconds_bucket
etcd_disk_wal_fsync_duration_seconds_bucket
etcd_grpc_proxy_cache_hits_total
etcd_grpc_proxy_cache_misses_total
etcd_http_failed_total
etcd_http_received_total
etcd_http_successful_duration_seconds_bucket
etcd_mvcc_db_total_size_in_bytes
etcd_network_client_grpc_received_bytes_total
etcd_network_client_grpc_sent_bytes_total
etcd_network_peer_received_bytes_total
etcd_network_peer_received_failures_total
etcd_network_peer_round_trip_time_seconds_bucket
etcd_network_peer_sent_bytes_total
etcd_network_peer_sent_failures_total
etcd_server_has_leader
etcd_server_id
etcd_server_leader_changes_seen_total
etcd_server_proposals_applied_total
etcd_server_proposals_committed_total
etcd_server_proposals_failed_total
etcd_server_proposals_pending
go_build_info
go_gc_duration_seconds
go_gc_duration_seconds_count
go_gc_duration_seconds_sum
go_goroutines
go_memstats_buck_hash_sys_bytes
go_memstats_gc_sys_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_idle_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_released_bytes
go_memstats_heap_sys_bytes
go_memstats_lookups_total
go_memstats_mallocs_total
go_memstats_mcache_inuse_bytes
go_memstats_mcache_sys_bytes
go_memstats_mspan_inuse_bytes
go_memstats_mspan_sys_bytes
go_memstats_next_gc_bytes
go_memstats_stack_inuse_bytes
go_memstats_stack_sys_bytes
go_memstats_sys_bytes
go_threads
grpc_server_handled_total
grpc_server_handling_seconds_bucket
grpc_server_started_total
process_cpu_seconds_total
process_max_fds
process_open_fds
sysdig_container_cpu_cores_used
sysdig_container_memory_used_bytes

Preparing the Integration

Add Certificate for Sysdig Agent

Disclaimer: This patch works only on vanilla Kubernetes.

kubectl -n sysdig-agent patch ds sysdig-agent -p '{"spec":{"template":{"spec":{"volumes":[{"hostPath":{"path":"/etc/kubernetes/pki/etcd-manager-main","type":"DirectoryOrCreate"},"name":"etcd-certificates"}]}}}}'
kubectl -n sysdig-agent patch ds sysdig-agent -p '{"spec":{"template":{"spec":{"containers":[{"name":"sysdig","volumeMounts": [{"mountPath": "/etc/kubernetes/pki/etcd-manager","name": "etcd-certificates"}]}]}}}}'
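
The two patches above are easier to review when their JSON payloads are built as structured data. The following is a minimal sketch, assuming Python 3 is available on the workstation; the function names are hypothetical, and the script only prints the commands rather than running them against a cluster. The paths, volume name, and container name are copied from the commands above.

```python
import json

# Values copied from the two kubectl commands above.
HOST_PATH = "/etc/kubernetes/pki/etcd-manager-main"
MOUNT_PATH = "/etc/kubernetes/pki/etcd-manager"
VOLUME_NAME = "etcd-certificates"
CONTAINER_NAME = "sysdig"


def build_volume_patch() -> dict:
    """Patch that adds the hostPath volume to the DaemonSet pod template."""
    volume = {
        "name": VOLUME_NAME,
        "hostPath": {"path": HOST_PATH, "type": "DirectoryOrCreate"},
    }
    return {"spec": {"template": {"spec": {"volumes": [volume]}}}}


def build_mount_patch() -> dict:
    """Patch that mounts the volume into the agent container."""
    mount = {"name": VOLUME_NAME, "mountPath": MOUNT_PATH}
    container = {"name": CONTAINER_NAME, "volumeMounts": [mount]}
    return {"spec": {"template": {"spec": {"containers": [container]}}}}


def kubectl_command(patch: dict) -> str:
    """Render a patch as a kubectl invocation (hypothetical helper; prints only)."""
    return (
        "kubectl -n sysdig-agent patch ds sysdig-agent -p '"
        + json.dumps(patch)
        + "'"
    )


if __name__ == "__main__":
    for patch in (build_volume_patch(), build_mount_patch()):
        print(kubectl_command(patch))
```

Note that the volume name must be identical in both patches, because the mount refers to the volume by name. The host directory and the mount path differ on purpose in the commands above: the agent job reads the certificate files from the mount path, not from the host path.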

Installing

The installation of an exporter is not required for this integration.

Agent Configuration

This is the default agent job for this integration:

- job_name: etcd-default
  scheme: https
  tls_config:
    insecure_skip_verify: true
    cert_file: /etc/kubernetes/pki/etcd-manager/etcd-clients-ca.crt
    key_file: /etc/kubernetes/pki/etcd-manager/etcd-clients-ca.key
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - action: keep
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_pod_name
    separator: '/'
    regex: 'kube-system/etcd-manager-main.+'
  - source_labels:
    - __address__
    action: replace
    target_label: __address__
    regex: (.+?)(\\:\\d)?
    replacement: $1:4001
    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: (etcd_http_failed_total|etcd_http_received_total|etcd_http_successful_duration_seconds_bucket|etcd_server_has_leader|etcd_server_leader_changes_seen_total|etcd_server_proposals_failed_total|go_build_info|go_gc_duration_seconds|go_gc_duration_seconds_count|go_gc_duration_seconds_sum|go_memstats_buck_hash_sys_bytes|go_memstats_gc_sys_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_idle_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_released_bytes|go_memstats_heap_sys_bytes|go_memstats_lookups_total|go_memstats_mallocs_total|go_memstats_mcache_inuse_bytes|go_memstats_mcache_sys_bytes|go_memstats_mspan_inuse_bytes|go_memstats_mspan_sys_bytes|go_memstats_next_gc_bytes|go_memstats_stack_inuse_bytes|go_memstats_stack_sys_bytes|go_memstats_sys_bytes|go_threads|process_cpu_seconds_total|grpc_server_started_total|grpc_server_started_total|grpc_server_started_total|grpc_server_handled_total|etcd_debugging_mvcc_db_total_size_in_bytes|etcd_disk_wal_fsync_duration_seconds_bucket|etcd_disk_backend_commit_duration_seconds_bucket|sysdig_container_memory_used_bytes|etcd_server_proposals_committed_total|etcd_server_proposals_applied_total|sysdig_container_cpu_cores_used|go_goroutines|grpc_server_handled_total|grpc_server_handled_total|etcd_server_id|etcd_disk_backend_commit_duration_seconds_bucket|etcd_grpc_proxy_cache_hits_total|etcd_grpc_proxy_cache_misses_total|etcd_network_client_grpc_received_bytes_total|etcd_network_client_grpc_sent_bytes_total|process_max_fds|process_open_fds|etcd_server_proposals_pending|etcd_network_peer_sent_failures_total|etcd_network_peer_received_failures_total|etcd_network_peer_round_trip_time_seconds_bucket|etcd_network_client_grpc_sent_bytes_total|etcd_network_client_grpc_received_bytes_total|etcd_network_peer_sent_bytes_total|etcd_network_peer_received_bytes_total|grpc_server_handling_seconds_bucket|apiserver_client_certificate_expiration_seconds_bucket|apiserver_client_certificate_expiration_seconds_sum|apiserver_client_certificate_expiration_seconds_count|etcd_mvcc_db_total_size_in_bytes)
    action: keep
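
The keep rules in this job can be reasoned about offline. The following is a minimal sketch, assuming Python 3 and its `re` module as a stand-in for the regex engine the agent uses; the function names, the sample pod names, and the shortened metric list are hypothetical. It mirrors two behaviors of Prometheus-style relabeling: source label values are joined with the separator before matching, and the regex must match the whole joined string.

```python
import re

# Regex and separator copied from the pod "keep" rule in the job above.
POD_KEEP_REGEX = re.compile(r"kube-system/etcd-manager-main.+")
SEPARATOR = "/"

# Illustrative subset of the metric allowlist; the real regex is much longer.
METRIC_KEEP_REGEX = re.compile(
    r"(etcd_server_has_leader|etcd_server_proposals_failed_total|go_goroutines)"
)


def pod_is_kept(namespace: str, pod_name: str) -> bool:
    """Join the source labels and require a full match, as relabeling does."""
    joined = SEPARATOR.join([namespace, pod_name])
    return POD_KEEP_REGEX.fullmatch(joined) is not None


def metric_is_kept(metric_name: str) -> bool:
    """Apply the (shortened) metric allowlist to a metric name."""
    return METRIC_KEEP_REGEX.fullmatch(metric_name) is not None


if __name__ == "__main__":
    # Hypothetical pod names, used only to exercise the rule.
    print(pod_is_kept("kube-system", "etcd-manager-main-node1"))  # True
    print(pod_is_kept("kube-system", "etcd-manager-main"))        # False: ".+" needs a suffix
    print(pod_is_kept("default", "etcd-manager-main-node1"))      # False: wrong namespace
    print(metric_is_kept("etcd_server_has_leader"))               # True
    print(metric_is_kept("etcd_server_has_leader_extra"))         # False: match is anchored
```

Because matching is anchored, a metric is dropped unless its full name appears in the allowlist, so adding a metric to a dashboard or alert also means adding its name to the `metric_relabel_configs` regex.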