Consul

Metrics, Dashboards, Alerts and more for Consul Integration in Sysdig Monitor.

This integration is enabled by default.

Versions supported: > 1.11.1

This integration works out of the box, so it doesn’t require an exporter.

This integration has 64 metrics.

Timeseries generated: 1800 timeseries

List of Alerts

| Alert | Description | Format |
| --- | --- | --- |
| [Consul] KV Store update time anomaly | KV Store update time anomaly | Prometheus |
| [Consul] Transaction time anomaly | Transaction time anomaly | Prometheus |
| [Consul] Raft transactions count anomaly | Raft transactions count anomaly | Prometheus |
| [Consul] Raft commit time anomaly | Raft commit time anomaly | Prometheus |
| [Consul] Leader time to contact followers too high | Leader time to contact followers too high | Prometheus |
| [Consul] Flapping leadership | Flapping leadership | Prometheus |
| [Consul] Too many elections | Too many elections | Prometheus |
| [Consul] Server cluster unhealthy | Server cluster unhealthy | Prometheus |
| [Consul] Zero failure tolerance | Zero failure tolerance | Prometheus |
| [Consul] Client RPC requests anomaly | Consul client RPC requests anomaly | Prometheus |
| [Consul] Client RPC requests rate limit exceeded | Consul client RPC requests rate limit exceeded | Prometheus |
| [Consul] Client RPC requests failed | Consul client RPC requests failed | Prometheus |
| [Consul] License Expiry | Consul License Expiry | Prometheus |
| [Consul] Garbage Collection pause high | Consul Garbage Collection pause high | Prometheus |
| [Consul] Garbage Collection pause too high | Consul Garbage Collection pause too high | Prometheus |
| [Consul] Raft restore duration too high | Consul Raft restore duration too high | Prometheus |
| [Consul] RPC requests error rate is high | Consul RPC requests error rate is high | Prometheus |
| [Consul] Cache hit rate is low | Consul Cache hit rate is low | Prometheus |
| [Consul] High 4xx RequestError Rate | High 4xx RequestError Rate | Prometheus |
| [Consul] High Request Latency | Envoy High Request Latency | Prometheus |
| [Consul] High Response Latency | Envoy High Response Latency | Prometheus |
| [Consul] Certificate close to expire | Certificate close to expire | Prometheus |

List of Dashboards

Consul

The dashboard provides information on the status and latency of Consul.

Consul Envoy

The dashboard provides information on the Consul Envoy proxies.

List of Metrics

Metric name
consul_autopilot_failure_tolerance
consul_autopilot_healthy
consul_client_rpc
consul_client_rpc_exceeded
consul_client_rpc_failed
consul_consul_cache_bypass
consul_consul_cache_entries_count
consul_consul_cache_evict_expired
consul_consul_cache_fetch_error
consul_consul_cache_fetch_success
consul_kvs_apply_sum
consul_raft_apply
consul_raft_commitTime_sum
consul_raft_fsm_lastRestoreDuration
consul_raft_leader_lastContact
consul_raft_leader_oldestLogAge
consul_raft_rpc_installSnapshot
consul_raft_state_candidate
consul_raft_state_leader
consul_rpc_cross_dc
consul_rpc_queries_blocking
consul_rpc_query
consul_rpc_request
consul_rpc_request_error
consul_runtime_gc_pause_ns
consul_runtime_gc_pause_ns_sum
consul_system_licenseExpiration
consul_txn_apply_sum
envoy_cluster_membership_change
envoy_cluster_membership_healthy
envoy_cluster_membership_total
envoy_cluster_upstream_cx_active
envoy_cluster_upstream_cx_connect_ms_bucket
envoy_cluster_upstream_rq_active
envoy_cluster_upstream_rq_pending_active
envoy_cluster_upstream_rq_time_bucket
envoy_cluster_upstream_rq_xx
envoy_server_days_until_first_cert_expiring
go_build_info
go_gc_duration_seconds
go_gc_duration_seconds_count
go_gc_duration_seconds_sum
go_goroutines
go_memstats_buck_hash_sys_bytes
go_memstats_gc_sys_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_idle_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_released_bytes
go_memstats_heap_sys_bytes
go_memstats_lookups_total
go_memstats_mallocs_total
go_memstats_mcache_inuse_bytes
go_memstats_mcache_sys_bytes
go_memstats_mspan_inuse_bytes
go_memstats_mspan_sys_bytes
go_memstats_next_gc_bytes
go_memstats_stack_inuse_bytes
go_memstats_stack_sys_bytes
go_memstats_sys_bytes
go_threads
process_cpu_seconds_total
process_max_fds
process_open_fds

Preparing the Integration

Enable Prometheus Metrics and Disable Hostname in Metrics

As described in the Consul documentation pages Helm Global Metrics and Prometheus Retention Time, Consul exposes an endpoint for scraping metrics only after you enable the global.metrics.enabled and global.metrics.enableAgentMetrics settings. You also need to set telemetry.disable_hostname in the extra configuration of the Consul server and client, so that metric names are not prefixed with the hostname of each instance.

If you install Consul with Helm, you need to use the following flags:

--set 'global.metrics.enabled=true'
--set 'global.metrics.enableAgentMetrics=true'
--set 'server.extraConfig="{"telemetry": {"disable_hostname": true}}"'
--set 'client.extraConfig="{"telemetry": {"disable_hostname": true}}"'
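
The same settings can also be kept in a Helm values file instead of being passed as flags. The following is a minimal sketch, assuming the official Consul Helm chart, where server.extraConfig and client.extraConfig take a JSON string; the file name values.yaml is an assumption.

```yaml
# values.yaml (illustrative file name): equivalent of the --set flags above
global:
  metrics:
    enabled: true              # expose the metrics endpoint
    enableAgentMetrics: true   # include agent metrics
server:
  # extraConfig is a JSON string merged into the server agent configuration
  extraConfig: |
    {
      "telemetry": {
        "disable_hostname": true
      }
    }
client:
  # same telemetry setting for the client agents
  extraConfig: |
    {
      "telemetry": {
        "disable_hostname": true
      }
    }
```

You would then pass the file to Helm with the -f flag of helm install or helm upgrade. A values file avoids the shell quoting needed for JSON inside --set.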

Installing

The installation of an exporter is not required for this integration.

Monitoring and Troubleshooting Consul

This document describes important metrics and queries that you can use to monitor and troubleshoot Consul.

Tracking metrics status

You can track Consul metrics status with the following alerts:

Exporter process is not serving metrics (Consul server)

# [Consul] Exporter Process Down
absent(consul_autopilot_healthy{kube_cluster_name=~$cluster,kube_namespace_name=~$namespace,kube_workload_name=~$workload}) > 0

Exporter process is not serving metrics (Envoy sidecar)

# [Consul] Exporter Process Down
absent(envoy_cluster_upstream_cx_active{kube_cluster_name=~$cluster,kube_namespace_name=~$namespace,kube_workload_name=~$workload}) > 0
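
If you evaluate these queries in a standalone Prometheus server rather than in Sysdig Monitor, they could be wrapped in a rule file roughly like the sketch below. The group name, alert names, the 5m duration, and the severity label are all assumptions. The label filters are dropped because $cluster, $namespace, and $workload are dashboard variables that a rule file cannot resolve.

```yaml
# Hypothetical Prometheus rule file. Group and alert names, "for", and labels
# are illustrative and not part of the integration.
groups:
  - name: consul-metrics-status
    rules:
      # Fires when no consul_autopilot_healthy series exists at all
      - alert: ConsulServerMetricsAbsent
        expr: absent(consul_autopilot_healthy) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "[Consul] Exporter Process Down"
      # Fires when no Envoy sidecar connection series exists at all
      - alert: ConsulEnvoyMetricsAbsent
        expr: absent(envoy_cluster_upstream_cx_active) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "[Consul] Exporter Process Down"
```

absent() returns a single series with value 1 only when the selector matches nothing, so each rule fires when the corresponding metric disappears entirely.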

Agent Configuration

These are the default agent jobs for this integration:

- job_name: 'consul-server-default'
  metrics_path: '/v1/agent/metrics'
  params:
    format: ['prometheus']
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - action: keep
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: true
  - action: drop
    source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit]
    regex: true
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    target_label: __scheme__
    regex: (https?)
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
    regex: (consul);(.{0}$)
    replacement: consul
    target_label: __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
    regex: "consul"
  - action: keep
    source_labels: [__address__]
    regex: (.*:8500)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
- job_name: 'consul-envoy-default'
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - action: keep
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: true
  - action: drop
    source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit]
    regex: true
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    target_label: __scheme__
    regex: (https?)
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_container_name
    - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
    regex: (envoy-sidecar);(.{0}$)
    replacement: consul
    target_label: __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
    regex: "consul"
  - action: replace
    source_labels: [__address__]
    regex: (.+?)(\\:\\d)?
    replacement: $1:20200
    target_label: __address__
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: (envoy_cluster_upstream_cx_active|envoy_cluster_upstream_rq_active|envoy_cluster_upstream_rq_pending_active|envoy_cluster_membership_total|envoy_cluster_membership_healthy|envoy_cluster_membership_change|envoy_cluster_upstream_rq_xx|envoy_cluster_upstream_cx_connect_ms_bucket|envoy_server_days_until_first_cert_expiring|envoy_cluster_upstream_rq_time_bucket)
      action: keep
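
Read together, the relabel rules of the consul-server-default job keep a target only under these conditions:

- the pod carries the scrape annotation
- the pod is not marked to be omitted
- the integration type is consul, either set explicitly or defaulted from a container named consul
- the target uses port 8500

A pod that satisfies these rules might look like the sketch below. It assumes the usual Prometheus mapping of annotation names to __meta_kubernetes_pod_annotation_* labels, so the annotation keys are inferred from the label names. The pod name and image tag are illustrative.

```yaml
# Hypothetical pod manifest; the pod name and image tag are illustrative
apiVersion: v1
kind: Pod
metadata:
  name: consul-server-0
  annotations:
    prometheus.io/scrape: "true"    # required by the first annotation "keep" rule
    prometheus.io/scheme: "http"    # copied to __scheme__ (http or https)
    # Optional when the container is named "consul": the job fills it in
    promcat.sysdig.com/integration_type: "consul"
    # Setting promcat.sysdig.com/omit: "true" would drop the pod from scraping
spec:
  containers:
    - name: consul                  # matched by the (consul);(.{0}$) rule
      image: hashicorp/consul:1.12.0
      ports:
        - containerPort: 8500       # the job keeps only addresses ending in :8500
```

For the consul-envoy-default job the equivalent container name is envoy-sidecar, and the target address is rewritten to port 20200 rather than filtered by port.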