Consul

Metrics, Dashboards, Alerts and more for Consul Integration in Sysdig Monitor.

    This integration is enabled by default.

    Versions supported: > 1.11.1

    This integration is out-of-the-box, so it doesn’t require any exporter.

    This integration has 64 metrics.

    Timeseries generated: 1800 timeseries

    List of Alerts

    | Alert | Description | Format |
    | --- | --- | --- |
    | [Consul] KV Store update time anomaly | KV Store update time anomaly | Prometheus |
    | [Consul] Transaction time anomaly | Transaction time anomaly | Prometheus |
    | [Consul] Raft transactions count anomaly | Raft transactions count anomaly | Prometheus |
    | [Consul] Raft commit time anomaly | Raft commit time anomaly | Prometheus |
    | [Consul] Leader time to contact followers too high | Leader time to contact followers too high | Prometheus |
    | [Consul] Flapping leadership | Flapping leadership | Prometheus |
    | [Consul] Too many elections | Too many elections | Prometheus |
    | [Consul] Server cluster unhealthy | Server cluster unhealthy | Prometheus |
    | [Consul] Zero failure tolerance | Zero failure tolerance | Prometheus |
    | [Consul] Client RPC requests anomaly | Consul client RPC requests anomaly | Prometheus |
    | [Consul] Client RPC requests rate limit exceeded | Consul client RPC requests rate limit exceeded | Prometheus |
    | [Consul] Client RPC requests failed | Consul client RPC requests failed | Prometheus |
    | [Consul] License Expiry | Consul License Expiry | Prometheus |
    | [Consul] Garbage Collection pause high | Consul Garbage Collection pause high | Prometheus |
    | [Consul] Garbage Collection pause too high | Consul Garbage Collection pause too high | Prometheus |
    | [Consul] Raft restore duration too high | Consul Raft restore duration too high | Prometheus |
    | [Consul] RPC requests error rate is high | Consul RPC requests error rate is high | Prometheus |
    | [Consul] Cache hit rate is low | Consul Cache hit rate is low | Prometheus |
    | [Consul] High 4xx RequestError Rate | High 4xx RequestError Rate | Prometheus |
    | [Consul] High Request Latency | Envoy High Request Latency | Prometheus |
    | [Consul] High Response Latency | Envoy High Response Latency | Prometheus |
    | [Consul] Certificate close to expire | Certificate close to expire | Prometheus |

    List of Dashboards

    Consul

    The dashboard provides information on the status and latency of Consul.

    Consul Envoy

    The dashboard provides information on the Consul Envoy proxies.

    List of Metrics

    Metric name
    consul_autopilot_failure_tolerance
    consul_autopilot_healthy
    consul_client_rpc
    consul_client_rpc_exceeded
    consul_client_rpc_failed
    consul_consul_cache_bypass
    consul_consul_cache_entries_count
    consul_consul_cache_evict_expired
    consul_consul_cache_fetch_error
    consul_consul_cache_fetch_success
    consul_kvs_apply_sum
    consul_raft_apply
    consul_raft_commitTime_sum
    consul_raft_fsm_lastRestoreDuration
    consul_raft_leader_lastContact
    consul_raft_leader_oldestLogAge
    consul_raft_rpc_installSnapshot
    consul_raft_state_candidate
    consul_raft_state_leader
    consul_rpc_cross_dc
    consul_rpc_queries_blocking
    consul_rpc_query
    consul_rpc_request
    consul_rpc_request_error
    consul_runtime_gc_pause_ns
    consul_runtime_gc_pause_ns_sum
    consul_system_licenseExpiration
    consul_txn_apply_sum
    envoy_cluster_membership_change
    envoy_cluster_membership_healthy
    envoy_cluster_membership_total
    envoy_cluster_upstream_cx_active
    envoy_cluster_upstream_cx_connect_ms_bucket
    envoy_cluster_upstream_rq_active
    envoy_cluster_upstream_rq_pending_active
    envoy_cluster_upstream_rq_time_bucket
    envoy_cluster_upstream_rq_xx
    envoy_server_days_until_first_cert_expiring
    go_build_info
    go_gc_duration_seconds
    go_gc_duration_seconds_count
    go_gc_duration_seconds_sum
    go_goroutines
    go_memstats_buck_hash_sys_bytes
    go_memstats_gc_sys_bytes
    go_memstats_heap_alloc_bytes
    go_memstats_heap_idle_bytes
    go_memstats_heap_inuse_bytes
    go_memstats_heap_released_bytes
    go_memstats_heap_sys_bytes
    go_memstats_lookups_total
    go_memstats_mallocs_total
    go_memstats_mcache_inuse_bytes
    go_memstats_mcache_sys_bytes
    go_memstats_mspan_inuse_bytes
    go_memstats_mspan_sys_bytes
    go_memstats_next_gc_bytes
    go_memstats_stack_inuse_bytes
    go_memstats_stack_sys_bytes
    go_memstats_sys_bytes
    go_threads
    process_cpu_seconds_total
    process_max_fds
    process_open_fds

    Preparing the Integration

    Enable Prometheus Metrics and Disable Hostname in Metrics

    As described in the Consul documentation pages Helm Global Metrics and Prometheus Retention Time, Consul exposes an endpoint for scraping metrics only after you enable the relevant global.metrics settings. You also need to set telemetry.disable_hostname in the extraConfig of both the Consul server and the Consul client, so that metric names do not include the instance hostname.

    If you install Consul with Helm, you need to use the following flags:

    --set 'global.metrics.enabled=true'
    --set 'global.metrics.enableAgentMetrics=true'
    --set 'server.extraConfig="{\"telemetry\": {\"disable_hostname\": true}}"'
    --set 'client.extraConfig="{\"telemetry\": {\"disable_hostname\": true}}"'
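
The same settings can also be kept in a Helm values file, which avoids shell quoting of the JSON string. The following is a minimal sketch, assuming the official Consul Helm chart, where extraConfig takes a JSON string; the file name values.yaml is only an example.

```yaml
# values.yaml (example file name)
global:
  metrics:
    enabled: true             # expose the metrics endpoint
    enableAgentMetrics: true  # include agent metrics
server:
  # JSON passed to the Consul server agents as extra configuration
  extraConfig: |
    {"telemetry": {"disable_hostname": true}}
client:
  # JSON passed to the Consul client agents as extra configuration
  extraConfig: |
    {"telemetry": {"disable_hostname": true}}
```

You would pass this file to helm install or helm upgrade with the -f flag in place of the four --set flags.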
    

    Installing

    The installation of an exporter is not required for this integration.

    Monitoring and Troubleshooting Consul

    This document describes important metrics and queries that you can use to monitor and troubleshoot Consul.

    Tracking metrics status

    You can track the status of Consul metrics with the following alerts:

    Exporter process is not serving metrics

    # [Consul] Exporter Process Down
    absent(consul_autopilot_healthy{kube_cluster_name=~$cluster,kube_namespace_name=~$namespace,kube_workload_name=~$workload}) > 0
    

    Exporter process is not serving metrics

    # [Consul] Exporter Process Down
    absent(envoy_cluster_upstream_cx_active{kube_cluster_name=~$cluster,kube_namespace_name=~$namespace,kube_workload_name=~$workload}) > 0
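
If you evaluate these queries outside a dashboard, for example in a Prometheus-compatible rule file, the $cluster, $namespace and $workload template variables are not substituted. The following sketch shows both checks as alerting rules with the label filters removed. The group name, alert names, the 5m hold time, and the severity label are assumptions for illustration, not values shipped with the integration.

```yaml
groups:
- name: consul-metrics-status          # hypothetical group name
  rules:
  - alert: ConsulExporterProcessDown   # hypothetical alert name
    # absent() returns 1 when no consul_autopilot_healthy series exists
    expr: absent(consul_autopilot_healthy) > 0
    for: 5m                            # assumed hold time before firing
    labels:
      severity: warning                # assumed label
    annotations:
      summary: Consul is not serving metrics
  - alert: ConsulEnvoyExporterProcessDown   # hypothetical alert name
    # Same check for the Envoy sidecar metrics
    expr: absent(envoy_cluster_upstream_cx_active) > 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Consul Envoy proxies are not serving metrics
```

To scope a rule to one cluster or workload, add literal label matchers such as kube_cluster_name="..." in place of the template variables.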
    

    Agent Configuration

    These are the default agent jobs for this integration:

    - job_name: 'consul-server-default'
      metrics_path: '/v1/agent/metrics'
      params:
        format: ['prometheus']
      tls_config:
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_pod_host_ip]
        regex: __HOSTIPS__
      - action: keep
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        regex: true
      - action: drop
        source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit]
        regex: true
      - action: replace
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        target_label: __scheme__
        regex: (https?)
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
        regex: (consul);(.{0}$)
        replacement: consul
        target_label: __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
      - action: keep
        source_labels:
        - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
        regex: "consul"
      - action: keep
        source_labels: [__address__]
        regex: (.*:8500)
      - action: replace
        source_labels: [__meta_kubernetes_pod_uid]
        target_label: sysdig_k8s_pod_uid
      - action: replace
        source_labels: [__meta_kubernetes_pod_container_name]
        target_label: sysdig_k8s_pod_container_name
    - job_name: 'consul-envoy-default'
      tls_config:
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_pod_host_ip]
        regex: __HOSTIPS__
      - action: keep
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        regex: true
      - action: drop
        source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit]
        regex: true
      - action: replace
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        target_label: __scheme__
        regex: (https?)
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
        regex: (envoy-sidecar);(.{0}$)
        replacement: consul
        target_label: __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
      - action: keep
        source_labels:
        - __meta_kubernetes_pod_annotation_promcat_sysdig_com_integration_type
        regex: "consul"
      - action: replace
        source_labels: [__address__]
        regex: (.+?)(\\:\\d)?
        replacement: $1:20200
        target_label: __address__
      - action: replace
        source_labels: [__meta_kubernetes_pod_uid]
        target_label: sysdig_k8s_pod_uid
      - action: replace
        source_labels: [__meta_kubernetes_pod_container_name]
        target_label: sysdig_k8s_pod_container_name
      metric_relabel_configs:
        - source_labels: [__name__]
          regex: (envoy_cluster_upstream_cx_active|envoy_cluster_upstream_rq_active|envoy_cluster_upstream_rq_pending_active|envoy_cluster_membership_total|envoy_cluster_membership_healthy|envoy_cluster_membership_change|envoy_cluster_upstream_rq_xx|envoy_cluster_upstream_cx_connect_ms_bucket|envoy_server_days_until_first_cert_expiring|envoy_cluster_upstream_rq_time_bucket)
          action: keep
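
The relabel rules in the consul-server-default job select pods by annotation, container name, and port. As an illustration, the pod metadata below would pass every keep rule in that job. The pod name and image tag are placeholders, and in practice the Consul Helm chart generates the pod spec for you.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: consul-server-0                # placeholder name
  annotations:
    prometheus.io/scrape: "true"       # required by the prometheus_io_scrape keep rule
    prometheus.io/scheme: "http"       # optional; copied into __scheme__ (http or https)
    # promcat.sysdig.com/omit: "true"  # if set, the drop rule excludes this pod
    # Only needed when the container is not named "consul":
    # promcat.sysdig.com/integration_type: "consul"
spec:
  containers:
  - name: consul                       # matched by the (consul);(.{0}$) rule
    image: hashicorp/consul:1.15       # placeholder tag
    ports:
    - containerPort: 8500              # the job keeps only targets on port 8500
```

Prometheus builds the __meta_kubernetes_pod_annotation_* labels by replacing unsupported characters in the annotation key with underscores, which is why prometheus.io/scrape appears in the job as prometheus_io_scrape.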