Ceph

Metrics, Dashboards, Alerts and more for Ceph Integration in Sysdig Monitor.
    This integration is enabled by default.

    Versions supported: > v15.2.12

    This integration is out-of-the-box, so it doesn’t require any exporter.

    This integration has 24 metrics.

    Timeseries generated: 600 timeseries

    List of Alerts

    | Alert | Description | Format |
    | --- | --- | --- |
    | [Ceph] Ceph Manager is absent | Ceph Manager has disappeared from Prometheus target discovery. | Prometheus |
    | [Ceph] Ceph Manager is missing replicas | Ceph Manager is missing replicas. | Prometheus |
    | [Ceph] Ceph quorum at risk | Storage cluster quorum is low. Contact Support. | Prometheus |
    | [Ceph] High number of leader changes | Ceph Monitor has seen a lot of leader changes per minute recently. | Prometheus |

    List of Dashboards

    Ceph

    The dashboard provides information on the status, capacity, latency, and throughput of Ceph.

    List of Metrics

    Metric name
    ceph_cluster_total_bytes
    ceph_cluster_total_used_bytes
    ceph_health_status
    ceph_mgr_status
    ceph_mon_metadata
    ceph_mon_num_elections
    ceph_mon_quorum_status
    ceph_osd_apply_latency_ms
    ceph_osd_commit_latency_ms
    ceph_osd_in
    ceph_osd_metadata
    ceph_osd_numpg
    ceph_osd_op_r
    ceph_osd_op_r_latency_count
    ceph_osd_op_r_latency_sum
    ceph_osd_op_r_out_bytes
    ceph_osd_op_w
    ceph_osd_op_w_in_bytes
    ceph_osd_op_w_latency_count
    ceph_osd_op_w_latency_sum
    ceph_osd_recovery_bytes
    ceph_osd_recovery_ops
    ceph_osd_up
    ceph_pool_max_avail

    Preparing the Integration

    Enable Prometheus Module

    Ceph exposes Prometheus metrics natively and annotates the manager pod with Prometheus annotations.

    Make sure that the Prometheus module is activated in the Ceph cluster by running the following command:

    ceph mgr module enable prometheus
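
Once the module is enabled, one way to confirm that the manager is serving metrics is to read its metrics endpoint and look for a metric from the list above, such as ceph_health_status. The following is a minimal sketch, not an official tool: the helper names `metric_names` and `fetch_metrics` are hypothetical, the host in the usage comment is a placeholder, and port 9283 is assumed from the default agent job on this page.

```python
from urllib.request import urlopen


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def fetch_metrics(url, timeout=5):
    """Download the raw metrics text from a Prometheus-style endpoint."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (assumption: replace <mgr-host> with the address of your manager pod):
#   text = fetch_metrics("http://<mgr-host>:9283/metrics")
#   print("ceph_health_status" in metric_names(text))
```

If ceph_health_status is missing from the result, the module is probably not enabled or the endpoint is not reachable from where the check runs.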
    

    Installing

    The installation of an exporter is not required for this integration.

    Monitoring and Troubleshooting Ceph

    This document describes important metrics and queries that you can use to monitor and troubleshoot Ceph.

    Tracking metrics status

    You can track the status of Ceph metrics with the following alert, which fires when the exporter process is not serving metrics:

    # [Ceph] Exporter Process Down
    absent(ceph_health_status{kube_cluster_name=~$cluster,kube_namespace_name=~$namespace,kube_workload_name=~$workload}) > 0
    

    Agent Configuration

    This is the default agent job for this integration:

    - job_name: ceph-default
      tls_config:
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        source_labels: [__meta_kubernetes_pod_host_ip]
        regex: __HOSTIPS__
      - action: drop
        source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit]
        regex: true
      - action: keep
        source_labels:
        - __meta_kubernetes_pod_container_name
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        regex: mgr;9283
      - action: replace
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        target_label: __scheme__
        regex: (https?)
      - action: replace
        source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: replace
        source_labels: [__meta_kubernetes_pod_uid]
        target_label: sysdig_k8s_pod_uid
      - action: replace
        source_labels: [__meta_kubernetes_pod_container_name]
        target_label: sysdig_k8s_pod_container_name
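
To make the relabel rules above easier to reason about, the sketch below models two of them in Python: the `keep` rule that selects only the `mgr` container annotated with port 9283, and the `replace` rule that rewrites `__address__` to use the annotated port. It assumes standard Prometheus relabeling behaviour (source label values are joined with `;`, and the regex must match the whole joined string). The function names `keeps_target` and `relabel_address` are hypothetical and exist only for illustration.

```python
import re


def keeps_target(container_name, port_annotation):
    """Model the 'keep' rule: regex 'mgr;9283' against the joined source labels."""
    joined = f"{container_name};{port_annotation}"
    return re.fullmatch(r"mgr;9283", joined) is not None


def relabel_address(address, port_annotation):
    """Model the 'replace' rule that writes '$1:$2' into __address__."""
    joined = f"{address};{port_annotation}"
    match = re.fullmatch(r"([^:]+)(?::\d+)?;(\d+)", joined)
    if match is None:
        # When the regex does not match, a replace action leaves the label as is.
        return address
    # $1 is the host without any existing port, $2 is the annotated port.
    return f"{match.group(1)}:{match.group(2)}"


# Usage with made-up example values:
#   keeps_target("mgr", "9283")
#   relabel_address("10.0.0.5:8080", "9283")
```

The optional non-capturing group `(?::\d+)?` is what lets the rule discard any port already present in the discovered address before appending the annotated one.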