Ceph

Metrics, Dashboards, Alerts and more for Ceph Integration in Sysdig Monitor.

This integration is enabled by default.

Versions supported: > v15.2.12

This integration is out-of-the-box, so it doesn’t require any exporter.

This integration has 24 metrics.

Timeseries generated: 600

List of Alerts

Alert | Description | Format
[Ceph] Ceph Manager is absent | Ceph Manager has disappeared from Prometheus target discovery. | Prometheus
[Ceph] Ceph Manager is missing replicas | Ceph Manager is missing replicas. | Prometheus
[Ceph] Ceph quorum at risk | Storage cluster quorum is low. Contact Support. | Prometheus
[Ceph] High number of leader changes | Ceph Monitor has seen a lot of leader changes per minute recently. | Prometheus
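The leader-change alert is based on how quickly ceph_mon_num_elections grows. Sysdig's exact alert expression and threshold are not reproduced here, so the following is only a minimal Python sketch. It assumes you already hold time-ordered (timestamp, value) samples of that metric for one monitor, treats the metric as a counter that only drops when it is reset (for example, on a restart), and uses a hypothetical ELECTIONS_PER_MIN_THRESHOLD. It mirrors the idea behind PromQL's increase(), without its extrapolation.

```python
"""Sketch: leader changes per minute from ceph_mon_num_elections samples.

Assumptions (not taken from the shipped alert rule):
- samples are (unix_seconds, value) pairs for one monitor, sorted by time;
- a drop in value means the counter was reset (for example, a restart);
- ELECTIONS_PER_MIN_THRESHOLD is a hypothetical, illustrative value.
"""
from typing import List, Tuple

Sample = Tuple[float, float]

ELECTIONS_PER_MIN_THRESHOLD = 1.0  # hypothetical threshold


def counter_increase(samples: List[Sample]) -> float:
    """Sum the growth between consecutive samples, handling counter resets."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value
        # itself is the growth since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def elections_per_minute(samples: List[Sample]) -> float:
    """Average number of elections per minute over the sampled window."""
    if len(samples) < 2:
        return 0.0
    minutes = (samples[-1][0] - samples[0][0]) / 60.0
    return counter_increase(samples) / minutes if minutes > 0 else 0.0


def too_many_leader_changes(samples: List[Sample]) -> bool:
    """True when the observed election rate exceeds the hypothetical threshold."""
    return elections_per_minute(samples) > ELECTIONS_PER_MIN_THRESHOLD


if __name__ == "__main__":
    # Five minutes of samples with a reset between t=120 and t=180.
    demo = [(0, 3), (60, 4), (120, 5), (180, 0), (240, 1), (300, 2)]
    print(counter_increase(demo), elections_per_minute(demo))  # 4.0 0.8
    print(too_many_leader_changes(demo))  # False
```

In PromQL the same per-minute quantity is usually written as rate(ceph_mon_num_elections[5m]) * 60.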

List of Dashboards

Ceph

The dashboard provides information on the status, capacity, latency and throughput of Ceph.

List of Metrics

Metric name
ceph_cluster_total_bytes
ceph_cluster_total_used_bytes
ceph_health_status
ceph_mgr_status
ceph_mon_metadata
ceph_mon_num_elections
ceph_mon_quorum_status
ceph_osd_apply_latency_ms
ceph_osd_commit_latency_ms
ceph_osd_in
ceph_osd_metadata
ceph_osd_numpg
ceph_osd_op_r
ceph_osd_op_r_latency_count
ceph_osd_op_r_latency_sum
ceph_osd_op_r_out_bytes
ceph_osd_op_w
ceph_osd_op_w_in_bytes
ceph_osd_op_w_latency_count
ceph_osd_op_w_latency_sum
ceph_osd_recovery_bytes
ceph_osd_recovery_ops
ceph_osd_up
ceph_pool_max_avail
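Some of these metrics are most useful in combination. The latency _sum/_count pairs yield an average latency per operation over an interval (in PromQL, typically rate(ceph_osd_op_r_latency_sum[5m]) / rate(ceph_osd_op_r_latency_count[5m])), and ceph_cluster_total_used_bytes divided by ceph_cluster_total_bytes gives the fraction of raw capacity in use. The Python sketch below shows that arithmetic on two made-up scrapes; it ignores counter resets and reports latency in whatever unit the exporter uses for the _sum series.

```python
"""Sketch: derive average OSD latency and cluster utilization.

The scrape values are made-up inputs for illustration; only the metric
names come from the integration's metric list. Counter resets are ignored.
"""
from typing import Dict


def average_latency(before: Dict[str, float], after: Dict[str, float], op: str = "r") -> float:
    """Average latency per operation between two scrapes: delta(sum) / delta(count).

    op is "r" (reads) or "w" (writes); this mirrors rate(_sum) / rate(_count).
    """
    sum_key = f"ceph_osd_op_{op}_latency_sum"
    count_key = f"ceph_osd_op_{op}_latency_count"
    d_sum = after[sum_key] - before[sum_key]
    d_count = after[count_key] - before[count_key]
    # No operations in the interval means there is no latency to average.
    return d_sum / d_count if d_count > 0 else 0.0


def used_ratio(sample: Dict[str, float]) -> float:
    """Fraction of raw cluster capacity in use."""
    total = sample["ceph_cluster_total_bytes"]
    return sample["ceph_cluster_total_used_bytes"] / total if total else 0.0


if __name__ == "__main__":
    t0 = {"ceph_osd_op_r_latency_sum": 10.0, "ceph_osd_op_r_latency_count": 1000}
    t1 = {"ceph_osd_op_r_latency_sum": 12.5, "ceph_osd_op_r_latency_count": 1500}
    print(average_latency(t0, t1))  # 2.5 / 500 operations = 0.005
    print(used_ratio({"ceph_cluster_total_bytes": 4e12,
                      "ceph_cluster_total_used_bytes": 1e12}))  # 0.25
```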

Preparing the Integration

Enable Prometheus Module

Ceph exposes metrics in Prometheus format through its Manager (mgr) daemon, and the Manager pod carries Prometheus annotations that the agent uses to discover it.

Make sure the Prometheus module is enabled in the Ceph cluster by running the following command:

ceph mgr module enable prometheus
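Once the module is enabled, the Manager serves metrics over HTTP, by default on port 9283 (the same port the agent job below keeps). One way to confirm it is working is to fetch /metrics and read ceph_health_status, which the module reports as 0 for HEALTH_OK, 1 for HEALTH_WARN and 2 for HEALTH_ERR. The Python sketch below assumes a reachable Manager address (MGR_URL is a placeholder) and uses a deliberately minimal text-format parser that ignores labels; it is not a full Prometheus exposition parser.

```python
"""Sketch: read ceph_health_status from the Ceph Manager Prometheus endpoint.

MGR_URL is a hypothetical placeholder; 9283 is the module's default port.
"""
import urllib.request
from typing import Optional

MGR_URL = "http://ceph-mgr.example:9283/metrics"  # hypothetical address

HEALTH_NAMES = {0: "HEALTH_OK", 1: "HEALTH_WARN", 2: "HEALTH_ERR"}


def fetch_metrics(url: str = MGR_URL, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the Manager."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def read_gauge(text: str, name: str) -> Optional[float]:
    """Return the first sample value of `name`, ignoring labels and comments."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: name{labels} value [timestamp]
            series, rest = line.rsplit("}", 1)
            metric = series.split("{", 1)[0]
        else:
            metric, _, rest = line.partition(" ")
        if metric == name:
            return float(rest.split()[0])
    return None


def health_status(text: str) -> str:
    """Map ceph_health_status to its Ceph health name, or MISSING if absent."""
    value = read_gauge(text, "ceph_health_status")
    if value is None:
        return "MISSING"
    return HEALTH_NAMES.get(int(value), f"UNKNOWN({value})")


if __name__ == "__main__":
    sample = (
        "# Sample input for illustration, not captured from a real cluster\n"
        'ceph_osd_up{ceph_daemon="osd.0"} 1.0\n'
        "ceph_health_status 1.0\n"
    )
    print(health_status(sample))  # HEALTH_WARN
    # Against a live cluster: print(health_status(fetch_metrics()))
```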

Installing

The installation of an exporter is not required for this integration.

Monitoring and Troubleshooting Ceph

This document describes important metrics and queries that you can use to monitor and troubleshoot Ceph.

Tracking metrics status

You can track the status of Ceph metrics with the following alert, which fires when the exporter process is not serving metrics:

# [Ceph] Exporter Process Down
absent(ceph_health_status{kube_cluster_name=~$cluster,kube_namespace_name=~$namespace,kube_workload_name=~$workload}) > 0
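PromQL's absent() returns a single series with value 1 when its selector matches nothing, and an empty result when at least one series matches, so the trailing > 0 does not change the outcome. Output labels are copied only from equality matchers (never from the metric name or =~ matchers), so with the =~ matchers used here the alert series carries no cluster, namespace or workload labels. The Python sketch below models that behaviour on an in-memory list of series; the data structures and the "prod-.*" value standing in for $cluster are hypothetical, and only the = and =~ operators are covered.

```python
"""Sketch of PromQL absent() semantics for the exporter-down alert.

Series are dicts of labels; matchers are (label, op, value) tuples with
op "=" or "=~". Regex matchers are fully anchored, as in PromQL.
"""
import re
from typing import Dict, List, Tuple

Matcher = Tuple[str, str, str]
Series = Dict[str, str]


def matches(series: Series, matchers: List[Matcher]) -> bool:
    """True if the series satisfies every matcher (missing labels read as "")."""
    for label, op, value in matchers:
        actual = series.get(label, "")
        if op == "=" and actual != value:
            return False
        if op == "=~" and re.fullmatch(value, actual) is None:
            return False
    return True


def absent(series_list: List[Series], matchers: List[Matcher]) -> List[Tuple[Series, float]]:
    """Empty result if anything matches; otherwise one series with value 1.

    Output labels come only from equality matchers, excluding __name__.
    """
    if any(matches(s, matchers) for s in series_list):
        return []
    labels = {l: v for l, op, v in matchers if op == "=" and l != "__name__"}
    return [(labels, 1.0)]


if __name__ == "__main__":
    selector = [
        ("__name__", "=", "ceph_health_status"),
        ("kube_cluster_name", "=~", "prod-.*"),  # hypothetical value of $cluster
    ]
    scraped = [{"__name__": "ceph_mgr_status", "kube_cluster_name": "prod-1"}]
    print(absent(scraped, selector))  # [({}, 1.0)]: fires, with no labels
```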

Agent Configuration

This is the default agent job for this integration:

- job_name: ceph-default
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - action: drop
    source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit]
    regex: true
  - action: keep
    source_labels:
    - __meta_kubernetes_pod_container_name
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    regex: mgr;9283
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    target_label: __scheme__
    regex: (https?)
  - action: replace
    source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name
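To see which pods this job actually scrapes, it helps to walk the relabel rules by hand. The Python sketch below applies simplified versions of the keep, drop and replace rules above to one pod's discovered labels: source labels are joined with ";", regexes are fully anchored as in Prometheus, the default regex is (.*) and the default replacement is $1. The pod labels, the host IP used in place of __HOSTIPS__, and the helper names are hypothetical, and the sketch leaves out relabel features this job does not use.

```python
"""Sketch: apply the ceph-default relabel rules to one discovered pod.

Supports keep, drop and replace with anchored regexes, ";" as the
source-label separator and $N-style replacements. HOST_IP and the pod
labels are hypothetical stand-ins, not values from a real cluster.
"""
import re
from typing import Dict, List, Optional

Labels = Dict[str, str]


def relabel(labels: Labels, rules: List[dict]) -> Optional[Labels]:
    """Return the relabelled target, or None if a keep/drop rule removes it."""
    labels = dict(labels)
    for rule in rules:
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        action = rule["action"]
        if action == "keep" and match is None:
            return None
        if action == "drop" and match is not None:
            return None
        if action == "replace" and match is not None:
            # Prometheus writes $1; Python's match.expand() expects \1.
            template = re.sub(r"\$(\d+)", r"\\\1", rule.get("replacement", "$1"))
            labels[rule["target_label"]] = match.expand(template)
    return labels


HOST_IP = "10.0.0.5"  # hypothetical value substituted for __HOSTIPS__

CEPH_DEFAULT_RULES = [
    {"action": "keep", "source_labels": ["__meta_kubernetes_pod_host_ip"],
     "regex": re.escape(HOST_IP)},
    {"action": "drop",
     "source_labels": ["__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit"],
     "regex": "true"},
    {"action": "keep",
     "source_labels": ["__meta_kubernetes_pod_container_name",
                       "__meta_kubernetes_pod_annotation_prometheus_io_port"],
     "regex": "mgr;9283"},
    {"action": "replace",
     "source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scheme"],
     "target_label": "__scheme__", "regex": "(https?)"},
    {"action": "replace",
     "source_labels": ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
     "regex": r"([^:]+)(?::\d+)?;(\d+)", "replacement": "$1:$2",
     "target_label": "__address__"},
    {"action": "replace", "source_labels": ["__meta_kubernetes_pod_uid"],
     "target_label": "sysdig_k8s_pod_uid"},
    {"action": "replace", "source_labels": ["__meta_kubernetes_pod_container_name"],
     "target_label": "sysdig_k8s_pod_container_name"},
]

if __name__ == "__main__":
    pod = {
        "__address__": "10.244.1.7:6800",
        "__meta_kubernetes_pod_host_ip": HOST_IP,
        "__meta_kubernetes_pod_container_name": "mgr",
        "__meta_kubernetes_pod_annotation_prometheus_io_port": "9283",
        "__meta_kubernetes_pod_annotation_prometheus_io_scheme": "http",
        "__meta_kubernetes_pod_uid": "1234-abcd",
    }
    target = relabel(pod, CEPH_DEFAULT_RULES)
    print(target["__address__"], target["__scheme__"])  # 10.244.1.7:9283 http
```

Rule order matters: the omit annotation is checked before the mgr;9283 keep rule, and the address rewrite only runs for pods that survived every keep and drop step.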