Collect Prometheus Metrics

Sysdig supports collecting, storing, and querying Prometheus native metrics and labels. You can use Sysdig in the same way that you use Prometheus, leveraging the Prometheus Query Language (PromQL) to create dashboards and alerts. Sysdig is also compatible with the Prometheus HTTP API, so you can query your monitoring data programmatically using PromQL and extend Sysdig to other platforms such as Grafana.

From a metric collection standpoint, a lightweight Prometheus server is embedded directly in the Sysdig agent. It supports targets, instances, and jobs, with filtering and relabeling using Prometheus syntax. You can configure the agent to identify the processes that expose Prometheus metric endpoints on its own host and send the metrics to the Sysdig collector for storage and further processing.

You do not need to install the Prometheus server itself to collect Prometheus metrics.

Agent Compatibility

The following describes Sysdig agent versions and their compatibility with Prometheus features:

Sysdig Agent v12.2.0 and Above

The following features are enabled by default:

  • Automatically scraping any Kubernetes pods with the following annotation set: prometheus.io/scrape=true
  • Automatically scraping applications supported by Monitoring Integrations

For more information, see Set up the Environment.

Sysdig Agent Prior to v12.0.0

Manually enable Prometheus in the dragent.yaml file:

  prometheus:
    enabled: true

For more information, see Enable Promscrape V2 on Older Versions of Sysdig Agent.

Learn More

The following topics describe in detail how to set up the environment for service discovery, metric collection, and further processing.

See the following blog posts for additional context on Prometheus metrics and how they are typically used.

1 - Set Up the Environment

If you are already leveraging Kubernetes Service Discovery, specifically the approach given in prometheus-kubernetes.yml, you might already have annotations attached to the pods that mark them as eligible for scraping. Such environments can quickly begin scraping the same metrics by using the Sysdig agent in a single step.

If you are not using Kubernetes Service Discovery, follow the instructions given below:

Annotation

Ensure that the Kubernetes pods that contain your Prometheus exporters have been deployed with the following annotations to enable scraping, substituting the listening exporter-TCP-port:

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "exporter-TCP-port"

The configuration above assumes your exporters use the typical endpoint called /metrics. If your exporter uses a different endpoint, specify it by adding the following additional annotation, substituting exporter-endpoint-name:

prometheus.io/path: "/exporter-endpoint-name"
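
For example, assuming a hypothetical exporter that listens on port 9100 and serves metrics at /custom-metrics, the pod template would combine all three annotations:

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
        prometheus.io/path: "/custom-metrics"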

Sample Exporter

Use the Sample Exporter to test your environment. You will quickly see auto-discovered Prometheus metrics being displayed on Sysdig Monitor. You can use this working example as a basis to similarly annotate your own exporters.

2 - Enable Prometheus Native Service Discovery

Prometheus service discovery is a standard method of finding endpoints to scrape for metrics. You configure prometheus.yaml with custom jobs to scrape endpoints in the same way you do for native Prometheus.

For metric collection, a lightweight Prometheus server, named promscrape, is embedded directly in the Sysdig agent. Promscrape supports filtering and relabeling targets, instances, and jobs, and identifies them using the custom jobs configured in the prometheus.yaml file. The latest versions of the Sysdig agent (above v12.0.0) by default identify the processes that expose Prometheus metric endpoints on their host and send the metrics to the Sysdig collector for storage and further processing. On older versions of the Sysdig agent, you enable these features by configuring dragent.yaml.

Working with Promscrape

Promscrape is a lightweight Prometheus server that is embedded with the Sysdig agent. Promscrape scrapes metrics from Prometheus endpoints and sends them for storing and processing.

Promscrape has two versions: Promscrape V1 and Promscrape V2.

  • Promscrape V2

    Promscrape itself discovers targets by using the standard Prometheus configuration (native Prometheus service discovery), allowing the use of relabel_configs to find or modify targets. An instance of promscrape runs on every node that is running a Sysdig agent and is intended to collect metrics from local as well as remote targets specified in the prometheus.yaml file. The prometheus.yaml file you create is shared across all such nodes.

    Promscrape V2 is enabled by default on Sysdig agent v12.5.0 and above. On older versions of Sysdig agent, you need to manually enable Promscrape V2, which allows for native Prometheus service discovery, by setting the prom_service_discovery parameter to true in dragent.yaml.

  • Promscrape V1

    Sysdig agent discovers scrape targets through the Sysdig process_filter rules. For more information, see Process Filter.
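
    As a minimal sketch, assuming the process_filter rules sit under the prometheus section of dragent.yaml (the label value and port below are only illustrations), a Promscrape V1 configuration might look like this:

    prometheus:
      enabled: true
      process_filter:
        - include:
            kubernetes.pod.label.app: my-app   # hypothetical label value
        - include:
            port: 8080                         # hypothetical exporter port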

About Promscrape V2

Supported Features

Promscrape V2 supports the following native Prometheus capabilities:

  • Relabeling: Promscrape V2 supports Prometheus native relabel_configs and metric_relabel_configs. Relabel configuration enables the following (see the example after this list):

    • Drop unnecessary metrics or unwanted labels from metrics

    • Edit the label format of the target before scraping the labels

  • Sample format: In addition to the regular sample format (metric name, labels, and metric reading), Promscrape V2 includes the metric type (counter, gauge, histogram, summary) with every sample sent to the agent.

  • Scraping configuration: Promscrape V2 supports all types of scraping configuration, such as federation, blackbox-exporter, and so on.

  • Label mapping: Metrics can be mapped to their source (pod or process) by using source labels, which map certain Prometheus label names to known agent tags.
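
For example, the following metric_relabel_configs fragment, placed under a scrape job in prometheus.yaml, drops all metrics whose names start with go_ and removes a label from the remaining metrics (the go_ prefix and the pod_template_hash label are only illustrations):

metric_relabel_configs:
  # Drop all metrics whose name starts with go_
- action: drop
  source_labels: [__name__]
  regex: 'go_.*'
  # Remove the pod_template_hash label from the metrics that remain
- action: labeldrop
  regex: 'pod_template_hash'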

Unsupported Features

  • Promscrape V2 does not support calculated metrics.

  • Promscrape V2 does not support cluster-wide features such as recording rules and alert management.

  • Service discovery configurations in Promscrape V1 (process_filter) and Promscrape V2 (prometheus.yaml) are incompatible and non-translatable.

  • Each Promscrape V2 instance collects metrics from both the local and remote targets specified in the prometheus.yaml file. Because every node runs its own instance, configuring promscrape to scrape remote targets results in metric duplication and is not recommended.

  • Promscrape V2 does not have a cluster view, and therefore ignores the recording rule and alerting configuration used in cluster-wide metrics collection; such Prometheus configurations are not supported.

  • Sysdig uses __HOSTNAME__, which is not a standard Prometheus keyword.

Enable Promscrape V2 on Older Versions of Sysdig Agent

To enable Prometheus native service discovery on agent versions prior to 11.2:

  1. Open the dragent.yaml file.

  2. Set the following Prometheus Service Discovery parameter to true:

    prometheus:
      prom_service_discovery: true
    

    If true, promscrape.v2 is used. Otherwise, promscrape.v1 is used to scrape the targets.

  3. Restart the agent.

Create Custom Jobs

Prerequisites

Ensure the following features are enabled:

  • Monitoring Integration
  • Promscrape V2

If you are using Sysdig agent v12.0.0 or above, these features are enabled by default.

Prepare Custom Job

You set up custom jobs in the Prometheus configuration file to identify endpoints that expose Prometheus metrics. The Sysdig agent uses these custom jobs to scrape endpoints through promscrape, the lightweight Prometheus server embedded in the agent.

Guidelines

  • Ensure that targets are scraped only by the agent running on the same node as the target. You do this by adding the host selection relabeling rules.

  • Use the Sysdig-specific relabeling rules to automatically apply the right workload labels.

Example Prometheus Configuration File

The prometheus.yaml file comes with a default configuration for scraping the pods running on the local node. This configuration also includes the rules to preserve pod UID and container name labels for further correlation with Kubernetes State Metrics or Sysdig native metrics.

Here is an example prometheus.yaml file that you can use to set up custom jobs.

global:
  scrape_interval: 10s
scrape_configs:
- job_name: 'my_pod_job'
  sample_limit: 40000
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
    # Look for pod name starting with "my_pod_prefix" in namespace "my_namespace"
  - action: keep
    source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_name]
    separator: /
    regex: my_namespace/my_pod_prefix.+

    # In those pods try to scrape from port 9876
  - source_labels: [__address__]
    action: replace
    target_label: __address__
    regex: (.+?)(:\d+)?
    replacement: $1:9876

    # Trying to ensure we only scrape local targets
    # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
    # of all the active network interfaces on the host
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__

    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name

Default Scrape Job

If Monitoring Integration is not enabled for you and you still want to automatically collect metrics from pods with the Prometheus annotations set (prometheus.io/scrape=true), add the following default scrape job to your prometheus.yaml file:

- job_name: 'k8s-pods'
  sample_limit: 40000
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
    # Trying to ensure we only scrape local targets
    # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
    # of all the active network interfaces on the host
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - action: keep
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: true
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    target_label: __scheme__
    regex: (https?)
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    target_label: __metrics_path__
    regex: (.+)
  - action: replace
    source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__

    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name

Default Prometheus Configuration File

Here is the default prometheus.yaml file.

global:
  scrape_interval: 10s
scrape_configs:
- job_name: 'k8s-pods'
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
    # Trying to ensure we only scrape local targets
    # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
    # of all the active network interfaces on the host
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  - action: keep
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: true
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    target_label: __scheme__
    regex: (https?)
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    target_label: __metrics_path__
    regex: (.+)
  - action: replace
    source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name

Understand the Prometheus Settings

Scrape Interval

The default scrape interval is 10 seconds. However, the value can be overridden per scraping job. The scrape interval configured in the prometheus.yaml is independent of the agent configuration.

Promscrape V2 reads prometheus.yaml and initiates scraping jobs.

Metrics are collected from each target at its scrape interval and immediately forwarded to the agent. The agent sends the metrics to the Sysdig collector every 10 seconds, including only those metrics received since the last transmission. If a scrape job has a scrape interval longer than 10 seconds, some agent transmissions might not include all the metrics from that job.
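
For example, a job that only needs a sample every minute can override the global value (the job name and interval here are illustrative):

global:
  scrape_interval: 10s
scrape_configs:
- job_name: 'slow-moving-metrics'
  # Per-job override: this job is scraped once per minute instead of every 10 seconds
  scrape_interval: 60s
  kubernetes_sd_configs:
  - role: pod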

Hostname Selection

__HOSTIPS__ is replaced by the host IP addresses. Selection by the host IP address is preferred because of its reliability.

__HOSTNAME__ is replaced with the actual hostname before promscrape starts scraping the targets. This allows promscrape to ignore targets running on other hosts.
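
As a sketch, assuming the node name is available in the __meta_kubernetes_pod_node_name label, a hostname-based keep rule typically matches it against __HOSTNAME__ with a trailing wildcard, because on some platforms the node name is a fully qualified domain name:

- action: keep
  source_labels: [__meta_kubernetes_pod_node_name]
  regex: __HOSTNAME__.*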

Relabeling Configuration

The default Prometheus configuration file contains the following two relabeling configurations:

- action: replace
  source_labels: [__meta_kubernetes_pod_uid]
  target_label: sysdig_k8s_pod_uid
- action: replace
  source_labels: [__meta_kubernetes_pod_container_name]
  target_label: sysdig_k8s_pod_container_name

These rules add two labels, sysdig_k8s_pod_uid and sysdig_k8s_pod_container_name, to every metric gathered from the local targets, containing the pod UID and container name respectively. These labels are dropped from the metrics before they are sent to the Sysdig collector for further processing.

Configure Prometheus Configuration File Using the Agent Configmap

Here is an example for setting up the prometheus.yaml file using the agent configmap:

apiVersion: v1
data:
  dragent.yaml: |
    new_k8s: true
    k8s_cluster_name: your-cluster-name
    metrics_excess_log: true
    10s_flush_enable: true
    app_checks_enabled: false
    use_promscrape: true
    promscrape_fastproto: true
    prometheus:
      enabled: true
      prom_service_discovery: true
      log_errors: true
      max_metrics: 200000
      max_metrics_per_process: 200000
      max_tags_per_metric: 100
      ingest_raw: true
      ingest_calculated: false
    snaplen: 512
    tags: role:cluster    
  prometheus.yaml: |
    global:
      scrape_interval: 10s
    scrape_configs:
    - job_name: 'haproxy-router'
      basic_auth:
        username: USER
        password: PASSWORD
      tls_config:
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
        # Trying to ensure we only scrape local targets
        # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
        # of all the active network interfaces on the host
      - action: keep
        source_labels: [__meta_kubernetes_pod_host_ip]
        regex: __HOSTIPS__
      - action: keep
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_pod_name
        separator: '/'
        regex: 'default/router-1-.+'
        # Holding on to pod-id and container name so we can associate the metrics
        # with the container (and cluster hierarchy)
      - action: replace
        source_labels: [__meta_kubernetes_pod_uid]
        target_label: sysdig_k8s_pod_uid
      - action: replace
        source_labels: [__meta_kubernetes_pod_container_name]
        target_label: sysdig_k8s_pod_container_name    

kind: ConfigMap
metadata:
    labels:
      app: sysdig-agent
    name: sysdig-agent
    namespace: sysdig-agent

3 - Migrating from Promscrape V1 to V2

Promscrape is the lightweight Prometheus server in the Sysdig agent. An updated version of promscrape, named Promscrape V2, is available. Which version is used is controlled by the prom_service_discovery parameter in the dragent.yaml file. To use the latest features, such as Service Discovery and Monitoring Integrations, you need to have this option enabled in your environment.

Compare Promscrape V1 and V2

The main difference between V1 and V2 is how scrape targets are determined.

In V1, targets are found through process-filtering rules configured in dragent.yaml, or in dragent.default.yaml if no rules are given in dragent.yaml. The process-filtering rules are applied to all the running processes on the host. Matches are made based on process attributes, such as the process name or the TCP ports being listened on, as well as associated context from Docker or Kubernetes, such as container labels or Kubernetes annotations.

With Promscrape V2, scrape targets are determined by scrape_configs fields in a prometheus.yaml file (or the prometheus-v2.default.yaml file if no prometheus.yaml exists). Because promscrape is adapted from the open-source Prometheus server, the scrape_config settings are compatible with the normal Prometheus configuration. Here is an example:

global:
  scrape_interval: 10s
scrape_configs:
- job_name: 'my_pod_job'
  sample_limit: 40000
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
    # Look for pod name starting with "my_pod_prefix" in namespace "my_namespace"
  - action: keep
    source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_name]
    separator: /
    regex: my_namespace/my_pod_prefix.+
  - action: keep
    source_labels: [__meta_kubernetes_pod_label_app]
    regex: my_app_metrics

    # In those pods try to scrape from port 9876
  - source_labels: [__address__]
    action: replace
    target_label: __address__
    regex: (.+?)(:\d+)?
    replacement: $1:9876

    # Trying to ensure we only scrape local targets
    # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
    # of all the active network interfaces on the host
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__

    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name

Migrate Using Default Configuration

The default configuration for Promscrape V1 triggers scraping based on standard Kubernetes pod annotations and container labels. The default configuration for V2 currently triggers scraping only based on the standard Kubernetes pod annotations, leveraging Prometheus native service discovery.

Example Pod Annotations

Set these annotations under spec.template.metadata.annotations in the pod template.

Annotation           | Value                     | Description
prometheus.io/scrape | true                      | Required field.
prometheus.io/port   | The port number to scrape | Optional. All pod-registered ports are scraped if omitted.
prometheus.io/scheme | http or https             | The default is http.
prometheus.io/path   | The metrics endpoint URL  | The default is /metrics.

Example Static Job

- job_name: 'static10'
  static_configs:
    - targets: ['localhost:5010']

Guidelines

  • Users running Kubernetes with Promscrape v1 default rules and triggering scraping based on pod annotations need not take any action to migrate to v2. The migration happens automatically.

  • Users operating non-Kubernetes environments might need to continue using v1 for now, depending on how scraping is triggered. As of today, promscrape.v2 doesn’t support leveraging container and Docker labels to discover Prometheus metrics endpoints. If your environment is one of these, define static jobs with the IP:port to be scraped.

Migrate Using Custom Rules

If you rely on custom process_filter rules to collect metrics, use standard Prometheus configuration syntax to scrape the endpoints. We recommend one of the following:

  • Adopt the standard approach of adding the Prometheus annotations to your pods. For more information, see Migrate Using Default Configuration.
  • Write a Prometheus scrape_config using Kubernetes pod service discovery and use the appropriate pod metadata to trigger the scrapes.

The following examples show how to convert common process_filter rules to their Prometheus relabeling equivalents. A complete scrape job built from one of these conversions follows the examples.

process_filter:

- include:
    kubernetes.pod.annotation.sysdig.com/test: true

Prometheus:

- action: keep
  source_labels: [__meta_kubernetes_pod_annotation_sysdig_com_test]
  regex: true

process_filter:

- include:
    kubernetes.pod.label.app: sysdig

Prometheus:

- action: keep
  source_labels: [__meta_kubernetes_pod_label_app]
  regex: 'sysdig'

process_filter:

- include:
    container.label.com.sysdig.test: true

Prometheus: Not supported.

process_filter:

- include:
    process.name: test

Prometheus: Not supported.

process_filter:

- include:
    process.cmdline: sysdig-agent

Prometheus: Not supported.

process_filter:

- include:
    port: 8080

Prometheus:

- action: keep
  source_labels: [__meta_kubernetes_pod_container_port_number]
  regex: '8080'

process_filter:

- include:
    container.image: sysdig-agent

Prometheus: Not supported.

process_filter:

- include:
    container.name: sysdig-agent

Prometheus:

- action: keep
  source_labels: [__meta_kubernetes_pod_container_name]
  regex: 'sysdig-agent'

process_filter:

- include:
    appcheck.match: sysdig

Prometheus: Appchecks are not compatible with Promscrape V2. See Configure Monitoring Integrations for supported integrations.
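
To put a converted rule in context, a hypothetical scrape job that keeps only pods labeled app: sysdig on the local node, following the same pattern as the earlier examples, could look like this:

- job_name: 'converted-from-process-filter'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
    # Only scrape targets running on the local node
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
    # Equivalent of the process_filter rule "kubernetes.pod.label.app: sysdig"
  - action: keep
    source_labels: [__meta_kubernetes_pod_label_app]
    regex: 'sysdig'
    # Keep pod UID and container name for correlation with the cluster hierarchy
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name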

Contact Support

If you have any queries related to promscrape migration, contact Sysdig Support.

4 - Dynamic Sampling

Dynamic sampling supports scraping a rotating set of Prometheus endpoints based on the total number of time series scraped from each endpoint. With dynamic sampling turned on, you get consistent and up-to-date data from every Prometheus endpoint on a given node at dynamic intervals, while maintaining the data collection frequency and fidelity of Prometheus metrics via the Sysdig agent.

Sysdig’s ability to collect and process the volumes of data scraped from different Prometheus endpoints is controlled by a mix of the number of time series scraped from each endpoint, the total number of Prometheus time series collected by the agent in each time window, and the frequency at which the Sysdig agent collects and sends the data to the Sysdig backend. On nodes with multiple Prometheus endpoints sending high volumes of time series, the Sysdig agent may need to skip some endpoints, depending on the overall volume and scrape frequency. Dynamic sampling addresses this by cycling through individual Prometheus endpoints and scraping the latest time series from each endpoint on a rotational basis, so that all time series from all Prometheus endpoints are processed at dynamic intervals. The result is more time series being scraped and processed overall, at a lower frequency. For example, instead of receiving 50,000 time series every 10 seconds, you might receive 100,000 time series every 20 seconds.

Dynamic Sampling Considerations

  • Any alerts that depend on dynamically sampled metrics will have the same interval as the metric. Using the example above, alerts related to either endpoint are raised, at most, every 20 seconds.

  • The time series from each endpoint are sent to the backend in a manner that prevents data integrity issues. Therefore, if the total number of time series from a particular endpoint is greater than the maximum limit allowed by the agent, the time series from that endpoint may be dropped, irrespective of whether dynamic sampling is turned on.

  • The Sysdig agent makes full use of the allowed limit in every interval, as long as all the time series of an endpoint fit within that limit.

Enable Dynamic Sampling

To configure dynamic sampling:

  1. Open the dragent.yaml file.

  2. Add the following line:

    promscrape_emit_all: true

  3. Restart the agent.