Enable Prometheus Native Service Discovery

Prometheus service discovery is a standard method of finding endpoints to scrape for metrics. You configure the scraping mechanism in prometheus.yaml. As of agent v10.5.0, Sysdig supports native Prometheus service discovery, and you configure it in prometheus.yaml the same way you do for native Prometheus.

When enabled in dragent.yaml, the new version of promscrape uses the configured prometheus.yaml to find endpoints, instead of the endpoints that the agent has found through process_filter rules. The new version of promscrape is named promscrape.v2.

Promscrape V2

  • promscrape.v2 supports Prometheus native relabel_configs in addition to metric_relabel_configs. Relabel configuration enables the following:

    • Edit the labels of a target before the target is scraped

    • Drop unnecessary metrics or unwanted labels from metrics

  • In addition to the regular sample format (metric name, labels, and metric reading), promscrape.v2 includes the metric type (counter, gauge, histogram, summary) in every sample sent to the agent.

  • promscrape.v2 supports all types of scraping configuration, such as federation, blackbox-exporter, and so on.

  • Metrics can be mapped to their source (pod, process) by using source labels, which map certain Prometheus label names to known agent tags.
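
As a sketch of the two relabeling stages mentioned above, the fragment below uses standard Prometheus syntax: relabel_configs is applied to a target's labels before the scrape, while metric_relabel_configs is applied to the scraped samples. The job name, the app pod label, and the metric and label names are hypothetical and serve only as illustration.

```yaml
scrape_configs:
- job_name: 'example-job'            # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Applied to target labels before the scrape:
  # copy the pod label "app" (hypothetical) into a target label.
  - action: replace
    source_labels: [__meta_kubernetes_pod_label_app]
    target_label: app
  metric_relabel_configs:
  # Applied to scraped samples:
  # drop every metric whose name starts with "go_gc_" (hypothetical choice).
  - action: drop
    source_labels: [__name__]
    regex: go_gc_.*
  # Remove an unwanted label (hypothetical name) from the remaining metrics.
  - action: labeldrop
    regex: debug_id
```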

Limitations of Promscrape V2

  • promscrape.v2 does not support calculated metrics.

  • promscrape.v2 does not support cluster-wide features such as recording rules and alert management.

  • Service discovery configurations in promscrape and promscrape.v2 are incompatible and non-translatable.

  • When enabled, promscrape.v2 runs on every node that is running an agent and is intended to collect metrics from the local or remote targets specified in the prometheus.yaml file.

    The prometheus.yaml file is shared across all promscrape instances. Configuring promscrape to scrape remote targets is therefore not recommended: every instance scrapes the same remote targets, which results in duplicate metrics.

  • promscrape.v2 does not have a cluster view, so it ignores the configuration of recording rules and alerts, which is used in cluster-wide metrics collection.

    Therefore, Prometheus configurations for recording rules and alerts are not supported.

  • Sysdig uses __HOSTNAME__, which is not a standard Prometheus keyword.

Enable Promscrape V2

To enable Prometheus native service discovery:

  1. Open the dragent.yaml file.

  2. Set the following Prometheus Service Discovery parameter to true:

    prometheus:
      prom_service_discovery: true

    If true, promscrape.v2 is used. Otherwise, promscrape is used to scrape the targets.

  3. Restart the agent.

Default Prometheus Configuration File

Here is the default prometheus.yaml file.

global:
  scrape_interval: 10s
scrape_configs:
- job_name: 'k8s-pods'
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
    # Trying to ensure we only scrape local targets
    # We need the wildcard at the end because in AWS the node name is the FQDN,
    # whereas in Azure the node name is the base host name
  - action: keep
    source_labels: [__meta_kubernetes_pod_node_name]
    regex: __HOSTNAME__.*
  - action: keep
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: true
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    target_label: __scheme__
    regex: (https?)
  - action: replace
    source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    target_label: __metrics_path__
    regex: (.+)
  - action: replace
    source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name 

The Prometheus configuration file comes with a default configuration for scraping the pods running on the local node. This configuration also includes the rules to preserve pod UID and container name labels for further correlation with Kubernetes State Metrics or Sysdig native metrics.
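
The default relabeling rules key on pod annotations. Prometheus exposes an annotation such as prometheus.io/scrape as the meta label __meta_kubernetes_pod_annotation_prometheus_io_scrape, so a pod opts in to scraping through annotations like the ones sketched below. The pod name, container name, image, port, and path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"     # matched by the second keep rule
    prometheus.io/port: "8080"       # rewritten into __address__ (hypothetical port)
    prometheus.io/path: "/metrics"   # copied into __metrics_path__
    prometheus.io/scheme: "http"     # copied into __scheme__
spec:
  containers:
  - name: example-container          # hypothetical container name
    image: example/app:latest        # hypothetical image
    ports:
    - containerPort: 8080
```

Annotation values must be strings, which is why "true" and "8080" are quoted.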

Scrape Interval

The default scrape interval is 10 seconds. However, the value can be overridden per scraping job. The scrape interval configured in prometheus.yaml is independent of the agent configuration.

promscrape.v2 reads prometheus.yaml and initiates scraping jobs.

Metrics are collected from each target at that target's scrape interval and immediately forwarded to the agent. The agent sends metrics to the Sysdig collector every 10 seconds. Only metrics received since the last transmission are sent to the collector. If a scraping job has a scrape interval longer than 10 seconds, individual agent transmissions might not include all the metrics from that job.
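
A per-job override can be sketched as follows, using the standard Prometheus scrape_interval setting at the job level. The job name and the target address are hypothetical. With a 60-second interval such as this one, the job delivers samples once a minute, so most of the agent's 10-second transmissions would not contain its metrics.

```yaml
global:
  scrape_interval: 10s               # default applied to every job
scrape_configs:
- job_name: 'slow-exporter'          # hypothetical job name
  scrape_interval: 60s               # per-job override of the global value
  static_configs:
  - targets: ['localhost:9100']      # hypothetical local target
```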

Hostname

__HOSTNAME__ is replaced with the actual hostname before promscrape starts scraping the targets. This allows promscrape to ignore targets running on other hosts.
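
To illustrate the substitution, the fragment below shows what the first keep rule of the default configuration would effectively look like on a node whose hostname is ip-10-0-1-23. The hostname is a hypothetical example. Prometheus anchors relabeling regular expressions at both ends, which is why the trailing wildcard is needed to also match a fully qualified node name.

```yaml
# Rule as written in prometheus.yaml:
#   regex: __HOSTNAME__.*
# Effective rule on a node whose hostname is "ip-10-0-1-23" (hypothetical):
relabel_configs:
- action: keep
  source_labels: [__meta_kubernetes_pod_node_name]
  regex: ip-10-0-1-23.*
# Keeps pods whose node name is "ip-10-0-1-23" or
# "ip-10-0-1-23.ec2.internal"; pods on other nodes are dropped.
```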

Jobs

The default Prometheus configuration file contains the following two jobs:

  • job_name: 'k8s-pods': Scrapes the targets exposed over HTTP.

  • job_name: 'k8s-pods-tls': Scrapes the targets exposed over HTTPS.

Relabeling Configuration

The default Prometheus configuration file contains the following two relabeling configurations:

- action: replace
  source_labels: [__meta_kubernetes_pod_uid]
  target_label: sysdig_k8s_pod_uid
- action: replace
  source_labels: [__meta_kubernetes_pod_container_name]
  target_label: sysdig_k8s_pod_container_name

These rules add two labels, sysdig_k8s_pod_uid and sysdig_k8s_pod_container_name, to every metric gathered from the local targets. The labels contain the pod ID and the container name, respectively. They are dropped from the metrics before the metrics are sent to the Sysdig collector for further processing.