Enable Prometheus Native Service Discovery
Prometheus service discovery is a standard method of finding endpoints to scrape for metrics. You configure prometheus.yaml with custom jobs to prepare for scraping endpoints, in the same way you do for native Prometheus.
For metric collection, a lightweight Prometheus server named promscrape is embedded in the Sysdig agent. Promscrape supports filtering and relabeling targets, instances, and jobs, and identifies them using the custom jobs configured in the prometheus.yaml file. By default, the latest versions of the Sysdig agent (above v12.0.0) identify the processes that expose Prometheus metric endpoints on their own host and send the metrics to the Sysdig collector for storage and further processing. On older versions of the Sysdig agent, you enable these features by configuring dragent.yaml.
Working with Promscrape
Promscrape is a lightweight Prometheus server that is embedded in the Sysdig agent. It scrapes metrics from Prometheus endpoints and sends them to the agent for storing and processing.
Promscrape has two versions: Promscrape V1 and Promscrape V2.
Promscrape V2
Promscrape itself discovers targets by using the standard Prometheus configuration (native Prometheus service discovery), allowing the use of relabel_configs to find or modify targets. An instance of promscrape runs on every node that is running a Sysdig agent and collects metrics from the local as well as remote targets specified in the prometheus.yaml file. The prometheus.yaml file you create is shared across all such nodes. Promscrape V2 is enabled by default on Sysdig agent v12.5.0 and above. On older versions of the Sysdig agent, you enable Promscrape V2, and with it native Prometheus service discovery, by setting the prom_service_discovery parameter to true in dragent.yaml.
Promscrape V1
The Sysdig agent discovers scrape targets through the Sysdig process_filter rules. For more information, see Process Filter.
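For reference, Promscrape V1 discovery is driven by process_filter rules in dragent.yaml rather than by prometheus.yaml. The following is a minimal, illustrative sketch only; it assumes your pods carry the standard prometheus.io annotations, and you should adapt the rules to your environment:
prometheus:
  enabled: true
process_filter:
  - include:
      kubernetes.pod.annotation.prometheus.io/scrape: true
      conf:
        # Scrape the port and path declared in the pod's annotations
        port: "{kubernetes.pod.annotation.prometheus.io/port}"
        path: "{kubernetes.pod.annotation.prometheus.io/path}"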
About Promscrape V2
Supported Features
Promscrape V2 supports the following native Prometheus capabilities:
Relabeling: Promscrape V2 supports the Prometheus-native relabel_configs and metric_relabel_configs (see the sketch after this list). Relabel configuration enables the following:
Drop unnecessary metrics or unwanted labels from metrics
Edit the label format of the target before scraping the labels
Sample format: In addition to the regular sample format (metric name, labels, and metric reading), Promscrape V2 includes the metric type (counter, gauge, histogram, summary) in every sample sent to the agent.
Scraping configuration: Promscrape V2 supports all types of scraping configuration, such as federation, blackbox-exporter, and so on.
Label mapping: Metrics can be mapped to their source (pod, process) by using the source labels, which in turn map certain Prometheus label names to the known agent tags.
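For example, here is a minimal sketch of a job that uses both relabeling mechanisms; the job name, metric name, and label names are illustrative and not part of any default configuration:
scrape_configs:
  - job_name: 'relabel-example'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Before scraping: copy a discovered pod label into a plain label
      - action: replace
        source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
    metric_relabel_configs:
      # After scraping: drop a noisy metric family
      - action: drop
        source_labels: [__name__]
        regex: go_gc_duration_seconds.*
      # After scraping: remove an unwanted label from all samples
      - action: labeldrop
        regex: pod_template_hash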
Unsupported Features
Promscrape V2 does not support calculated metrics.
Promscrape V2 does not support cluster-wide features such as recording rules and alert management.
Service discovery configurations in Promscrape V1 (process_filter) and Promscrape V2 (prometheus.yaml) are incompatible and cannot be translated into each other.
Because the prometheus.yaml file is shared across nodes, every promscrape instance collects metrics from all the local and remote targets specified in it. Configuring promscrape to scrape remote targets therefore results in metrics duplication, because the same target is scraped from multiple nodes.
Promscrape V2 does not have a cluster-wide view, so it ignores the configuration of recording rules and alerts, which is used in cluster-wide metrics collection.
Sysdig uses __HOSTNAME__, which is not a standard Prometheus keyword.
Enable Promscrape V2 on Older Versions of Sysdig Agent
To enable Prometheus native service discovery on agent versions prior to 11.2:
Open the dragent.yaml file.
Set the Prometheus service discovery parameter to true:
prometheus:
  prom_service_discovery: true
If true, promscrape.v2 is used. Otherwise, promscrape.v1 is used to scrape the targets.
Restart the agent.
Create Custom Jobs
Prerequisites
Ensure the following features are enabled:
- Monitoring Integration
- Promscrape V2
If you are using Sysdig agent v12.0.0 or above, these features are enabled by default.
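On older agents, a minimal dragent.yaml fragment that turns on Prometheus scraping with Promscrape V2 looks like this (the same keys appear in the full ConfigMap example later on this page):
prometheus:
  enabled: true
  prom_service_discovery: true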
Prepare Custom Job
You set up custom jobs in the Prometheus configuration file to identify endpoints that expose Prometheus metrics. Sysdig agent uses these custom jobs to scrape endpoints by using promscrape, the lightweight Prometheus server embedded in it.
Guidelines
Ensure that targets are scraped only by the agent running on the same node as the target. You do this by adding the host selection relabeling rules.
Use the Sysdig-specific relabeling rules to automatically get the right workload labels applied. Both kinds of rules appear in the snippet after this list.
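Both guidelines translate into relabeling rules like the following, excerpted from the example files on this page:
relabel_configs:
  # Host selection: keep only targets running on this node
  # (__HOSTIPS__ is replaced by promscrape with the host's IP addresses)
  - action: keep
    source_labels: [__meta_kubernetes_pod_host_ip]
    regex: __HOSTIPS__
  # Sysdig-specific rules: preserve pod UID and container name
  # so metrics can be associated with the workload
  - action: replace
    source_labels: [__meta_kubernetes_pod_uid]
    target_label: sysdig_k8s_pod_uid
  - action: replace
    source_labels: [__meta_kubernetes_pod_container_name]
    target_label: sysdig_k8s_pod_container_name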
Example Prometheus Configuration File
The prometheus.yaml file comes with a default configuration for scraping the pods running on the local node. This configuration also includes the rules to preserve the pod UID and container name labels for further correlation with Kubernetes State Metrics or Sysdig native metrics.
Here is an example prometheus.yaml file that you can use to set up custom jobs.
global:
scrape_interval: 10s
scrape_configs:
- job_name: 'my_pod_job'
sample_limit: 40000
tls_config:
insecure_skip_verify: true
kubernetes_sd_configs:
- role: pod
relabel_configs:
# Look for pod name starting with "my_pod_prefix" in namespace "my_namespace"
- action: keep
source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_name]
separator: /
regex: my_namespace/my_pod_prefix.+
# In those pods try to scrape from port 9876
- source_labels: [__address__]
action: replace
target_label: __address__
        regex: (.+?)(:\d+)?
replacement: $1:9876
# Trying to ensure we only scrape local targets
# __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
# of all the active network interfaces on the host
- action: keep
source_labels: [__meta_kubernetes_pod_host_ip]
regex: __HOSTIPS__
# Target only running pods
- source_labels: [__meta_kubernetes_pod_phase]
action: keep
regex: Running
# Holding on to pod-id and container name so we can associate the metrics
# with the container (and cluster hierarchy)
- action: replace
source_labels: [__meta_kubernetes_pod_uid]
target_label: sysdig_k8s_pod_uid
- action: replace
source_labels: [__meta_kubernetes_pod_container_name]
target_label: sysdig_k8s_pod_container_name
Default Scrape Job
If Monitoring Integration is not enabled for you and you still want to automatically collect metrics from pods with the Prometheus annotations set (prometheus.io/scrape=true), add the following default scrape job to your prometheus.yaml file:
- job_name: 'k8s-pods'
sample_limit: 40000
tls_config:
insecure_skip_verify: true
kubernetes_sd_configs:
- role: pod
relabel_configs:
# Trying to ensure we only scrape local targets
# __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
# of all the active network interfaces on the host
- action: keep
source_labels: [__meta_kubernetes_pod_host_ip]
regex: __HOSTIPS__
- action: keep
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
regex: true
- source_labels: [__meta_kubernetes_pod_phase]
action: keep
regex: Running
- action: replace
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
target_label: __scheme__
regex: (https?)
- action: replace
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
target_label: __metrics_path__
regex: (.+)
- action: replace
source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
# Holding on to pod-id and container name so we can associate the metrics
# with the container (and cluster hierarchy)
- action: replace
source_labels: [__meta_kubernetes_pod_uid]
target_label: sysdig_k8s_pod_uid
- action: replace
source_labels: [__meta_kubernetes_pod_container_name]
target_label: sysdig_k8s_pod_container_name
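This job scrapes pods that carry the standard Prometheus annotations. As an illustration, a pod exposing metrics on port 9090 at /metrics would be picked up if its manifest carries an annotation block like the following (the port and path values are examples):
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"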
Default Prometheus Configuration File
Here is the default prometheus.yaml file.
global:
scrape_interval: 10s
scrape_configs:
- job_name: 'k8s-pods'
tls_config:
insecure_skip_verify: true
kubernetes_sd_configs:
- role: pod
relabel_configs:
# Trying to ensure we only scrape local targets
# __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
# of all the active network interfaces on the host
- action: keep
source_labels: [__meta_kubernetes_pod_host_ip]
regex: __HOSTIPS__
- action: keep
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
regex: true
- action: drop
source_labels: [__meta_kubernetes_pod_annotation_promcat_sysdig_com_omit]
regex: true
- source_labels: [__meta_kubernetes_pod_phase]
action: keep
regex: Running
# Drop for nginx-ingress
- action: drop
source_labels:
- __meta_kubernetes_pod_container_name
regex: 'controller|nginx-ingress-controller'
# Drop for ceph
- action: drop
source_labels: [__meta_kubernetes_pod_container_name, __meta_kubernetes_pod_annotation_prometheus_io_port]
separator: ;
regex: (chown-container-data-dir|log-collector|mgr|watch-active);9283
# Drop for consul
- action: drop
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
regex: '20200|8500'
# Drop for istio
- action: replace
source_labels: [__meta_kubernetes_pod_annotationpresent_sidecar_istio_io_status]
target_label: __tmp_pod_has_istio
regex: 'true'
replacement: 'true'
- action: drop
source_labels: [__tmp_pod_has_istio, __meta_kubernetes_pod_container_name]
regex: 'true;(discovery)|(.{0}$)'
# Drop for opa
- action: drop
source_labels:
- __meta_kubernetes_pod_container_name
regex: 'manager.*'
# Drop for kafka
- action: drop
source_labels:
- __meta_kubernetes_pod_container_name
regex: 'kafka-jmx-exporter|kafka-exporter'
# Drop for keda
- action: drop
source_labels:
- __meta_kubernetes_pod_container_name
regex: 'keda-operator-metrics-apiserver'
# Drop for ntp
- action: drop
source_labels:
- __meta_kubernetes_pod_container_name
regex: 'ntp-exporter'
# Drop for openshift-state-metrics
- action: drop
source_labels: [__meta_kubernetes_pod_container_name]
regex: 'openshift-state-metrics'
# Drop for portworx
- action: drop
source_labels: [__meta_kubernetes_pod_container_name]
regex: 'portworx'
# Drop for rabbitmq
- action: drop
source_labels: [__meta_kubernetes_pod_container_name]
regex: '(rabbitmq|prepare-plugins-dir)'
- action: drop
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port]
regex: '15691'
# Drop for Sysdig Admission Controller
- action: drop
source_labels:
- __meta_kubernetes_pod_container_name
- __meta_kubernetes_pod_annotation_prometheus_io_port
regex: admission-controller;(8080|5000)
- action: replace
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
target_label: __scheme__
regex: (https?)
- action: replace
source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
target_label: __metrics_path__
regex: (.+)
- action: replace
source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
# Holding on to pod-id and container name so we can associate the metrics
# with the container (and cluster hierarchy)
- action: replace
source_labels: [__meta_kubernetes_pod_uid]
target_label: sysdig_k8s_pod_uid
- action: replace
source_labels: [__meta_kubernetes_pod_container_name]
target_label: sysdig_k8s_pod_container_name
Understand the Prometheus Settings
Scrape Interval
The default scrape interval is 10 seconds. However, the value can be overridden per scrape job. The scrape interval configured in prometheus.yaml is independent of the agent configuration.
Promscrape V2 reads prometheus.yaml and initiates scraping jobs.
The metrics from targets are collected per scrape interval for each target and immediately forwarded to the agent. The agent sends the metrics every 10 seconds to the Sysdig collector; only the metrics that have been received since the last transmission are sent. If a scrape job has a scrape interval longer than 10 seconds, some agent transmissions will not include metrics from that job.
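For example, a per-job override looks like this (the job name is illustrative); the job is scraped every 60 seconds while the agent continues to flush to the collector every 10 seconds:
global:
  scrape_interval: 10s          # default for all jobs
scrape_configs:
  - job_name: 'slow-exporter'
    scrape_interval: 60s        # overrides the global value for this job only
    kubernetes_sd_configs:
      - role: pod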
Hostname Selection
__HOSTIPS__ is replaced by the host IP addresses. Selection by host IP address is preferred because of its reliability.
__HOSTNAME__ is replaced with the actual hostname before promscrape starts scraping the targets. This allows promscrape to ignore targets running on other hosts.
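As a sketch, a node-name-based keep rule using __HOSTNAME__ could look like the following; the trailing wildcard is an assumption to cover clouds where the node name is the FQDN (as in AWS) rather than the base host name (as in Azure):
- action: keep
  source_labels: [__meta_kubernetes_pod_node_name]
  regex: __HOSTNAME__.*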
Relabeling Configuration
The default Prometheus configuration file contains the following two relabeling configurations:
- action: replace
source_labels: [__meta_kubernetes_pod_uid]
target_label: sysdig_k8s_pod_uid
- action: replace
source_labels: [__meta_kubernetes_pod_container_name]
target_label: sysdig_k8s_pod_container_name
These rules add two labels, sysdig_k8s_pod_uid and sysdig_k8s_pod_container_name, to every metric gathered from the local targets, containing the pod ID and the container name respectively. These labels are dropped from the metrics before they are sent to the Sysdig collector for further processing.
Configure the Prometheus Configuration File Using the Agent ConfigMap
Here is an example of setting up the prometheus.yaml file using the agent ConfigMap:
apiVersion: v1
data:
dragent.yaml: |
new_k8s: true
k8s_cluster_name: your-cluster-name
metrics_excess_log: true
10s_flush_enable: true
app_checks_enabled: false
use_promscrape: true
promscrape_fastproto: true
prometheus:
enabled: true
prom_service_discovery: true
log_errors: true
max_metrics: 200000
max_metrics_per_process: 200000
max_tags_per_metric: 100
ingest_raw: true
ingest_calculated: false
snaplen: 512
tags: role:cluster
prometheus.yaml: |
global:
scrape_interval: 10s
scrape_configs:
- job_name: 'haproxy-router'
basic_auth:
username: USER
password: PASSWORD
tls_config:
insecure_skip_verify: true
kubernetes_sd_configs:
- role: pod
relabel_configs:
# Trying to ensure we only scrape local targets
          # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
          # of all the active network interfaces on the host
- action: keep
source_labels: [__meta_kubernetes_pod_host_ip]
regex: __HOSTIPS__
- source_labels: [__meta_kubernetes_pod_phase]
action: keep
regex: Running
- action: keep
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_pod_name
separator: '/'
regex: 'default/router-1-.+'
# Holding on to pod-id and container name so we can associate the metrics
# with the container (and cluster hierarchy)
- action: replace
source_labels: [__meta_kubernetes_pod_uid]
target_label: sysdig_k8s_pod_uid
- action: replace
source_labels: [__meta_kubernetes_pod_container_name]
target_label: sysdig_k8s_pod_container_name
kind: ConfigMap
metadata:
labels:
app: sysdig-agent
name: sysdig-agent
namespace: sysdig-agent
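After updating the ConfigMap, apply it to the cluster (for example, with kubectl apply -f sysdig-agent-configmap.yaml -n sysdig-agent, assuming you saved it under that file name) and restart the agent pods so the new configuration takes effect.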