Collect Prometheus Metrics
Sysdig supports collecting, storing, and querying native Prometheus metrics and labels. You can use Sysdig in the same way that you use Prometheus and leverage the Prometheus Query Language (PromQL) to create dashboards and alerts.
Sysdig is compatible with the Prometheus HTTP API, so you can query your monitoring data programmatically using PromQL and extend Sysdig to other platforms such as Grafana.
From a metric collection standpoint, a lightweight Prometheus server is embedded directly in the Sysdig agent. It supports filtering and relabeling targets, instances, and jobs using Prometheus syntax. You can configure the agent to identify the processes that expose Prometheus metric endpoints on its own host and send the metrics to the Sysdig collector for storage and further processing.

You do not need to install the Prometheus server itself to collect Prometheus metrics.
Agent Compatibility
The following describes Sysdig agent version compatibility with Prometheus features:
Sysdig Agent v12.2.0 and Above
The following features are enabled by default:
- Automatic scraping of any Kubernetes pods with the annotation prometheus.io/scrape=true set.
- Automatic scraping of applications supported by Monitoring Integrations.
For more information, see Set up the Environment.
Sysdig Agent Prior to v12.0.0
Manually enable Prometheus in the dragent.yaml file:
prometheus:
  enabled: true
For more information, see Enable Promscrape V2 on Older Versions of Sysdig Agent.
Learn More
The following topics describe how to set up the environment for service discovery, metric collection, and further processing.
See the Sysdig blog posts on Prometheus metrics for additional context on how such metrics are typically used.
1 - Set Up the Environment
If you are already leveraging Kubernetes Service Discovery, specifically the approach given in prometheus-kubernetes.yml, you might already have annotations attached to the pods that mark them as eligible for scraping. Such environments can quickly begin scraping the same metrics by using the Sysdig agent in a single step.
If you are not using Kubernetes Service Discovery, follow the instructions given below:
Annotation
Ensure that the Kubernetes pods that contain your Prometheus exporters have been deployed with the following annotations to enable scraping, substituting the listening exporter-TCP-port:
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "exporter-TCP-port"
The configuration above assumes your exporters use the typical /metrics endpoint. If your exporter uses a different endpoint, specify it by adding the following additional annotation, substituting exporter-endpoint-name:
prometheus.io/path: "/exporter-endpoint-name"
Sample Exporter
Use the Sample Exporter to test your environment. You will quickly see auto-discovered Prometheus metrics being displayed on Sysdig Monitor. You can use this working example as a basis to similarly annotate your own exporters.
2 - Enable Prometheus Native Service Discovery
Prometheus service discovery is a standard method of finding endpoints to scrape for metrics. You configure prometheus.yaml and custom jobs to prepare for scraping endpoints in the same way you do for native Prometheus.
For metric collection, a lightweight Prometheus server named promscrape is embedded directly in the Sysdig agent. Promscrape supports filtering and relabeling targets, instances, and jobs, and identifies them using the custom jobs configured in the prometheus.yaml file. The latest versions of the Sysdig agent (v12.0.0 and above) by default identify the processes that expose Prometheus metric endpoints on their own host and send the metrics to the Sysdig collector for storage and further processing. On older versions of the Sysdig agent, you enable these features by configuring dragent.yaml.
Working with Promscrape
Promscrape is a lightweight Prometheus server embedded in the Sysdig agent. Promscrape scrapes metrics from Prometheus endpoints and sends them on for storage and processing.
Promscrape has two versions: Promscrape V1 and Promscrape V2.
Promscrape V2
Promscrape itself discovers targets by using the standard Prometheus configuration (native Prometheus service discovery), allowing the use of relabel_configs to find or modify targets. An instance of promscrape runs on every node that is running a Sysdig agent and is intended to collect metrics from local as well as remote targets specified in the prometheus.yaml file. The prometheus.yaml file you create is shared across all such nodes.
Promscrape V2 is enabled by default on Sysdig agent v12.5.0 and above. On older versions of the Sysdig agent, you need to manually enable Promscrape V2, which allows for native Prometheus service discovery, by setting the prom_service_discovery parameter to true in dragent.yaml.
Promscrape V1
The Sysdig agent discovers scrape targets through the Sysdig process_filter rules. For more information, see Process Filter.
About Promscrape V2
Supported Features
Promscrape V2 supports the following native Prometheus capabilities:
Relabeling: Promscrape V2 supports the native Prometheus relabel_configs and metric_relabel_configs, which you can use to keep, drop, and modify targets and their labels.
Sample format: In addition to the regular sample format (metric name, labels, and metric reading), Promscrape V2 adds the metric type (counter, gauge, histogram, summary) to every sample sent to the agent.
Scraping configuration: Promscrape V2 supports all types of scraping configuration, such as federation, blackbox-exporter, and so on.
Label mapping: Metrics can be mapped to their source (pod or process) by using source labels, which map certain Prometheus label names to known agent tags.
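For example, here is a minimal sketch of a metric_relabel_configs rule that drops a metric after scraping; the job name and metric name are illustrative and not part of any default configuration:
scrape_configs:
  - job_name: 'example-drop-metric'        # illustrative job name
    kubernetes_sd_configs:
      - role: pod
    metric_relabel_configs:
      # Drop a hypothetical high-cardinality metric before it is sent to the agent
      - action: drop
        source_labels: [__name__]
        regex: http_request_duration_seconds_bucket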
Unsupported Features
Promscrape V2 does not support calculated metrics.
Promscrape V2 does not support cluster-wide features such as recording rules and alert management.
Service discovery configurations in Promscrape V1 (process_filter) and Promscrape V2 (prometheus.yaml) are incompatible and non-translatable.
Promscrape V2 collects metrics from both local and remote targets specified in the prometheus.yaml file. Because the same prometheus.yaml file is shared by the promscrape instance on every node, configuring promscrape to scrape remote targets causes each node to scrape them, resulting in duplicated metrics.
Promscrape V2 does not have a cluster-wide view, so it ignores the configuration of recording rules and alerts, which is used for cluster-wide metric collection; such Prometheus configurations are therefore not supported.
Sysdig uses __HOSTNAME__, which is not a standard Prometheus keyword.
Enable Promscrape V2 on Older Versions of Sysdig Agent
To enable Prometheus native service discovery on agent versions prior to 11.2:
Open the dragent.yaml file.
Set the following Prometheus Service Discovery parameter to true:
prometheus:
  prom_service_discovery: true
If true, promscrape.v2 is used. Otherwise, promscrape.v1 is used to scrape the targets.
Restart the agent.
Create Custom Jobs
Prerequisites
Ensure the following features are enabled:
- Monitoring Integration
- Promscrape V2
If you are using Sysdig agent v12.0.0 or above, these features are enabled by default.
Prepare Custom Job
You set up custom jobs in the Prometheus configuration file to identify endpoints that expose Prometheus metrics. The Sysdig agent uses these custom jobs to scrape endpoints by using promscrape, the lightweight Prometheus server embedded in the agent.
Guidelines
Ensure that targets are scraped only by the agent running on the same node as the target. You do this by adding the host selection relabeling rules.
Use the Sysdig-specific relabeling rules to automatically apply the correct workload labels.
Example Prometheus Configuration File
The prometheus.yaml file comes with a default configuration for scraping the pods running on the local node. This configuration also includes the rules to preserve pod UID and container name labels for further correlation with Kubernetes State Metrics or Sysdig native metrics.
Here is an example prometheus.yaml file that you can use to set up custom jobs.
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'my_pod_job'
    sample_limit: 40000
    tls_config:
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Look for pod name starting with "my_pod_prefix" in namespace "my_namespace"
      - action: keep
        source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_name]
        separator: /
        regex: my_namespace/my_pod_prefix.+
      # In those pods try to scrape from port 9876
      - source_labels: [__address__]
        action: replace
        target_label: __address__
        regex: (.+?)(\\:\\d)?
        replacement: $1:9876
      # Trying to ensure we only scrape local targets
      # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
      # of all the active network interfaces on the host
      - action: keep
        source_labels: [__meta_kubernetes_pod_host_ip]
        regex: __HOSTIPS__
      # Holding on to pod-id and container name so we can associate the metrics
      # with the container (and cluster hierarchy)
      - action: replace
        source_labels: [__meta_kubernetes_pod_uid]
        target_label: sysdig_k8s_pod_uid
      - action: replace
        source_labels: [__meta_kubernetes_pod_container_name]
        target_label: sysdig_k8s_pod_container_name
Default Scrape Job
If Monitoring Integration is not enabled for you and you still want to automatically collect metrics from pods with the Prometheus annotation set (prometheus.io/scrape=true), add the following default scrape job to your prometheus.yaml file:
- job_name: 'k8s-pods'
  sample_limit: 40000
  tls_config:
    insecure_skip_verify: true
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Trying to ensure we only scrape local targets
    # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
    # of all the active network interfaces on the host
    - action: keep
      source_labels: [__meta_kubernetes_pod_host_ip]
      regex: __HOSTIPS__
    - action: keep
      source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      regex: true
    - action: replace
      source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
      target_label: __scheme__
      regex: (https?)
    - action: replace
      source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      target_label: __metrics_path__
      regex: (.+)
    - action: replace
      source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__
    # Holding on to pod-id and container name so we can associate the metrics
    # with the container (and cluster hierarchy)
    - action: replace
      source_labels: [__meta_kubernetes_pod_uid]
      target_label: sysdig_k8s_pod_uid
    - action: replace
      source_labels: [__meta_kubernetes_pod_container_name]
      target_label: sysdig_k8s_pod_container_name
Default Prometheus Configuration File
Here is the default prometheus.yaml file.
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'k8s-pods'
    tls_config:
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Trying to ensure we only scrape local targets
      # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
      # of all the active network interfaces on the host
      - action: keep
        source_labels: [__meta_kubernetes_pod_host_ip]
        regex: __HOSTIPS__
      - action: keep
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        regex: true
      - action: replace
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
        target_label: __scheme__
        regex: (https?)
      - action: replace
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        target_label: __metrics_path__
        regex: (.+)
      - action: replace
        source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Holding on to pod-id and container name so we can associate the metrics
      # with the container (and cluster hierarchy)
      - action: replace
        source_labels: [__meta_kubernetes_pod_uid]
        target_label: sysdig_k8s_pod_uid
      - action: replace
        source_labels: [__meta_kubernetes_pod_container_name]
        target_label: sysdig_k8s_pod_container_name
Understand the Prometheus Settings
Scrape Interval
The default scrape interval is 10 seconds. The value can be overridden per scrape job. The scrape interval configured in prometheus.yaml is independent of the agent configuration.
Promscrape V2 reads prometheus.yaml and initiates scraping jobs. The metrics from each target are collected once per scrape interval and immediately forwarded to the agent. The agent sends the metrics to the Sysdig collector every 10 seconds; only the metrics received since the last transmission are sent. If a scrape job has a scrape interval longer than 10 seconds, some agent transmissions might not include all the metrics from that job.
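As a minimal sketch, a single job can override the global interval like this; the job name, target, and 30-second interval are illustrative:
global:
  scrape_interval: 10s            # default for all jobs
scrape_configs:
  - job_name: 'slow-exporter'     # hypothetical job name
    scrape_interval: 30s          # overrides the global 10s for this job only
    static_configs:
      - targets: ['localhost:9200']
With a 30-second interval, a given 10-second agent transmission may carry no new samples from this job, which is the behavior described above.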
Hostname Selection
__HOSTIPS__ is replaced by the host IP addresses. Selection by the host IP address is preferred because of its reliability.
__HOSTNAME__ is replaced with the actual hostname before promscrape starts scraping the targets. This allows promscrape to ignore targets running on other hosts.
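As an illustration, here is a minimal sketch of a relabeling rule that keeps only pods scheduled on the local node by matching the Kubernetes node name against __HOSTNAME__; it assumes the node name matches the host's hostname:
relabel_configs:
  # Keep only targets whose node name matches this host's name
  - action: keep
    source_labels: [__meta_kubernetes_pod_node_name]
    regex: __HOSTNAME__
Note that the default configuration instead matches __meta_kubernetes_pod_host_ip against __HOSTIPS__, which is the preferred approach described above.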
Relabeling Configuration
The default Prometheus configuration file contains the following two
relabeling configurations:
- action: replace
  source_labels: [__meta_kubernetes_pod_uid]
  target_label: sysdig_k8s_pod_uid
- action: replace
  source_labels: [__meta_kubernetes_pod_container_name]
  target_label: sysdig_k8s_pod_container_name
These rules add two labels, sysdig_k8s_pod_uid and sysdig_k8s_pod_container_name, to every metric gathered from the local targets, containing the pod ID and the container name respectively. These labels are dropped from the metrics before they are sent to the Sysdig collector for further processing.
Here is an example of setting up the prometheus.yaml file using the agent configmap:
apiVersion: v1
data:
  dragent.yaml: |
    new_k8s: true
    k8s_cluster_name: your-cluster-name
    metrics_excess_log: true
    10s_flush_enable: true
    app_checks_enabled: false
    use_promscrape: true
    promscrape_fastproto: true
    prometheus:
      enabled: true
      prom_service_discovery: true
      log_errors: true
      max_metrics: 200000
      max_metrics_per_process: 200000
      max_tags_per_metric: 100
      ingest_raw: true
      ingest_calculated: false
    snaplen: 512
    tags: role:cluster
  prometheus.yaml: |
    global:
      scrape_interval: 10s
    scrape_configs:
      - job_name: 'haproxy-router'
        basic_auth:
          username: USER
          password: PASSWORD
        tls_config:
          insecure_skip_verify: true
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Trying to ensure we only scrape local targets
          # We need the wildcard at the end because in AWS the node name is the FQDN,
          # whereas in Azure the node name is the base host name
          - action: keep
            source_labels: [__meta_kubernetes_pod_host_ip]
            regex: __HOSTIPS__
          - action: keep
            source_labels:
              - __meta_kubernetes_namespace
              - __meta_kubernetes_pod_name
            separator: '/'
            regex: 'default/router-1-.+'
          # Holding on to pod-id and container name so we can associate the metrics
          # with the container (and cluster hierarchy)
          - action: replace
            source_labels: [__meta_kubernetes_pod_uid]
            target_label: sysdig_k8s_pod_uid
          - action: replace
            source_labels: [__meta_kubernetes_pod_container_name]
            target_label: sysdig_k8s_pod_container_name
kind: ConfigMap
metadata:
  labels:
    app: sysdig-agent
  name: sysdig-agent
  namespace: sysdig-agent
3 - Migrating from Promscrape V1 to V2
Promscrape is the lightweight Prometheus server in the Sysdig agent. An updated version of promscrape, named Promscrape V2, is available. This configuration is controlled by the prom_service_discovery parameter in the dragent.yaml file. To use the latest features, such as Service Discovery and Monitoring Integrations, you need to have this option enabled in your environment.
Compare Promscrape V1 and V2
The main difference between V1 and V2 is how scrape targets are determined.
In V1, targets are found through process_filter rules configured in dragent.yaml or dragent.default.yaml (if no rules are given in dragent.yaml). The process_filter rules are applied to all the running processes on the host. Matches are made based on process attributes, such as the process name or the TCP ports being listened on, as well as associated contexts from Docker or Kubernetes, such as container labels or Kubernetes annotations.
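As a rough sketch, a V1 rule in dragent.yaml can look like the following; the label, port, and path values are illustrative and not the shipped defaults:
prometheus:
  enabled: true
  process_filter:
    - include:
        kubernetes.pod.label.app: my-app    # match processes in pods carrying this label
        conf:
          port: 9876                        # port promscrape will scrape
          path: /metrics                    # endpoint promscrape will scrape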
With Promscrape V2, scrape targets are determined by the scrape_configs fields in a prometheus.yaml file (or the prometheus-v2.default.yaml file if no prometheus.yaml exists). Because promscrape is adapted from the open-source Prometheus server, the scrape_config settings are compatible with the normal Prometheus configuration. Here is an example:
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'my_pod_job'
    sample_limit: 40000
    tls_config:
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Look for pod name starting with "my_pod_prefix" in namespace "my_namespace"
      - action: keep
        source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_pod_name]
        separator: /
        regex: my_namespace/my_pod_prefix.+
      - action: keep
        source_labels: [__meta_kubernetes_pod_label_app]
        regex: my_app_metrics
      # In those pods try to scrape from port 9876
      - source_labels: [__address__]
        action: replace
        target_label: __address__
        regex: (.+?)(\\:\\d)?
        replacement: $1:9876
      # Trying to ensure we only scrape local targets
      # __HOSTIPS__ is replaced by promscrape with a regex list of the IP addresses
      # of all the active network interfaces on the host
      - action: keep
        source_labels: [__meta_kubernetes_pod_host_ip]
        regex: __HOSTIPS__
      # Holding on to pod-id and container name so we can associate the metrics
      # with the container (and cluster hierarchy)
      - action: replace
        source_labels: [__meta_kubernetes_pod_uid]
        target_label: sysdig_k8s_pod_uid
      - action: replace
        source_labels: [__meta_kubernetes_pod_container_name]
        target_label: sysdig_k8s_pod_container_name
Migrate Using Default Configuration
The default configuration for Promscrape V1 triggers scraping based on standard Kubernetes pod annotations and container labels. The default configuration for V2 currently triggers scraping only based on the standard Kubernetes pod annotations, leveraging Prometheus native service discovery.
Example Pod Annotations
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: ""
The following annotations are supported:

| Annotation | Value | Description |
|---|---|---|
| prometheus.io/scrape | true | Required field. |
| prometheus.io/port | The port number to scrape | Optional. All pod-registered ports are scraped if it is omitted. |
| prometheus.io/scheme | http or https | Optional. The default is http. |
| prometheus.io/path | The URL path to scrape | Optional. The default is /metrics. |
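For instance, a pod exposing metrics over HTTPS on port 9443 at /custom-metrics (all values illustrative) would be annotated as follows:
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9443"
        prometheus.io/scheme: "https"
        prometheus.io/path: "/custom-metrics"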
Example Static Job
- job_name: 'static10'
  static_configs:
    - targets: ['localhost:5010']
Guidelines
Users running Kubernetes with Promscrape V1 default rules and triggering scraping based on pod annotations need not take any action to migrate to V2. The migration happens automatically.
Users operating non-Kubernetes environments might need to continue using V1 for now, depending on how scraping is triggered. As of today, promscrape.v2 does not support leveraging container and Docker labels to discover Prometheus metrics endpoints. If your environment is one of these, define static jobs with the IP:port to be scraped, as in the sketch below.
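A minimal sketch of such a static job, assuming a hypothetical exporter listening at 10.0.0.12:9100:
- job_name: 'standalone-host'         # illustrative job name
  static_configs:
    - targets: ['10.0.0.12:9100']     # IP:port of the exporter to scrape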
Migrate Using Custom Rules
If you rely on custom process_filter rules to collect metrics, use any method based on standard Prometheus configuration syntax to scrape the endpoints. We recommend one of the following:
- Adopt the standard approach of adding the standard Prometheus annotations to your pods. For more information, see Migrate Using Default Configuration.
- Write a Prometheus scrape_config by using Kubernetes pod service discovery and use the appropriate pod metadata to trigger the scrapes.
See the examples below for converting your process_filter rules to Prometheus terminology.
| process_filter rule (Promscrape V1) | Prometheus equivalent (Promscrape V2) |
|---|---|
| - include: {kubernetes.pod.annotation.sysdig.com/test: true} | - {action: keep, source_labels: [__meta_kubernetes_pod_annotation_sysdig_com_test], regex: true} |
| - include: {kubernetes.pod.label.app: sysdig} | - {action: keep, source_labels: [__meta_kubernetes_pod_label_app], regex: 'sysdig'} |
| - include: {container.label.com.sysdig.test: true} | Not supported. |
| - include: {process.name: test} | Not supported. |
| - include: {process.cmdline: sysdig-agent} | Not supported. |
| - include: {port: 8080} | - {action: keep, source_labels: [__meta_kubernetes_pod_container_port_number], regex: '8080'} |
| - include: {container.image: sysdig-agent} | Not supported. |
| - include: {container.name: sysdig-agent} | - {action: keep, source_labels: [__meta_kubernetes_pod_container_name], regex: 'sysdig-agent'} |
| - include: {appcheck.match: sysdig} | App checks are not compatible with Promscrape V2. See Configure Monitoring Integrations for supported integrations. |
If you have any queries related to promscrape migration, contact Sysdig Support.
4 - Dynamic Sampling
Dynamic sampling supports scraping a rotating set of Prometheus endpoints based on the total number of time series scraped from each endpoint. With dynamic sampling turned on, you get consistent and up-to-date data from every Prometheus endpoint on a given node at dynamic intervals, while maintaining the data collection frequency and fidelity of Prometheus metrics via the Sysdig agent.
Sysdig's ability to collect and process the volume of data scraped from different Prometheus endpoints is controlled by a mix of the number of time series scraped from each endpoint, the total number of Prometheus time series collected by the agent in each time window, and the frequency at which the Sysdig agent collects and sends the data to the Sysdig backend. On nodes with multiple Prometheus endpoints sending high volumes of time series, the Sysdig agent may need to skip some endpoints, depending on the overall volume and scrape frequency. Dynamic sampling addresses this by cycling through the individual Prometheus endpoints and scraping the latest time series from each endpoint on a rotational basis, so that all time series from all Prometheus endpoints are processed at dynamic intervals. The result is more time series being scraped and processed overall, at a lower frequency. For example, instead of receiving 50,000 time series every 10 seconds, you might receive 100,000 time series every 20 seconds.
Dynamic Sampling Considerations
Any alerts that depend on dynamically sampled metrics will have the same interval as the metric. Using the example above, alerts related to either endpoint will be raised, at most, every 20 seconds.
The time series from all the endpoints are sent to the backend in a manner that prevents data integrity issues. Therefore, if the total number of time series from a particular endpoint is greater than the agent's maximum allowed limit, the time series from that endpoint may be dropped, regardless of whether dynamic sampling is turned on.
The Sysdig agent always makes full use of the allowed limit in every interval, as long as all the time series of an endpoint can fit into that limit.
Enable Dynamic Sampling
To configure dynamic sampling:
Open the dragent.yaml file.
Add the following line:
promscrape_emit_all: true
Restart the agent.
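As a minimal sketch, the resulting dragent.yaml might look like the following; the placement of promscrape_emit_all as a top-level key follows the step above, and the prometheus block is shown only for context:
prometheus:
  enabled: true
  prom_service_discovery: true
promscrape_emit_all: true    # per the steps above, enables dynamic sampling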